>>106521718
My benchmark is: "does this look coherent?" At 80M tokens it's more often incoherent than not, although the grammar looks like English. The text might look OK at a quick glance, but if you read it carefully you'll quickly spot nonsense. It's also very sloppy, but that's a different issue.
I guess a separate "benchmark" role could be added in a future run.