
GPT-5/Horizon Alpha ignored the "Length: 1000 words." instruction in the prompt and thus got inflated scores on EQ-Bench writing tasks. Most model outputs are ~1000 words/~7000 characters. GPT-5 outputs are ~2000 words/~14000 characters.

