redditor ran his benchmark
>Interestingly, the non-reasoning version scored above the reasoning version. Nowhere near the frontier, but a 13% jump compared to DeepSeek-R1-0528’s score.