reminder that o3 was already stomping other reasoning models. it's not even close, despite what the meme benchmarks say.