Even the largest SOTA models, at ~1T parameters, only score around 50% on SimpleQA, and that’s a benchmark with an open dataset that can be gamed.
Just how large does a model need to be to answer all of my obscure otaku subculture questions?