Thread 106840513

15 posts 6 images /g/
Anonymous No.106840513 >>106840659 >>106840694 >>106840916 >>106842204 >>106842309 >>106844416 >>106844931
New ARC-AGI SoTA
GPT-5 Pro takes the lead
Anonymous No.106840537
Yeah Claude had them beat for a bit, but they'll probably extend their lead in December again with the research preview
Anonymous No.106840659 >>106840969
>>106840513 (OP)
What is the point of this shit? OpenAI will just take the problems and add them to the training set now that they got them. You're retarded if you don't think that kike will have logged every single query being run for this """semi-private""" benchmark.
Anonymous No.106840694
>>106840513 (OP)
These benchmarks mean fucking nothing, it's just investors, billionaires, and the US empire trying to stave off the AI bubble collapse by pretending it's going somewhere, because when it does finally pop it's likely to take the US's status as world hegemon with it. Rather, it will expose the fact that the US's status as world hegemon has been sustained by belief for the last decade or so. Claude, the best coding AI according to these benchmarks, can't even handle programming other than webshit (it likely can't handle webshit jeetcode either but webshitter jeets don't know any better). I've had it fuck up matrix multiplication on several occasions.

AI is not going anywhere, but these giant multibillion (or trillion) LLMs with 100k datacenter GPUs for training are going to crash and burn horribly. What's going to survive are Chinese AIs like DeepSeek which are cheap as fuck, run on a (relative) toaster, and are free to use locally.
Anonymous No.106840875
>"""private""" benchmark
Anonymous No.106840916
>>106840513 (OP)

llama score low
Anonymous No.106840969
>>106840659
the point of this one is to land in the green zone, not just to increase the score
Anonymous No.106842204
>>106840513 (OP)
>The ultimate final AGI benchmark
>The ultimate final AGI benchmark 2
>The ultimate final AGI ...
Anonymous No.106842309
>>106840513 (OP)
>Hides Grok Heavy
>but shows GPT-5 Pro
Did OpenAI threaten them with a ban on access if they list Grok Heavy, which mogs the entire benchmark?
Anonymous No.106844416
>>106840513 (OP)
>Trusting benchmarks in the current year
Anonymous No.106844433
>Semi-private
surely no one is buying this retarded shit...?
Anonymous No.106844457 >>106844896
Who are Pang and Berman?
Anonymous No.106844896
>>106844457

iirc pang has like 2500 whitepaper tier sci
Anonymous No.106844931
>>106840513 (OP)
who is pang and how is he mogging?
Anonymous No.106845001
Bro thinks his heckin' text predictor is going to become sentient