Thread 106840513

15 posts 6 images /g/

Anonymous 10/9/2025, 9:05:57 PM No.106840513 >>106840659 >>106840694 >>106840916 >>106842204 >>106842309 >>106844416 >>106844931

New ARC-AGI SoTA

Screenshot_20251009-200303~2.jpg md5: fb9dd794...

GPT-5 Pro takes the lead

Anonymous 10/9/2025, 9:08:21 PM No.106840537

Yeah Claude had them beat for a bit, but they'll probably extend their lead in December again with the research preview

Anonymous 10/9/2025, 9:21:33 PM No.106840659 >>106840969

>>106840513 (OP)
What is the point of this shit? OpenAI will just take the problems and add them to the training set now that they've got them. You're kidding yourself if you don't think they logged every single query run for this """semi-private""" benchmark.

Anonymous 10/9/2025, 9:26:18 PM No.106840694

>>106840513 (OP)
These benchmarks mean fucking nothing. It's just investors, billionaires, and the US empire trying to stave off the AI bubble collapse by pretending it's going somewhere, because when it does finally pop it's likely to take the US's status as world hegemon with it. Or rather, it will expose the fact that the US's status as world hegemon has been sustained by belief for the last decade or so. Claude, the best coding AI according to these benchmarks, can't even handle programming other than webshit (it likely can't handle webshit either, but webshitters don't know any better). I've had it fuck up matrix multiplication on several occasions.

AI isn't going away, but these giant multibillion (or trillion) LLMs with 100k datacenter GPUs for training are going to crash and burn horribly. What's going to survive are Chinese AIs like DeepSeek, which are cheap as fuck, run on a (relative) toaster, and are free to use locally.

Anonymous 10/9/2025, 9:48:56 PM No.106840875

9657030a816e667de97cf00161522f7c.png md5: cf416b71...

>"""private""" benchmark

Anonymous 10/9/2025, 9:52:59 PM No.106840916

>>106840513 (OP)

llama score low

Anonymous 10/9/2025, 9:58:23 PM No.106840969

>>106840659
the point of this one is to land in the green zone, not just to increase the score

Anonymous 10/10/2025, 12:18:08 AM No.106842204

1747016859371829.jpg md5: 551dde5c...

>>106840513 (OP)
>The ultimate final AGI benchmark
>The ultimate final AGI benchmark 2
>The ultimate final AGI ...

Anonymous 10/10/2025, 12:32:41 AM No.106842309

>>106840513 (OP)
>Hides Grok Heavy
>but shows GPT-5 Pro
Did OpenAI threaten them with a ban on access if they list Grok Heavy, which mogs the entire benchmark?

Anonymous 10/10/2025, 6:14:38 AM No.106844416

>>106840513 (OP)
>Trusting benchmarks in the current year

Anonymous 10/10/2025, 6:17:26 AM No.106844433

>Semi-private
surely no one is buying this retarded shit...?

Anonymous 10/10/2025, 6:21:11 AM No.106844457 >>106844896

Who are Pang and Berman?

Anonymous 10/10/2025, 7:53:59 AM No.106844896

>>106844457

iirc pang has like 2500 whitepaper tier sci

Anonymous 10/10/2025, 8:01:10 AM No.106844931

>>106840513 (OP)
who is pang and how is he mogging

Anonymous 10/10/2025, 8:16:27 AM No.106845001

Bro thinks his heckin' text predictor is going to become sentient