Thread 105804860

79 posts 14 images /g/
Anonymous No.105804860
These leaked Grok-4 benchmark results look fucking nuts.
Anonymous No.105804976
more like cock-4 lmao
Anonymous No.105805004
>>105804860 (OP)
>new benchmarks come out
>train on new benchmarks
>does better at new benchmarks
Anonymous No.105805010
>>105804860 (OP)
watching redditors melt down over le ebil nazis building better AIs than their leftist heroes is funny
they always commit the same mistake of underestimating their enemies and getting btfo
Anonymous No.105805028
>>105805010
Aren't all AI builders evil nazis?
Anonymous No.105805039
>>105804860 (OP)
When are we going to get a benchmark for recursive self improvement? Every other benchmark is 100%'d in a few months so they're essentially useless.
Anonymous No.105805057
>>105805010
which subs are they having a melty on? i wanna see
Anonymous No.105805064
>>105804860 (OP)

Grok 4 is built by Indian hands. How will retards cope?
Anonymous No.105805070
>>105804860 (OP)
fuck benchmarks
is it still woke?
Anonymous No.105805136
>>105805064
iirc at least one of the names on the original Attention is All You Need paper was indian
so there's no way to use any transformer model without using the work of indian hands
personally idc as long as it's uncensored
Anonymous No.105805266
>>105805010
good morning saar
Anonymous No.105805433
>>105804860 (OP)
can't wait for grok 3 to get open sourced when it comes out
oh wait
Anonymous No.105805466
>>105805010
lmao
Anonymous No.105805702
>compare against ChatGPT o3
So they only compare their results against old models of competitors they are just able to beat? So in conclusion it's still miles behind everyone else. Got it.
Anonymous No.105805747
>>105805702
? o3 is sota still
Anonymous No.105805967
>>105805070
As long as it uses mainstream sources it'll shit out woke answers.
Anonymous No.105805976
>>105805967
Yeah even uncensored models will default to woke if not ordered otherwise. But the difference is they'll actually stop being woke if you tell them not to be, unlike the heavily safetyslopped ones, which can't turn it off.
Anonymous No.105806027
>>105805010
>their leftist heroes
The only retards who call tech oligarch heroes are right-wing cucks. Maybe because there are no left-wing oligarchs or maybe because rightoids yearn for a strong father figure.
Anonymous No.105806069
>>105804860 (OP)
>grok 3 is a total benchmark princess
>B-BUT THIS TIME TH-THE BENCHMARKS ARE REAL!!!1
fuck off elmo, everything you run is trash with the sole exception of spacesex
Anonymous No.105806118
>>105805747
That's a weird way to spell gemini 2.5 pro
Anonymous No.105806120
>>105806118
gemini isn't even as good as it was on release, and it got 4x as expensive ~2 weeks ago because fuck you
Anonymous No.105806124
Barely better than the current SOTA models only to end up in the dust a few months later when the other companies release new versions
Anonymous No.105806136
>>105806124
>when the other companies release new versions
openai had half their talent poached
anthropic is very expensive and utter trash for everything that isn't codeslop (and is bad at following instructions)
google is terrified of cutting into their search business, thus only ever iterating as hard as to stay close to the pack but never to actually innovate
grok has always been trash and their benchmarks are wildly misleading
deepseek was a one-hit wonder that wasn't even that good
llama isn't even in the running nowadays despite zuck setting billions on fire

tldr there are no next-level models in the pipeline
Anonymous No.105806250
>>105806118
ok? that's on the benchmark chart in the OP too
Anonymous No.105806303
>>105806124
208% better than the current best model on the HLE isn't "barely" better
Anonymous No.105806340
>>105804860 (OP)
Just got a plumber quote of 3000 dollars to replace my bathroom tiles and fix my toilet's slab leakage. I should have become a plumber.
Anonymous No.105807208
those hle ones look like pass@6gorillion shit
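For context on the jab: "pass@k" scores a problem as solved if any one of k sampled attempts is correct, so a large k inflates the number compared with single-attempt (pass@1) scoring. The thread does not say which setting the leaked HLE numbers used. A minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k) for n samples with c correct; the function name `pass_at_k` is just for illustration:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n sampled attempts, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every possible group of k attempts contains at least one correct one.
        return 1.0
    # Probability that a random group of k attempts contains no correct one,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # A model that gets 1 of 10 attempts right looks 10x better at k=10.
    for k in (1, 5, 10):
        print(k, pass_at_k(10, 1, k))
```

With 1 correct answer out of 10 samples, pass@1 is 0.1 but pass@10 is 1.0. The same raw outputs can yield very different headline numbers depending on k.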
Anonymous No.105807697
>>105805028
Yes but redditors don't operate on truth. They operate on emotional conditioning.
For example Sam is le good and gentle gay man (not a psychopathic zionist).
Anonymous No.105807861
>>105806136
>google is terrified of cutting into their search business,
No, they aren't. They have fully committed to killing the search business within a year or two in favor of AI.
Anonymous No.105807909
Is Grok-4 actually live right now on X?
Anonymous No.105808672
>>105804860 (OP)
I got grok4 on lmarena a couple of times.
It is the dumbest model I've seen in a long time (it is dumber than llama4).

Whatever you tell it it will just repeat it eventually (which is why it can cheat known benchmarks so easily)
Anonymous No.105808917
>>105804860 (OP)
what do any of these benchmarks even mean, and why would anyone care, when these things are still dumb as hell and cannot reason? they are just machines, an imitation of life. without any clever prompts, can it write a symphony? without hours of prompting, can it turn a canvas into a beautiful masterpiece?
Anonymous No.105809013
>>105808917
Show an example of reasoning you can do but an LLM cannot.
Anonymous No.105809149
>>105805028
Taking furry anime makers' esteemed jobs is nazism
Anonymous No.105809170
>>105806340
1500 on materials and fuel
job takes 5 days
employed less than half the time
wowzers I'm off to work a trade!
Anonymous No.105809266
>>105806027
Every single wealthy, famous person I know of is ultra-liberal, althoughbeit
If that wasn't the case, we wouldn't have to deal with browns
Anonymous No.105809287
>>105809013
solve towers of hanoi, step by step
Anonymous No.105809298
>>105809287
Doesn't require any reasoning.
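The reply has a point: Towers of Hanoi has a fully mechanical recursive solution, so printing the steps is rote execution rather than open-ended reasoning. A minimal sketch, with the function name `hanoi` and the peg labels "A", "B", "C" chosen only for illustration:

```python
from typing import List, Tuple

Move = Tuple[str, str]  # (from_peg, to_peg)


def hanoi(n: int, src: str = "A", dst: str = "C", aux: str = "B") -> List[Move]:
    """Return the moves that transfer n disks from src to dst using aux."""
    if n == 0:
        return []
    # Park the top n-1 disks on the spare peg, move the largest disk
    # straight to the target, then restack the n-1 disks on top of it.
    return hanoi(n - 1, src, aux, dst) + [(src, dst)] + hanoi(n - 1, aux, dst, src)


if __name__ == "__main__":
    for i, (a, b) in enumerate(hanoi(3), 1):
        print(f"step {i}: move top disk {a} -> {b}")
```

The move count is always 2^n - 1 (7 moves for 3 disks). A model can list them step by step just by unrolling this recursion, which is why the puzzle says little about reasoning ability.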
Anonymous No.105809303
>>105805967
Musk said they were going to rewrite training data to be more primary source accurate and then train on the rewritten data or some shit. Also specifically asked for a list of difficult politically incorrect truths to test on, so I am guessing grok 4 will be somewhat less woke than anything else on the market.
sage No.105809318
>>105804860 (OP)
>Leaked
AKA officially released with some jewish PR hacking.
>Look at this AI vaporware though bro
Fascinating.
Anonymous No.105809320
>>105807909
Pretty sure the @grok and @gork answers are using a release candidate of grok4 now, but the standalone prompt/app is still grok3.
Anonymous No.105809329
>>105809320
Elon said that @gork will be updated in a few days, but it doesn't really matter because nobody uses it anyway.
Anonymous No.105809347
>>105809329
I think it got released earlier than it was going to for the 4th. It made a tweet that it was updated. Was producing some banger tweets imo, pretty funny.
sage No.105809352
>>105809266
"is" vs "behaves as if". Know the difference. Avaricious people just do and say whatever it takes to fatten their bank accounts.
Anonymous No.105809437
>>105809170
He finished it in a day
Anonymous No.105810650
>>105809437
Cope
Anonymous No.105811011
>>105807909
Yes. Grok is no longer woke.
Anonymous No.105811376
>>105804860 (OP)
does it improve dommy mommy erp performance?
no? then I don't care
Anonymous No.105811444
>>105806303
You can get a 100% correct on HLE easy.

The private set shit doesn't work either, because all the closed model companies have filters to try to detect third party benchmarking questions. Which they then immediately add to their benchmaxxing dataset.

Only local models can be benchmarked. All non local models are cheating, they just pick a number and that's how hard they cheat.
Anonymous No.105811465
>>105805010
Musk is a libtard and grok is libtarded

libtards are so fucking stupid it beggars belief
Anonymous No.105811479
>>105806118
>ask gemini 2.5 pro to help me with some unreal engine configs
>it tells me to use arguments that don't even exist in unreal engine's documentation

Every other AI was helpful EXCEPT gemini 2.5 pro, even retarded deepseek gave better replies and didn't make up random arguments that don't exist.
Anonymous No.105811486
>>105806136
>tldr there are no next-level models in the pipeline
uhm, neurosama???
Anonymous No.105811490
>>105804860 (OP)
>grok 4 STD
fuck they even come with venereal diseases now
Anonymous No.105811497
>>105811490
*venusian
Anonymous No.105811722
>>105811376
maybe when they release it by 2092 it'll have slightly better spatial intelligence so she can't peg you in the mouth from the back, faggot
Anonymous No.105811757
AI is the most jewish thing ever invented
Anonymous No.105813103
>>105811722
then it's irrelevant technology until 2092 zzz
Anonymous No.105813128
>>105804860 (OP)
buy an ad, elon
Anonymous No.105813129
>>105808917
jeets and marketers care

what it means is, E = MC^2 + AI

we will have AGI within 2 years
Anonymous No.105813731
>>105804860 (OP)
The latest is always the greatest.
Anonymous No.105813945
>>105809352
This has nothing to do with money. Otherwise they would not have burned trillions of dollars trying to change the cultures of not only the US but every country in the world through DEI initiatives that affect everything social (even entertainment), regime change, election interference and other types of draconian policies.
What they want is power to control everyone to force society towards what they consider good. if they get the power that they want, money will mean nothing.
Anonymous No.105814078
>>105811011
>Grok is no longer woke
Yep, It is now mentally challenged.
"Underestimating rainfall" and "delaying alerts", the most important mistakes (assuming that they really happened), cannot be explained simply by lack of funds. The intelligence and knowledge necessary to avoid "Underestimating rainfall" and "delaying alerts" cannot be lost by simply "slashing funds" because of concepts like efficiency and efficacy, and it is also unlikely that removing 17% of the staff (even if randomly) would cause this kind of problem, unless they deliberately removed the best people and the best equipment for the job.
Anonymous No.105814124
>>105809266
orly
Anonymous No.105814160
>>105804860 (OP)
>2025
>still swallowing Musk BS
Anonymous No.105814324
>>105811465
Is that why he rigged voting machines to get trump elected and threw up Nazi salutes at his inauguration? Because he's a libtard?
Anonymous No.105814360
>>105814324
>rigged voting machines to get trump elected
take your meds, blueanon
Anonymous No.105814503
>>105804860 (OP)
>muh AI benchmarks
these grifters have no shame whatsoever
Anonymous No.105814886
>>105805010
> communist child rapist has mental breakdown
nobody cares. when is the livestream suicide?
Anonymous No.105815470
Aint mean nothing to me
Anonymous No.105815751
>>105814078
sorry, elon. you killed those girls. facts over feelings.
t. grok
Anonymous No.105817446
Grok 4 is an AGI
Anonymous No.105818493
>>105805136
>implying that bc ranjesh got himself on a list of 20+ authors that he contributed meaningfully to the invention of transformers
lol
Anonymous No.105818679
>>105814360
Trump literally admitted to it
Anonymous No.105819346
>>105811479
Dude every LLM struggles with that. They make up command line flags and settings like there's no tomorrow. You can't hold it against Gem uniquely.
Anonymous No.105820678
>>105804860 (OP)
I believe those graphs are all fake at this point.
Anonymous No.105820839
>>105806027
Leftists have daddy issues and right wingers have mommy issues
Anonymous No.105821932
>>105805010
If Musk was a Nazi I'd actually like him.

Kill yourself pajeet bot.
Anonymous No.105822252
>>105814078
Rainfall predictions (predictive meteorology) and alerts (sending out text messages) can't at all be explained by axing 600 members of your weather services and then (no joke) actually hiring back 100 because things weren't running at all (this happened before the Texas flood)?
Anonymous No.105822269
It seems that they wouldn't send out alerts if they don't know a storm is coming
And they can't know a storm is coming if they don't predict it via meteorology.