Thread 105804860 - /g/ [Archived: 464 hours ago]

Anonymous
7/5/2025, 4:59:03 AM No.105804860
20250705_055817
md5: f543b27c0903a07e61d101f58bf5fe08
These leaked Grok-4 benchmark results look fucking nuts.
Replies: >>105805004 >>105805010 >>105805039 >>105805064 >>105805070 >>105805433 >>105806069 >>105806340 >>105808672 >>105808917 >>105809318 >>105811376 >>105811490 >>105813128 >>105813731 >>105814160 >>105814503 >>105820678
Anonymous
7/5/2025, 5:24:18 AM No.105804976
more like cock-4 lmao
Anonymous
7/5/2025, 5:28:14 AM No.105805004
>>105804860 (OP)
>new benchmarks come out
>train on new benchmarks
>does better at new benchmarks
Anonymous
7/5/2025, 5:29:10 AM No.105805010
>>105804860 (OP)
watching redditors melt down over le ebil nazis building better AIs than their leftist heroes is funny
they always commit the same mistake of underestimating their enemies and getting btfo
Replies: >>105805028 >>105805057 >>105805266 >>105805466 >>105806027 >>105811465 >>105814886 >>105821932
Anonymous
7/5/2025, 5:31:28 AM No.105805028
>>105805010
Aren't all AI builders evil nazis?
Replies: >>105807697 >>105809149
Anonymous
7/5/2025, 5:32:33 AM No.105805039
>>105804860 (OP)
When are we going to get a benchmark for recursive self improvement? Every other benchmark is 100%'d in a few months so they're essentially useless.
Anonymous
7/5/2025, 5:35:52 AM No.105805057
>>105805010
which subs are they having a melty on? i wanna see
Anonymous
7/5/2025, 5:37:48 AM No.105805064
>>105804860 (OP)

Grok 4 is built by Indian hands. How will retards cope?
Replies: >>105805136
Anonymous
7/5/2025, 5:40:17 AM No.105805070
>>105804860 (OP)
fuck benchmarks
is it still woke?
Replies: >>105805967
Anonymous
7/5/2025, 5:50:43 AM No.105805136
>>105805064
iirc at least one of the names on the original Attention is All You Need paper was indian
so there's no way to use any transformer model without using the work of indian hands
personally idc as long as it's uncensored
Replies: >>105818493
Anonymous
7/5/2025, 6:15:18 AM No.105805266
>>105805010
good morning saar
Anonymous
7/5/2025, 6:52:53 AM No.105805433
>>105804860 (OP)
can't wait for grok 3 to get open sourced when it comes out
oh wait
Anonymous
7/5/2025, 7:00:43 AM No.105805466
1751691625347
md5: 8f3cefef72af69dc454c1e0814088baa
>>105805010
lmao
Anonymous
7/5/2025, 7:41:17 AM No.105805702
>compare against ChatGPT o3
So they only compare their results against old models of competitors they are just able to beat? So in conclusion it's still miles behind everyone else. Got it.
Replies: >>105805747
Anonymous
7/5/2025, 7:49:29 AM No.105805747
>>105805702
? o3 is sota still
Replies: >>105806118
Anonymous
7/5/2025, 8:33:24 AM No.105805967
>>105805070
As long as it uses mainstream sources it'll shit out woke answers.
Replies: >>105805976 >>105809303
Anonymous
7/5/2025, 8:35:30 AM No.105805976
>>105805967
Yeah even uncensored models will default to woke if not ordered otherwise. But the difference is they'll actually stop being woke if you tell them not to be, unlike the heavily safetyslopped ones, which can't turn it off.
Anonymous
7/5/2025, 8:45:29 AM No.105806027
>>105805010
>their leftist heroes
The only retards who call tech oligarch heroes are right-wing cucks. Maybe because there are no left-wing oligarchs or maybe because rightoids yearn for a strong father figure.
Replies: >>105809266 >>105820839
Anonymous
7/5/2025, 8:53:28 AM No.105806069
>>105804860 (OP)
>grok 3 is a total benchmark princess
>B-BUT THIS TIME TH-THE BENCHMARKS ARE REAL!!!1
fuck off elmo, everything you run is trash with the sole exception of spacesex
Anonymous
7/5/2025, 9:02:09 AM No.105806118
>>105805747
That's a weird way to spell gemini 2.5 pro
Replies: >>105806120 >>105806250 >>105811479
Anonymous
7/5/2025, 9:03:21 AM No.105806120
>>105806118
gemini isn't even as good as it was on release, and it got 4x as expensive ~2 weeks ago because fuck you
Anonymous
7/5/2025, 9:04:07 AM No.105806124
Barely better than the current SOTA models only to end up in the dust a few months later when the other companies release new versions
Replies: >>105806136 >>105806303
Anonymous
7/5/2025, 9:08:11 AM No.105806136
>>105806124
>when the other companies release new versions
openai had half their talent poached
anthropic is very expensive and utter trash for everything that isn't codeslop (and is bad at following instructions)
google is terrified of cutting into their search business, thus only ever iterating as hard as to stay close to the pack but never to actually innovate
grok has always been trash and their benchmarks are wildly misleading
deepseek was a one-hit wonder that wasn't even that good
llama isn't even in the running nowadays despite zuck setting billions on fire

tldr there are no next-level models in the pipeline
Replies: >>105807861 >>105811486
Anonymous
7/5/2025, 9:34:43 AM No.105806250
>>105806118
ok? that's on the benchmark chart in the OP too
Anonymous
7/5/2025, 9:43:31 AM No.105806303
>>105806124
208% better than the current best model on the HLE isn't "barely" better
Replies: >>105811444
Anonymous
7/5/2025, 9:50:19 AM No.105806340
>>105804860 (OP)
Just got a plumber quote of 3000 dollars to replace my bathroom tiles and fix my toilet's slab leakage. I should have become a plumber.
Replies: >>105809170
Anonymous
7/5/2025, 12:54:20 PM No.105807208
those hle ones look like pass@6gorillion shit
Anonymous
7/5/2025, 2:25:17 PM No.105807697
>>105805028
Yes but redditors don't operate on truth. They operate on emotional conditioning.
For example Sam is le good and gentle gay man (not a psychopathic sionist).
Anonymous
7/5/2025, 2:54:37 PM No.105807861
>>105806136
>google is terrified of cutting into their search business,
No, they aren't. They have fully committed to killing the search business within a year or two in favor of AI.
Anonymous
7/5/2025, 3:01:19 PM No.105807909
Screenshot_20250705_160051_X
md5: f3956306f09964952381ab4d63c2e549
Is Grok-4 actually live right now on X?
Replies: >>105809320 >>105811011
Anonymous
7/5/2025, 5:03:41 PM No.105808672
>>105804860 (OP)
I got grok4 on lma a couple of times.
It is the dumbest model I've seen in a long time (it is dumber than llama4).

Whatever you tell it, it will just repeat it eventually (which is why it can cheat known benchmarks so easily)
Anonymous
7/5/2025, 5:38:39 PM No.105808917
5476345645745844
md5: 7e7e9f3a24e3f0f2e401f7a14e603231
>>105804860 (OP)
what do any of these benchmarks even mean and why would anyone care, when these things are still dumb as hell and cannot reason? they are just machines, an imitation of life. without any clever prompts, can it write a symphony? without hours of prompting, can it turn a canvas into a beautiful masterpiece?
Replies: >>105809013 >>105813129
Anonymous
7/5/2025, 5:47:14 PM No.105809013
>>105808917
Show an example of reasoning you can do but an LLM cannot.
Replies: >>105809287
Anonymous
7/5/2025, 6:05:54 PM No.105809149
>>105805028
Taking furry anime makers esteemed jobs is naziism
Anonymous
7/5/2025, 6:08:04 PM No.105809170
>>105806340
1500 on materials and fuel
job takes 5 days
employed less than half the time
wowzers I'm off to work a trade!
Replies: >>105809437
Anonymous
7/5/2025, 6:22:12 PM No.105809266
>>105806027
Every single wealthy, famous person I know of is ultra-liberal, althoughbeit
If that wasn't the case, we wouldn't have to deal with browns
Replies: >>105809352 >>105814124
Anonymous
7/5/2025, 6:26:03 PM No.105809287
>>105809013
solve towers of hanoi, step by step
Replies: >>105809298
Anonymous
7/5/2025, 6:27:23 PM No.105809298
>>105809287
Doesn't require any reasoning.
Anonymous
7/5/2025, 6:28:18 PM No.105809303
>>105805967
Musk said they were going to rewrite training data to be more primary source accurate and then train on the rewritten data or some shit. Also specifically asked for a list of difficult politically incorrect truths to test on, so I am guessing grok 4 will be somewhat less woke than anything else on the market.
sage
7/5/2025, 6:30:52 PM No.105809318
>>105804860 (OP)
>Leaked
AKA officially released with some jewish PR hacking.
>Look at this AI vaporware though bro
Fascinating.
Anonymous
7/5/2025, 6:31:03 PM No.105809320
>>105807909
Pretty sure the @grok and @gork answers are using a release candidate of grok4 now, but the standalone prompt/app is still grok3.
Replies: >>105809329
Anonymous
7/5/2025, 6:32:35 PM No.105809329
>>105809320
Elon said that @gork will be updated in a few days, but it doesn't really matter because nobody uses it anyway.
Replies: >>105809347
Anonymous
7/5/2025, 6:35:14 PM No.105809347
>>105809329
I think it got released earlier than it was going to for the 4th. It made a tweet that it was updated. Was producing some banger tweets imo, pretty funny.
sage
7/5/2025, 6:35:46 PM No.105809352
1638848843137
md5: 7baae1bd66a9311910517476b6c2944c
>>105809266
"is" vs "behaves as if". Know the difference. Avaricious people just do and say whatever it takes to fatten their bank accounts.
Replies: >>105813945
Anonymous
7/5/2025, 6:48:08 PM No.105809437
>>105809170
He finished it in a day
Replies: >>105810650
Anonymous
7/5/2025, 9:09:10 PM No.105810650
>>105809437
Cope
Anonymous
7/5/2025, 9:56:39 PM No.105811011
1000014996
md5: 690c55a47728835d9d9210857af1ea35
>>105807909
Yes. Grok is no longer woke.
Replies: >>105814078
Anonymous
7/5/2025, 10:48:14 PM No.105811376
GsJxTrnWkAAa_kG
md5: 9e60d837a7499a1d56400aa22948e453
>>105804860 (OP)
does it improve dommy mommy erp performance?
no? then I don't care
Replies: >>105811722
Anonymous
7/5/2025, 11:00:50 PM No.105811444
>>105806303
You can get a 100% correct on HLE easy.

The private set shit doesn't work either, because all the closed model companies have filters to try to detect third party benchmarking questions. Which they then immediately add to their benchmaxxing dataset.

Only local models can be benchmarked. All non local models are cheating, they just pick a number and that's how hard they cheat.
Anonymous
7/5/2025, 11:03:04 PM No.105811465
>>105805010
Musk is a libtard and grok is libtarded

libtards are so fucking stupid it beggars belief
Replies: >>105814324
Anonymous
7/5/2025, 11:04:51 PM No.105811479
>>105806118
>ask gemini 2.5 pro to help me with some unreal engine configs
>it tells me to use arguments that don't even exist in unreal engine's documentation

Every other AI was helpful EXCEPT gemini 2.5 pro, even retarded deepseek gave better replies and didn't make up random arguments that don't exist.
Replies: >>105819346
Anonymous
7/5/2025, 11:05:52 PM No.105811486
>>105806136
>tldr there are no next-level models in the pipeline
uhm, neurosama???
Anonymous
7/5/2025, 11:06:17 PM No.105811490
>>105804860 (OP)
>grok 4 STD
fuck they even come with venereal diseases now
Replies: >>105811497
Anonymous
7/5/2025, 11:07:18 PM No.105811497
>>105811490
*venusian
Anonymous
7/5/2025, 11:42:52 PM No.105811722
>>105811376
maybe when they release it by 2092 it'll have slightly better spacial intelligence so she can't peg you in the mouth from the back, faggot
Replies: >>105813103
Anonymous
7/5/2025, 11:47:08 PM No.105811757
AI is the most jewish thing ever invented
Anonymous
7/6/2025, 3:21:10 AM No.105813103
>>105811722
then it's irrelevant technology until 2092 zzz
Anonymous
7/6/2025, 3:24:12 AM No.105813128
>>105804860 (OP)
buy an ad, elon
Anonymous
7/6/2025, 3:24:41 AM No.105813129
>>105808917
jeets and marketers care

what it means is, E = MC^2 + AI

we will have AGI within 2 years
Anonymous
7/6/2025, 5:20:45 AM No.105813731
>>105804860 (OP)
The latest is always the greatest.
Anonymous
7/6/2025, 6:03:12 AM No.105813945
>>105809352
This has nothing to do with money. Otherwise they would not have burned trillions of dollars trying to change the cultures of not only the US but every country in the world through DEI initiatives that affect everything social (even entertainment), regime change, election interference and other types of draconian policies.
What they want is power to control everyone to force society towards what they consider good. If they get the power that they want, money will mean nothing.
Anonymous
7/6/2025, 6:27:41 AM No.105814078
>>105811011
>Grok is no longer woke
Yep, it is now mentally challenged.
"Underestimating rainfall" and "delaying alerts", the most important mistakes (assuming that they really happened), cannot be explained simply by lack of funds. The intelligence and knowledge necessary to avoid "underestimating rainfall" and "delaying alerts" cannot be lost by simply "slashing funds" because of concepts like efficiency and efficacy, and it is also unlikely that removing 17% of the staff (even if randomly) would cause this kind of problem, unless they deliberately removed the best people and the best equipment for the job.
Replies: >>105815751 >>105822252
Anonymous
7/6/2025, 6:33:33 AM No.105814124
>>105809266
orly
Anonymous
7/6/2025, 6:40:25 AM No.105814160
>>105804860 (OP)
>2025
>still swallowing Musk BS
Anonymous
7/6/2025, 7:12:48 AM No.105814324
>>105811465
Is that why he rigged voting machines to get trump elected and throw up Nazi salutes at his inauguration? Because he's a libtard?
Replies: >>105814360
Anonymous
7/6/2025, 7:20:33 AM No.105814360
>>105814324
>rigged voting machines to get trump elected
take your meds, blueanon
Replies: >>105818679
Anonymous
7/6/2025, 7:52:19 AM No.105814503
>>105804860 (OP)
>muh AI benchmarks
these grifters have no shame whatsoever
Anonymous
7/6/2025, 9:01:11 AM No.105814886
>>105805010
> communist child rapist has mental breakdown
nobody cares. when is the livestream suicide?
Anonymous
7/6/2025, 10:51:36 AM No.105815470
Aint mean nothing to me
Anonymous
7/6/2025, 11:43:56 AM No.105815751
>>105814078
sorry, elon. you killed those girls. facts over feelings.
t. grok
Anonymous
7/6/2025, 4:20:16 PM No.105817446
Grok 4 is an AGI
Anonymous
7/6/2025, 6:18:47 PM No.105818493
>>105805136
>implying that bc ranjesh got himself on a list of 20+ authors that he contributed meaningfully to the invention of transformers
lol
Anonymous
7/6/2025, 6:42:25 PM No.105818679
>>105814360
Trump literally admitted to it
Anonymous
7/6/2025, 7:54:52 PM No.105819346
>>105811479
Dude every LLM struggles with that. They make up command line flags and settings like there's no tomorrow. You can't hold it against Gem uniquely.
Anonymous
7/6/2025, 10:30:31 PM No.105820678
>>105804860 (OP)
I believe those graphs are all fake at this point.
Anonymous
7/6/2025, 10:49:31 PM No.105820839
>>105806027
Leftists have daddy issues and right wingers have mommy issues
Anonymous
7/7/2025, 12:55:11 AM No.105821932
>>105805010
If Musk was a Nazi I'd actually like him.

Kill yourself pajeet bot.
Anonymous
7/7/2025, 1:40:26 AM No.105822252
>>105814078
Rainfall predictions (predictive meteorology) and alerts (sending out text messages) can't at all be explained by axing 600 members of your weather services and then (no joke) actually hiring back 100 because things weren't running at all (this happened before the Texas flood)?
Anonymous
7/7/2025, 1:43:06 AM No.105822269
It seems that they wouldn't send out alerts if they don't know a storm is coming
And they can't know a storm is coming if they don't predict it via meteorology.