SEAL: LLM That Writes Its Own Updates Solves 72.5% of ARC-AGI Tasks—Up from 0% - /g/ (#105579704)

Anonymous
6/13/2025, 9:05:22 AM No.105579704
1654739707464
https://arxiv.org/pdf/2506.10943
Replies: >>105579711 >>105580172 >>105580189 >>105581511 >>105581570 >>105581759 >>105582494 >>105583195 >>105585320 >>105585429 >>105591412 >>105594292
Anonymous
6/13/2025, 9:06:59 AM No.105579711
>>105579704 (OP)
leetcucks in denial
Anonymous
6/13/2025, 10:29:04 AM No.105580172
>>105579704 (OP)
>ARC-AGI Tasks
these are basically impossible for 99.9% of humans btw
Replies: >>105580256
Anonymous
6/13/2025, 10:34:34 AM No.105580189
>>105579704 (OP)
>Limitations:
>While SEAL enables lasting adaptation through self-generated weight updates, our continual learning experiment reveals that repeated self-edits can lead to catastrophic forgetting—performance on earlier tasks degrades as new updates are applied. This suggests that without explicit mechanisms for knowledge retention, self-modification may overwrite valuable prior information. Addressing this remains an open challenge, with potential solutions including replay, constrained updates, or representational superposition.

>Future Work:
>Looking ahead, we envision models that not only adapt their weights but also reason about when and how to adapt, deciding mid-inference whether a self-edit is warranted. Such systems could iteratively distill chain-of-thought traces into weights, transforming ephemeral reasoning into permanent capabilities, and offering a foundation for agentic models that improve continuously through interaction and reflection.
Replies: >>105580207 >>105591060
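The limitations quote names replay as one candidate fix for catastrophic forgetting. Below is a minimal sketch of generic experience replay. It assumes each weight update is trained on a batch of examples, keeps a bounded uniform sample of past training data, and mixes some of that data into every new update batch. This is not SEAL's method: the quoted text lists replay only as a possible future direction. `ReplayBuffer`, `build_update_batch`, and `replay_ratio` are hypothetical names.

```python
import random


class ReplayBuffer:
    """Hypothetical sketch: keep a bounded, uniform sample of past training examples."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of examples ever offered to the buffer
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling: every example seen so far stays with
        # probability capacity / seen, so old tasks are not crowded out.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        # Draw up to k stored examples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def build_update_batch(new_examples, buffer, replay_ratio=0.5):
    """Mix new self-edit data with replayed old data before a weight update."""
    n_replay = int(len(new_examples) * replay_ratio)
    # Sample old data before storing the new examples, so replay is truly "old".
    batch = list(new_examples) + buffer.sample(n_replay)
    for ex in new_examples:
        buffer.add(ex)
    return batch
```

The design choice here is that replayed examples ride along with every update, so gradients from new self-edits are always counterbalanced by some earlier data; `replay_ratio` trades adaptation speed against retention.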
Anonymous
6/13/2025, 10:36:37 AM No.105580207
>>105580189
poggers
Anonymous
6/13/2025, 10:46:57 AM No.105580256
1749403340264634
>>105580172
>t. LLM
Replies: >>105580259
Anonymous
6/13/2025, 10:47:32 AM No.105580259
>>105580256
have you tried them?
Replies: >>105580275
Anonymous
6/13/2025, 10:51:13 AM No.105580275
arc-agi-2-unsolved-1
>>105580259
I can't believe I share the same board with (You). Please go back
Replies: >>105580284 >>105580285 >>105583333
Anonymous
6/13/2025, 10:53:12 AM No.105580284
>>105580275
oh my bad got them mixed up with hle
https://agi.safe.ai/
have a look at the examples on this
Replies: >>105580312
Anonymous
6/13/2025, 10:53:31 AM No.105580285
>>105580275
wtf
this is supposed to be impossible for 99% of humans?
i cant believe i share the same planet as people unable to solve this
Replies: >>105580312
Anonymous
6/13/2025, 10:59:52 AM No.105580312
IMG_20250613_185446
>>105580284
Oh. That makes more sense. Yeah, I looked at those examples. Out of the 18 example questions, I think I'd be able to answer the graph question, but that's about it. Very hyper-specific knowledge required for those.
>>105580285
See picrel
Replies: >>105580340 >>105580346 >>105580363
Anonymous
6/13/2025, 11:04:00 AM No.105580340
>>105580312
17 bucks to solve one of these?
HOW????
AND HOW COME NOBODY BROKE THE NECK OF THIS PROJECT ALREADY?
HOW IS THIS SUSTAINABLE TO ANY DEGREE?

wtfing hard
who thought trusting these people with their money is a good idea?
Replies: >>105580346 >>105580543 >>105581626 >>105589796
Anonymous
6/13/2025, 11:05:54 AM No.105580346
>>105580312
>>105580340
wait
>human panel average
>60% score
wtf?
where did they find the humans?
in an insane asylum?
Replies: >>105580400 >>105580443
Anonymous
6/13/2025, 11:08:31 AM No.105580363
>>105580312
also
>HLE
in hle youre testing the dataset of the ML
it should be EXPECTED that a sci oriented llm passes HLE
Anonymous
6/13/2025, 11:14:46 AM No.105580400
>>105580346
>ARC-AGI-1
They used mechanical turk for the first one, so Indians

>ARC-AGI-2
Bunch of randoms who signed up to get paid, so assuming poor goldfish attention students
Replies: >>105580439 >>105580443
Anonymous
6/13/2025, 11:20:35 AM No.105580430
Ai is evolving so fast, it's crazy. I heard that even the finest ai engineers don't fully understand the thing anymore.

https://www.theverge.com/news/684322/meta-scale-ai-15-billion-investment-zuckerberg
Replies: >>105580452 >>105595918
Anonymous
6/13/2025, 11:23:33 AM No.105580439
>>105580400
>Bunch of randoms who signed up to get paid, so assuming poor goldfish attention students
has to be
>At the core of ARC-AGI benchmark design is the the principle of "Easy for Humans, Hard for AI."
https://arcprize.org/arc-agi
Replies: >>105580443
Anonymous
6/13/2025, 11:24:10 AM No.105580443
>>105580439
>>105580400
>>105580346
i don't think you really get how dumb the average person is
Replies: >>105580451
Anonymous
6/13/2025, 11:24:46 AM No.105580445
Machine learning is not going anywhere. That is why I have learned how to build models from scratch, not just use them as a black box. You must understand linear regression and multivariable calculus. The job market is really bad right now, so this is the best time to learn.

Basic neural networks were being discovered in the 80s and 90s, more advanced models that expand on these designs are the subject matter of contemporary research. If you want to future proof your skills, then acquire a scientific level of understanding of the newest technology.

Programming will always be a rare skill, less than 1% of the human population knows how to program computers. It is math skills that differentiate high level programmers.

To put it simply, what these programs do is tweak a thousand little knobs and dials using partial derivatives to reduce error. The error function, the loss, is differentiated with respect to every single little variable, and each variable is adjusted until the output reaches a satisfactory level of accuracy.
Replies: >>105580468 >>105580495 >>105594248 >>105594455
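The knobs-and-dials description above is gradient descent. A minimal sketch, assuming plain linear regression with mean squared error as the loss (the post names linear regression but no specific model); `fit_linear`, `mse`, the learning rate, and the step count are illustrative choices, not anything from the thread.

```python
def fit_linear(xs, ys, lr=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # the two "knobs"
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of L(w, b) = (1/n) * sum((w*x + b - y)^2)
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each knob against its own gradient to reduce the loss
        w -= lr * dw
        b -= lr * db
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the line w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3), mse(w, b, xs, ys))
```

A neural network does the same thing with millions of knobs instead of two; backpropagation is just an efficient way to compute all those partial derivatives at once.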
Anonymous
6/13/2025, 11:25:40 AM No.105580451
>>105580443
it stands to reason i dont
i tend to avoid normalspace
thats kinda why im posting on g and not reddit
Replies: >>105580473 >>105595893
Anonymous
6/13/2025, 11:25:45 AM No.105580452
>>105580430
>Ai is evolving so fast, it's crazy
I've been waiting for self-feeding AI. The expectation is that it would suddenly become exponential because of how fast it can iterate on itself, 24/7. Which obviously would create a very fast positive feedback loop.
Replies: >>105595902
Anonymous
6/13/2025, 11:27:20 AM No.105580468
>>105580445
this
watch your post be buried among the luddite brainvomit

but know that at least one other anon did read your post and does agree
Replies: >>105580478
Anonymous
6/13/2025, 11:27:56 AM No.105580473
>>105580451
reddit is far far smarter than the average too
Replies: >>105580486 >>105594390
Anonymous
6/13/2025, 11:28:33 AM No.105580478
>>105580468
(corr.)
>does agree
immaterial
its the objective truth
thats literally how the "black box" works
Anonymous
6/13/2025, 11:29:54 AM No.105580486
>>105580473
if thats true
ill start to have compassion for the average human being
their life must be such a burden...
Replies: >>105580501
Anonymous
6/13/2025, 11:31:15 AM No.105580495
>>105580445
This
Anonymous
6/13/2025, 11:31:55 AM No.105580501
>>105580486
maybe now you'll understand why drinking is so common
Replies: >>105580526
Anonymous
6/13/2025, 11:35:32 AM No.105580526
>>105580501
another surprise for me.
i sometimes drink heavily because my life is fucked up in some aspects
i thought normied dont drink much nowadays

but again, i dont have truly normalfag friends
the closest thing to a normalfag friend i have, is an albanian jurist who had a schizophrenic episode, who steals phones and wallets when he gets drunk and is now mounting a cultural revolution in kosovo
Replies: >>105585341
Anonymous
6/13/2025, 11:39:02 AM No.105580543
>>105580340
It isn't. The entire thing works on the US government burning billions of dollars every day.
Replies: >>105580554
Anonymous
6/13/2025, 11:40:48 AM No.105580554
>>105580543
as usual, the taxpayer foots the bill
a fool and his money...
>inb4 but you also pay taxes
yeah but if everyone thought like me that wouldnt be a status quo for very long
Anonymous
6/13/2025, 2:37:02 PM No.105581511
>>105579704 (OP)
>LLM eats its own shit
>It's so over
For the LLM, yes
Anonymous
6/13/2025, 2:45:23 PM No.105581570
>>105579704 (OP)
they aren't solving anything. the model already contains the answer. that's cheating
Anonymous
6/13/2025, 2:53:18 PM No.105581626
>>105580340
>who thoght trusting these people with their money is a good idea?

Just like people trusted this indian scam:

>Why did Microsoft-backed $1.3bn Builder.ai Indian SCAM?
https://www.financialexpress.com/business/start-ups/why-did-microsoft-backed-1-3bn-builderai-collapse-accused-of-using-indian-codersforaiwork/3854944/

>True Story Behind Builder.ai Crash
>The “AI” Was Mostly Human
>Investigation revealed 85% of delivered apps were built by underpaid developers in India and Eastern Europe
>“Natasha” AI was essentially an overglorified project management tool
>Company was burning $18M/month
https://washingtonmorning.com/2025/05/28/true-story-behind-builder-ai-crash/

>Indian AI Startup Worth Billions Turns Out to Be Biggest Scam Ever
https://propakistani.pk/2025/05/29/indian-ai-startup-worth-billions-turns-out-to-be-biggest-scam-ever/

Including half a billion $ investment money from Microsoft.
>why do I get this feeling that pajeets working for MS were covering while stealing money?
Anonymous
6/13/2025, 3:13:27 PM No.105581759
1749684219936825
>>105579704 (OP)
>AI will replace the average programmer
sure, here's a midwit problem for your AI to solve
>no response
You all AI fags are the same, sperging out over AI being able to solve CS student level problems (using copy/paste magic) while vanishing when someone actually posts a midwit problem for the AI to solve that wasn't scraped from GitHub yet
Anonymous
6/13/2025, 5:03:27 PM No.105582494
>>105579704 (OP)
Fuck you and fuck your soijak thread
Anonymous
6/13/2025, 6:34:27 PM No.105583195
>>105579704 (OP)
okay but how is it at porn?
i dont care unless it can suck the cum straight out of my dick
Anonymous
6/13/2025, 6:54:03 PM No.105583333
1724986725769545
>>105580275
What do I color the rest? Do I have to extrapolate, is there some other pattern I haven't picked up on, or do I just leave them uncolored?
t. actual midwit that likes puzzles
Replies: >>105583571 >>105585293
Anonymous
6/13/2025, 7:23:07 PM No.105583571
1734048600122146
>>105583333
idk I'm just gonna color them red and green because those were the colors for those shapes in the previous question
Replies: >>105585293
Anonymous
6/13/2025, 10:42:16 PM No.105585293
>>105583333
>>105583571
You are part of the 99%.
Look at the examples again.
Replies: >>105585408
Anonymous
6/13/2025, 10:46:16 PM No.105585320
>>105579704 (OP)
Self benchmaxxing benchmaxxer.
Anonymous
6/13/2025, 10:48:34 PM No.105585341
>>105580526
>i thought normied dont drink much nowadays
I've read that zoomers drink less
Anonymous
6/13/2025, 10:56:56 PM No.105585408
1728353425936472
>>105585293
I accidentally filled one in the wrong color but other than that it seems consistent with the pattern I noticed.
Is it actually wrong or are you just fucking with me? Post your answer
Replies: >>105585428
Anonymous
6/13/2025, 10:58:45 PM No.105585428
>>105585408
Yeah it's wrong, specifically the zero-hole and one-hole ones are wrong.
Again, look carefully through the two provided examples. You're being retarded anon.
Replies: >>105585503
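The thread never states the task's rule, but the hint about "zero-hole and one-hole" shapes suggests the output color depends on how many holes each shape has. Assuming that reading, here is a sketch of counting holes on an ARC-style integer grid: flood-fill the background from the border, then count the background regions that were never reached. `count_holes` is a hypothetical helper; it assumes one shape per grid and color 0 as the background, and it treats background as 4-connected so a diagonal gap in the shape's outline does not count as an opening.

```python
from collections import deque


def count_holes(grid, background=0):
    """Count enclosed background regions not reachable from the grid border."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r0, c0):
        # Breadth-first fill over 4-connected background cells.
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and grid[nr][nc] == background):
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    # Step 1: everything reachable from the border is "outside", not a hole.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and grid[r][c] == background and not seen[r][c]:
                flood(r, c)

    # Step 2: each remaining unvisited background region is one hole.
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background and not seen[r][c]:
                flood(r, c)
                holes += 1
    return holes


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(count_holes(ring))
```

For a grid with several shapes, one would first split the grid into connected components of non-background cells and run the count per shape.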
Anonymous
6/13/2025, 10:58:49 PM No.105585429
>>105579704 (OP)
>72.5% of ARC-AGI Tasks
Wow, more meaningless nothing-benchmarks directly from the people who make this garbage.
Anonymous
6/13/2025, 11:06:12 PM No.105585503
1735840900381857
>>105585428
I got it now and feel really dumb for not getting it sooner, it couldn't have been more obvious in the first example
Replies: >>105585541
Anonymous
6/13/2025, 11:10:28 PM No.105585541
>>105585503
Nice anon I knew you could do it
Replies: >>105585556
Anonymous
6/13/2025, 11:11:58 PM No.105585556
1742695079561145
>>105585541
A true gentleman leaves no puzzle unsolved!
Anonymous
6/14/2025, 4:20:35 AM No.105587778
you fucks are playing god
Replies: >>105590035 >>105590048
Anonymous
6/14/2025, 10:53:40 AM No.105589796
>>105580340

lol at 17 bucks

they paid some people some money to solve the test once, big deal

like what, 1700 total if there are 100 tasks

there are billions going into ai research
Anonymous
6/14/2025, 11:46:00 AM No.105590035
>>105587778
SOMEONE'S gotta do it since the old god stopped playing
Replies: >>105593435
Anonymous
6/14/2025, 11:48:12 AM No.105590048
>>105587778
reddit comment of the year
Anonymous
6/14/2025, 2:47:05 PM No.105591060
>>105580189

Of course. Updates have to be consistent with what is already known to be "out there", and this is a fundamental problem that can't be solved by merely throwing random shit at the wall.

Math and logic evolved over millennia and can be meaningfully applied only in very strict and narrow contexts.

In short, their shit will hallucinate subtle, hard-to-spot bullshit, just like proofs by retards.
Anonymous
6/14/2025, 3:47:22 PM No.105591412
>>105579704 (OP)
Can it beat Pokémon red
Anonymous
6/14/2025, 7:58:08 PM No.105593435
>>105590035
can't wait for the datacenter of babel update
Anonymous
6/14/2025, 9:39:05 PM No.105594248
>>105580445
While I don't disagree with your post in general, you have to admit that fundamentals quickly become useless and are only required of a handful of people. Yeah, you could know how a processor works to optimize inference, but what's the point if libraries and compilers have that figured out?
Anonymous
6/14/2025, 9:44:34 PM No.105594292
>>105579704 (OP)
>Solves 72.5% of ARC-AGI Tasks
Wow, more made up, ad-hoc benchmarks!
I'm legit spooped!
Anonymous
6/14/2025, 9:59:25 PM No.105594390
>>105580473
chat, is this real?
Anonymous
6/14/2025, 10:07:37 PM No.105594455
>>105580445
this reads like LLM slop
Anonymous
6/15/2025, 12:49:51 AM No.105595893
>>105580451
this place is for pseuds lmao
Anonymous
6/15/2025, 12:50:52 AM No.105595902
>>105580452
you know what also happens with feedback loops?
you should be able to solve this
Anonymous
6/15/2025, 12:53:19 AM No.105595918
>>105580430
What's dangerous about AI is how small the time between an overhyped useless tool and something that totally consumes the observable universe to calculate some useless task can be.