SEAL: LLM That Writes Its Own Updates Solves 72.5% of ARC-AGI Tasks—Up from 0% - /g/ (#105579704) [Archived: 1111 hours ago]

Anonymous

6/13/2025, 9:05:22 AM No.105579704

https://arxiv.org/pdf/2506.10943

Replies: >>105579711 >>105580172 >>105580189 >>105581511 >>105581570 >>105581759 >>105582494 >>105583195 >>105585320 >>105585429 >>105591412 >>105594292

Anonymous

6/13/2025, 9:06:59 AM No.105579711

>>105579704 (OP)
leetcucks in denial

Anonymous

6/13/2025, 10:29:04 AM No.105580172

>>105579704 (OP)
>ARC-AGI Tasks
these are basically impossible for 99.9% of humans btw

Replies: >>105580256

Anonymous

6/13/2025, 10:34:34 AM No.105580189

>>105579704 (OP)
>Limitations:
>While SEAL enables lasting adaptation through self-generated weight updates, our continual learning experiment reveals that repeated self-edits can lead to catastrophic forgetting—performance on earlier tasks degrades as new updates are applied. This suggests that without explicit mechanisms for knowledge retention, self-modification may overwrite valuable prior information. Addressing this remains an open challenge, with potential solutions including replay, constrained updates, or representational superposition.

>Future Work:
>Looking ahead, we envision models that not only adapt their weights but also reason about when and how to adapt, deciding mid-inference whether a self-edit is warranted. Such systems could iteratively distill chain-of-thought traces into weights, transforming ephemeral reasoning into permanent capabilities, and offering a foundation for agentic models that improve continuously through interaction and reflection.
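
(Toy illustration of the forgetting described in the quoted Limitations paragraph. This is not the SEAL setup: it assumes a single-weight linear model and two deliberately conflicting tasks, and the names sgd_fit, mse, task_a and task_b are made up for the example. It only shows the crudest form of the effect, where training on a new task overwrites what was fit for the old one.)

```python
# Sequential training on two tasks with one shared weight.
# Hypothetical helper names; plain Python, no libraries.

def sgd_fit(w, data, lr=0.1, steps=200):
    """Gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    """Mean squared error of y = w * x on the given (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(x, 2.0 * x) for x in (1.0, 2.0)]    # earlier task: y = 2x
task_b = [(x, -3.0 * x) for x in (1.0, 2.0)]   # later task:   y = -3x

w = sgd_fit(0.0, task_a)
loss_a_before = mse(w, task_a)   # near zero: task A is learned

w = sgd_fit(w, task_b)           # keep training, but only on task B
loss_a_after = mse(w, task_a)    # large: task A has been overwritten

print(loss_a_before, loss_a_after)
```

Mixing some task_a examples back in while fitting task_b would be a rough stand-in for the "replay" idea the quote mentions, at the cost of fitting neither task perfectly in this one-weight toy.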

Replies: >>105580207 >>105591060

Anonymous

6/13/2025, 10:36:37 AM No.105580207

>>105580189
poggers

Anonymous

6/13/2025, 10:46:57 AM No.105580256

1749403340264634

>>105580172
>t. LLM

Replies: >>105580259

Anonymous

6/13/2025, 10:47:32 AM No.105580259

>>105580256
have you tried them?

Replies: >>105580275

Anonymous

6/13/2025, 10:51:13 AM No.105580275

arc-agi-2-unsolved-1

>>105580259
I can't believe I share the same board with (You). Please go back

Replies: >>105580284 >>105580285 >>105583333

Anonymous

6/13/2025, 10:53:12 AM No.105580284

>>105580275
oh my bad got them mixed up with hle
https://agi.safe.ai/
have a look at the examples on this

Replies: >>105580312

Anonymous

6/13/2025, 10:53:31 AM No.105580285

>>105580275
wtf
this is supposed to be impossible for 99% of humans?
i cant believe i share the same planet as people unable to solve this

Replies: >>105580312

Anonymous

6/13/2025, 10:59:52 AM No.105580312

IMG_20250613_185446

>>105580284
Oh. That makes more sense. Yeah, I looked at those examples. Out of the 18 example questions, I think I'd be able to answer the graph question, but that's about it. Very hyper-specific knowledge required for those.
>>105580285
See picrel

Replies: >>105580340 >>105580346 >>105580363

Anonymous

6/13/2025, 11:04:00 AM No.105580340

>>105580312
17 bucks to solve one of these?
HOW????
AND HOW COMES NOBODY BROKE THE NECK OF THIS PROJECT ALREADY?
HOW IS THIS SUSTAINABLE TO ANY DEGREE?

wtfing hard
who thought trusting these people with their money is a good idea?

Replies: >>105580346 >>105580543 >>105581626 >>105589796

Anonymous

6/13/2025, 11:05:54 AM No.105580346

>>105580312
>>105580340
wait
>human panel average
>60% score
wtf?
where did they find the humans?
in an insane asylum?

Replies: >>105580400 >>105580443

Anonymous

6/13/2025, 11:08:31 AM No.105580363

>>105580312
also
>HLE
in hle youre testing the dataset of the ML
it should be EXPECTED that a sci oriented llm passes HLE

Anonymous

6/13/2025, 11:14:46 AM No.105580400

>>105580346
>ARC-AGI-1
They used mechanical turk for the first one, so Indians

>ARC-AGI-2
Bunch of randoms who signed up to get paid, so assuming poor goldfish attention students

Replies: >>105580439 >>105580443

Anonymous

6/13/2025, 11:20:35 AM No.105580430

AI is evolving so fast, it's crazy. I heard that even the finest AI engineers don't fully understand the thing anymore.

https://www.theverge.com/news/684322/meta-scale-ai-15-billion-investment-zuckerberg

Replies: >>105580452 >>105595918

Anonymous

6/13/2025, 11:23:33 AM No.105580439

>>105580400
>Bunch of randoms who signed up to get paid, so assuming poor goldfish attention students
has to be
>At the core of ARC-AGI benchmark design is the principle of "Easy for Humans, Hard for AI."
https://arcprize.org/arc-agi

Replies: >>105580443

Anonymous

6/13/2025, 11:24:10 AM No.105580443

>>105580439
>>105580400
>>105580346
i don't think you really get how dumb the average person is

Replies: >>105580451

Anonymous

6/13/2025, 11:24:46 AM No.105580445

Machine learning is not going anywhere. That is why I have learned how to build models from scratch, not just use them in a black box. You must understand linear regression and multivariable calculus. The job market is really bad right now so this is the best time to learn.

Basic neural networks were being developed in the 80s and 90s; more advanced models that expand on these designs are the subject matter of contemporary research. If you want to future-proof your skills, then acquire a scientific level of understanding of the newest technology.

Programming will always be a rare skill, less than 1% of the human population knows how to program computers. It is math skills that differentiate high level programmers.

To put it simply, what these programs do is tweak a thousand little knobs and dials using partial derivatives to reduce error. The error function (the loss) is differentiated with respect to every single variable, and each variable is then adjusted a little at a time until the output reaches a satisfactory level of accuracy.
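
(A minimal sketch of the knob-tweaking described above, assuming a toy two-knob linear model and plain Python. The names train, lr and steps are made up for illustration and don't come from any library; real networks do the same thing with far more knobs and automatic differentiation.)

```python
# Fit y = w*x + b by gradient descent on mean squared error.

def train(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0          # the two "knobs"
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the loss L = mean((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each knob against its own derivative to reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]    # generated from y = 2x + 1
w, b = train(xs, ys)
print(w, b)                  # should land close to 2 and 1
```

The learning rate lr is the only tuning choice here: too large and the updates overshoot and diverge, too small and the loop needs many more steps.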

Replies: >>105580468 >>105580495 >>105594248 >>105594455

Anonymous

6/13/2025, 11:25:40 AM No.105580451

>>105580443
it stands to reason i dont
i tend to avoid normalspace
thats kinda why im posting on g and not reddit

Replies: >>105580473 >>105595893

Anonymous

6/13/2025, 11:25:45 AM No.105580452

>>105580430
>AI is evolving so fast, it's crazy
I've been waiting for self-feeding AI. The expectation is that it would suddenly go exponential due to how fast it can iterate on itself, 24/7, which obviously would create a very fast positive feedback loop.

Replies: >>105595902

Anonymous

6/13/2025, 11:27:20 AM No.105580468

>>105580445
this
watch your post be buried among the luddite brainvomit

but know that at least one other anon did read your post and does agree

Replies: >>105580478

Anonymous

6/13/2025, 11:27:56 AM No.105580473

>>105580451
reddit is far far smarter than the average too

Replies: >>105580486 >>105594390

Anonymous

6/13/2025, 11:28:33 AM No.105580478

>>105580468
(corr.)
>does agree
immaterial
its the objective truth
thats literally how the "black box" works

Anonymous

6/13/2025, 11:29:54 AM No.105580486

>>105580473
if thats true
ill start to have compassion for the average human being
their life must be such a burden...

Replies: >>105580501

Anonymous

6/13/2025, 11:31:15 AM No.105580495

>>105580445
This

Anonymous

6/13/2025, 11:31:55 AM No.105580501

>>105580486
maybe now you'll understand why drinking is so common

Replies: >>105580526

Anonymous

6/13/2025, 11:35:32 AM No.105580526

>>105580501
another surprise for me.
i sometimes drink heavily because my life is fucked up in some aspects
i thought normies dont drink much nowadays

but again, i dont have truly normalfag friends
the closest thing to a normalfag friend i have is an albanian jurist who had a schizophrenic episode, who steals phones and wallets when he gets drunk and is now mounting a cultural revolution in kosovo

Replies: >>105585341

Anonymous

6/13/2025, 11:39:02 AM No.105580543

>>105580340
It isn't. The entire thing works on the US government burning billions of dollars every day.

Replies: >>105580554

Anonymous

6/13/2025, 11:40:48 AM No.105580554

>>105580543
as usual, the taxpayer foots the bill
a fool and his money...
>inb4 but you also pay taxes
yeah but if everyone thought like me that wouldnt be a status quo for very long

Anonymous

6/13/2025, 2:37:02 PM No.105581511

>>105579704 (OP)
>LLM eats its own shit
>It's so over
For the LLM, yes

Anonymous

6/13/2025, 2:45:23 PM No.105581570

>>105579704 (OP)
they aren't solving anything. the model already contains the answer. that's cheating

Anonymous

6/13/2025, 2:53:18 PM No.105581626

>>105580340
>who thought trusting these people with their money is a good idea?

Just like people trusted this indian scam:

>Why did Microsoft-backed $1.3bn Builder.ai Indian SCAM?
https://www.financialexpress.com/business/start-ups/why-did-microsoft-backed-1-3bn-builderai-collapse-accused-of-using-indian-codersforaiwork/3854944/

>True Story Behind Builder.ai Crash
>The “AI” Was Mostly Human
>Investigation revealed 85% of delivered apps were built by underpaid developers in India and Eastern Europe
>“Natasha” AI was essentially an overglorified project management tool
>Company was burning $18M/month
https://washingtonmorning.com/2025/05/28/true-story-behind-builder-ai-crash/

>Indian AI Startup Worth Billions Turns Out to Be Biggest Scam Ever
https://propakistani.pk/2025/05/29/indian-ai-startup-worth-billions-turns-out-to-be-biggest-scam-ever/

Including half a billion $ investment money from Microsoft.
>why do I get this feeling that pajeets working for MS were covering while stealing money?

Anonymous

6/13/2025, 3:13:27 PM No.105581759

1749684219936825

>>105579704 (OP)
>AI will replace the average programmer
sure, here's a midwit problem for your AI to solve
>no response
You all AI fags are the same, sperging out over AI being able to solve CS student level problems (using copy/paste magic) while vanishing when someone actually posts a midwit problem for the AI to solve that hasn't been scraped from GitHub yet

Anonymous

6/13/2025, 5:03:27 PM No.105582494

>>105579704 (OP)
Fuck you and fuck your soijak thread

Anonymous

6/13/2025, 6:34:27 PM No.105583195

>>105579704 (OP)
okay but how is it at porn?
i dont care unless it can suck the cum straight out of my dick

Anonymous

6/13/2025, 6:54:03 PM No.105583333

1724986725769545

>>105580275
What do I color the rest? Do I have to extrapolate, is there some other pattern I haven't picked up on, or do I just leave them uncolored?
t. actual midwit that likes puzzles

Replies: >>105583571 >>105585293

Anonymous

6/13/2025, 7:23:07 PM No.105583571

1734048600122146

>>105583333
idk I'm just gonna color them red and green because those were the colors for those shapes in the previous question

Replies: >>105585293

Anonymous

6/13/2025, 10:42:16 PM No.105585293

>>105583333
>>105583571
You are part of the 99%.
Look at the examples again.

Replies: >>105585408

Anonymous

6/13/2025, 10:46:16 PM No.105585320

>>105579704 (OP)
Self benchmaxxing benchmaxxer.

Anonymous

6/13/2025, 10:48:34 PM No.105585341

>>105580526
>i thought normies dont drink much nowadays
I've read that zoomers drink less

Anonymous

6/13/2025, 10:56:56 PM No.105585408

1728353425936472

>>105585293
I accidentally filled one in the wrong color but other than that it seems consistent with the pattern I noticed.
Is it actually wrong or are you just fucking with me? Post your answer

Replies: >>105585428

Anonymous

6/13/2025, 10:58:45 PM No.105585428

>>105585408
Yeah it's wrong, specifically the zero-hole and one-hole ones are wrong.
Again, look carefully through the two provided examples. You're being retarded anon.

Replies: >>105585503

Anonymous

6/13/2025, 10:58:49 PM No.105585429

>>105579704 (OP)
>72.5% of ARC-AGI Tasks
Wow, more meaningless drivel nothing benchmarks directly from the people who make this garbage.

Anonymous

6/13/2025, 11:06:12 PM No.105585503

1735840900381857

>>105585428
I got it now and feel really dumb for not getting it sooner, it couldn't have been more obvious in the first example

Replies: >>105585541

Anonymous

6/13/2025, 11:10:28 PM No.105585541

>>105585503
Nice anon I knew you could do it

Replies: >>105585556

Anonymous

6/13/2025, 11:11:58 PM No.105585556

1742695079561145

>>105585541
A true gentleman leaves no puzzle unsolved!

Anonymous

6/14/2025, 4:20:35 AM No.105587778

you fucks are playing god

Replies: >>105590035 >>105590048

Anonymous

6/14/2025, 10:53:40 AM No.105589796

>>105580340

lol at 17 bucks

they paid some people some money to solve the test once, big deal

like what, 1700 total if there are 100 tasks

there are billions going into ai research

Anonymous

6/14/2025, 11:46:00 AM No.105590035

>>105587778
SOMEONE'S gotta do it since the old god stopped playing

Replies: >>105593435

Anonymous

6/14/2025, 11:48:12 AM No.105590048

>>105587778
reddit comment of the year

Anonymous

6/14/2025, 2:47:05 PM No.105591060

>>105580189

Of course. Updates have to be consistent with what is already known to be "out there", and this is the fundamental problem that can't be solved by merely throwing random shit at the wall.

Math and logic evolved over millennia and can be meaningfully applied only in very strict and narrow contexts

In short, their shit will hallucinate a subtle, hard to spot bullshit, just like proofs by retards.

Anonymous

6/14/2025, 3:47:22 PM No.105591412

>>105579704 (OP)
Can it beat Pokémon red

Anonymous

6/14/2025, 7:58:08 PM No.105593435

>>105590035
can't wait for the datacenter of babel update

Anonymous

6/14/2025, 9:39:05 PM No.105594248

>>105580445
While I don't disagree with your post in general, you have to admit that fundamentals become useless quickly and are only required from a handful of people. Yeah you could know how a processor works to optimize inference, but what's the point if libraries and compilers got that figured out.

Anonymous

6/14/2025, 9:44:34 PM No.105594292

>>105579704 (OP)
>Solves 72.5% of ARC-AGI Tasks
Wow, more made up, ad-hoc benchmarks!
I'm legit spooped!

Anonymous

6/14/2025, 9:59:25 PM No.105594390

>>105580473
chat, is this real?

Anonymous

6/14/2025, 10:07:37 PM No.105594455

>>105580445
this reads like LLM slop

Anonymous

6/15/2025, 12:49:51 AM No.105595893

>>105580451
this place is for pseuds lmao

Anonymous

6/15/2025, 12:50:52 AM No.105595902

>>105580452
you know what also happens with feedback loops?
you should be able to solve this

Anonymous

6/15/2025, 12:53:19 AM No.105595918

>>105580430
What's dangerous about AI is how small the time between an overhyped useless tool and something that totally consumes the observable universe to calculate some useless task can be.