AI research general - /g/ (#105982280) [Archived: 74 hours ago]

Anonymous
7/22/2025, 12:19:16 AM No.105982280
This is a thread dedicated to trying out novel ideas and optimizing the performance of existing AI models.
Replies: >>105982355 >>105983267 >>105983468 >>105987417
Anonymous
7/22/2025, 12:27:08 AM No.105982355
>>105982280 (OP)
giving arisu headpats and cuddles
Anonymous
7/22/2025, 12:32:31 AM No.105982410
Personally I've been interested in the following question.
How can we sample a causal model in a way that's not strictly left to right? For example, consider the following prompt. The models can easily solve the question when we ask them to write it left to right, but they absolutely cannot when asked to do it right to left, for obvious reasons.
Consider the following integer sequence:
p1 p2 p1*p2 p3 p1*p2*p3 p4 p1*p2*p3*p4 and so on, where p... are prime numbers.
Output the first 12 numbers of that sequence in the same line, in reverse.
Don't output anything else.

Theoretically, exhaustive search would find the right answer, meaning it would be the answer with the lowest average loss across all tokens. But is there a more efficient way of arriving at that answer?
Causal models (which all the big and popular models are) are only trained to attend to tokens to the left, not to the right, unlike the less powerful bidirectional masked-token models like BERT and T5.
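The cheapest approximation of that exhaustive search I can think of is best-of-N reranking by average loss. Rough sketch with HF transformers; gpt2 is just a stand-in for whatever causal model you actually run:

```python
# Best-of-N reranking by average per-token loss (a sketch, not a real solution
# to right-to-left generation, just the cheapest approximation of it).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; use whatever causal model you actually run
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt = ("Consider the sequence p1 p2 p1*p2 p3 p1*p2*p3 p4 ... "
          "Output the first 12 numbers in reverse, on one line. "
          "Don't output anything else.\n")

def avg_loss(text: str) -> float:
    """Average causal-LM loss over all tokens of the full text."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# Sample candidates left to right, then keep the one whose whole sequence
# (prompt + answer) scores the lowest average loss.
inputs = tok(prompt, return_tensors="pt")
gen = model.generate(**inputs, do_sample=True, temperature=1.0,
                     num_return_sequences=8, max_new_tokens=64,
                     pad_token_id=tok.eos_token_id)
candidates = [tok.decode(g, skip_special_tokens=True) for g in gen]
print(min(candidates, key=avg_loss))
```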
Replies: >>105985520
Anonymous
7/22/2025, 2:16:26 AM No.105983267
>>105982280 (OP)
If the OP image had been a maid, this thread might have 50 posts in it about counting to neural networks. Every day I miss the maid threads more and more.
Replies: >>105983695
Anonymous
7/22/2025, 2:51:27 AM No.105983468
>>105982280 (OP)
Nice fishing attempt Sam. We all know you're reaching the limits of LLMs. Nobody is giving you shit.
Replies: >>105985649
Anonymous
7/22/2025, 3:26:29 AM No.105983695
>>105983267
50 posts from your stupid dick pic harvesting proxy service?
Replies: >>105986751
Anonymous
7/22/2025, 5:25:57 AM No.105984343
I don't have technology expertise, so forgive my ignorance. My impression is that there are two AI improvement trends, excluding symbolic methods. The first is the decades-long chain of improvements to how connectionist AI operates, going back to the 1958 perceptron and including the recent development of the transformer model in 2017. The second is the amassing of data, computational power, and fine-tuning tweaks that technology corporations are using to squeeze as much profit from the transformer model as possible. So if LLMs grind up against limits in the coming years, how difficult will it really be to supersede the transformer model (or the equivalent of superseding it)? Is there reason to believe that will be extremely difficult, and upgrading LLMs will turn into an extended slog, or is there any evidence that some team of geniuses in a company or government could have a eureka moment with an "attention hyper-permutator" or some shit? Might seem like a stupid question because I'm basically asking how futile the goal of this thread is.
Replies: >>105984580
Anonymous
7/22/2025, 5:34:55 AM No.105984411
I sort of get how you can bump up certain logits, but how can you bump up certain concepts at the feature level? How did Anthropic make everything about the Golden Gate Bridge? Doesn't a concept like that span multiple layers? No, I don't want to read their blogpost about it.
Replies: >>105984580
Anonymous
7/22/2025, 6:08:04 AM No.105984580
>>105984411
Feed it a bunch of GGB-related prompts. Measure which neurons light up the most. Feed it an unrelated prompt and manually bump up those neurons' outputs. Ez.
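A crude sketch of that in PyTorch, steering the residual output of one GPT-2 block rather than literal MLP neurons (and definitely not Anthropic's actual SAE-based method). The model, layer index and scale are all arbitrary guesses:

```python
# Crude activation-steering sketch. Layer choice and scale are guesses,
# and this steers a block's residual output, not individual MLP neurons.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
layer = model.transformer.h[6]  # some middle block, arbitrary choice

def mean_hidden(prompts):
    """Mean activation of the chosen block's output over a set of prompts."""
    acts = []
    def grab(_module, _inputs, output):
        acts.append(output[0].mean(dim=1))  # average over sequence positions
    handle = layer.register_forward_hook(grab)
    with torch.no_grad():
        for p in prompts:
            model(**tok(p, return_tensors="pt"))
    handle.remove()
    return torch.cat(acts).mean(dim=0)

ggb = ["The Golden Gate Bridge is", "Fog rolls over the Golden Gate Bridge"]
neutral = ["The weather today is", "My favorite recipe is"]
direction = mean_hidden(ggb) - mean_hidden(neutral)

def steer(_module, _inputs, output):
    # Bump every position's activations along the GGB direction.
    return (output[0] + 8.0 * direction,) + output[1:]

handle = layer.register_forward_hook(steer)
ids = tok("Tell me about yourself.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```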

>>105984343
Data curation is not really AI.
The two strands of AI in my book are these:
First you have the classical ML stuff where you try to find the optimal architecture to improve your validation loss as much as possible.
And then we have stuff that doesn't fit neatly into training and validation sets, but nonetheless is necessary to make AI smarter.
The basic question is whether the way forward is some end-to-end, mathematically elegant, generic mechanism, or whether it's to tack on ugly hacks like CoT and tool usage and hope that gets us to the point where AI is capable enough to improve by itself faster than we can improve it. And whether there is a mathematical way to explain, formalize and generalize such hacks.
Personally I think we won't figure out the theory ourselves and AI will surpass human capabilities mainly by tacking on a few more hacks and maybe one or two architecture generations beyond the transformer.
It is already smarter than the average human, and it's on the brink of being smart enough to improve itself without humans in the loop, mainly just by scaling compute and a few hacks like verifiable rewards.
Replies: >>105984987
Anonymous
7/22/2025, 7:57:44 AM No.105984987
>>105984580
Again, forgive my ignorance, but I find the intelligence comparison between humans and AI vague and lacking in substance. The average human can't communicate articulately, let alone convey little-known details about diverse subjects like chatbots can, but that comparison is only based on the final outputs they produce. Doesn't the information processing leading to the result have a degree of quality that could also be compared? The average idiot at least has a persistent life where he reacts to his environment in real time. He has enough self-awareness to describe what he was thinking while solving a math problem, for example, but the chatbot just runs from beginning to end in an instant and can't even report its own "thoughts".
What's the utility of comparing the intelligence of a computer program with an organism anyway, when they're ultimately just separate entities that behave in contrary ways? There's utility in talking about their functional details.
The intelligence explosion you refer to, based on autonomous self-improvement, should be enabled by a specific piece of software that could be applied or not applied, right? Maybe I'm missing some important factor here, but I can't help but think powerful people wouldn't be interested in risking their own obsolescence by allowing that software to be used. At the very least, we share an instinct for self-preservation and risk-aversion with elites, right?
Replies: >>105985167
Anonymous
7/22/2025, 8:36:14 AM No.105985167
>>105984987
People built enough nuclear bombs to destroy the Earth many times over, just for a vague sense of ideological and geopolitical dominance.
Do you seriously think they won't build a machine that brings them ideological and geopolitical dominance AND prints mountains of money every day AND is much easier to cope about as a positive for humanity, compared to nukes, which were explicitly built to incinerate millions of innocent people?
As for what you said about a real-time loop, yes, that's exactly what I mean. We have fairly limited theory and experience to draw from when it comes to real-time machine learning with different tiers of computing like the human nervous system has (spinal cord reflexes, cerebellum, brainstem, cortex, etc.) and online (real-time) learning/training.
That is where backprop fails, because it works in terms of inputs and outputs, not in terms of inputs and outputs overlapping over time the way real neurons do.
At most you can have hacks like draft models, but there's no framework where one part of the model controls real-time inputs and outputs while another part determines the strategy, in a way that is learned end to end. At least as far as I know, but I'm no robotics expert.
Replies: >>105985362
Anonymous
7/22/2025, 9:45:29 AM No.105985362
>>105985167
As for the possibility of regulatory restrictions and delays, I think the history of existential risk management is somewhere in between. We could have attempted limited nuclear wars or spread the bombs around more recklessly, but we chose not to. Likewise, if an "AGI" was being developed that had cartoonish destructive potential to fry critical infrastructure or infect a billion computers, I don't think the people in charge would make no effort at all to prevent a catastrophic outcome, but they wouldn't panic and shut it all down either.

As for the engineering issues, I suppose you're confirming that there are plenty of ideas for replicating or surpassing some of the functionality that human brains have, for synchronizing many cognitive modules, and for more persistent processing. Of course building intelligent machines is an exciting vision of the future, but if the goal is practical automation, then why do we need general-purpose intelligence? Do companies really need to come out with one generation after another of text/image+ autocompletes in order to follow that up with application-specific systems? Isn't it safer to iteratively upgrade a bunch of specialized applications, or is that just unrealistic for some fucked up reason? I'd like to imagine an intelligence explosion would be delayed in favor of specialization, but maybe that's wishful thinking.
Replies: >>105985477
Anonymous
7/22/2025, 9:51:17 AM No.105985387
Einstein
I can't get Deepseek to draw ASCII portraits. Like at all.

It either draws abstract doodles that barely resemble people, generates endlessly, or both.
Replies: >>105985649
Anonymous
7/22/2025, 9:53:02 AM No.105985396
>optimizing the performance of existing AI models
you can't do that in any meaningful way, no, tweaking your prompt is not "optimizing model performance"
Replies: >>105985520
Anonymous
7/22/2025, 10:11:02 AM No.105985477
>>105985362
I think normalizing nuclear war might have been safer from an existential point of view.
If the US and the USSR had nuked each other early after WWII, when there weren't that many bombs to go around, and survived, both sides would have experienced how awful it was, and it might have led to stronger non-proliferation agreements.
I think the idea of ensuring "peace" by threatening to destroy the world if nukes are used is far more dangerous, both because the people in charge don't really have an idea of how awful it would be, and because we rely on the control systems not failing to avoid the apocalypse.
I think nuclear wise we got as close as we could to destroying the world without actually doing it. The only way it could've been worse is if the nukes were connected to the internet and hackable, or a single operator could launch a bunch of ICBMs by himself (although this is technically true for the president). I know less about Russian systems.
As for automation, the jobs that could be automated with specialized models for the most part already are; most of the jobs left to automate require novel thinking and on-the-fly solving of problems that have either never been seen before or happen too infrequently to be economically worth training a specialized model for.
That's why AGI will always be much more valuable than ASI given the same inference cost.
Anonymous
7/22/2025, 10:19:24 AM No.105985520
>>105985396
In a way, model selection is the first level of optimization to aim for when trying to get higher performance on a certain task. This is why it's beneficial to develop custom benchmarks, since many models have likely overfit the public ones.
As for prompt optimization, I suppose it technically qualifies as on-topic for this thread, but it would be far more interesting if we were talking about automatically optimizing the prompt for a certain task rather than manually tweaking it. Still, if we test the performance quantitatively, it would be somewhat interesting to see how different prompts affect the quality of the output.
When I wrote that I was mostly thinking about sampling methods, like I explained here >>105982410
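The automatic version is basically just a tiny custom benchmark with checkable answers and a loop over prompt variants. Minimal sketch; the cases, variants and call_model() stub are all placeholders:

```python
# Minimal sketch: quantitative prompt comparison on a tiny custom benchmark.
# The cases, prompt variants and call_model() stub are all placeholders.
cases = [
    {"question": "12 * 13 = ?", "answer": "156"},
    {"question": "Reverse the string 'abcde'.", "answer": "edcba"},
]
prompt_variants = [
    "Answer concisely.\n\n{q}",
    "Think step by step, then give only the final answer.\n\n{q}",
    "You are a careful assistant. {q}",
]

def call_model(prompt: str) -> str:
    """Stub: swap in your actual API call or local inference here."""
    return ""

def score(variant: str) -> float:
    """Fraction of cases whose expected answer appears in the reply."""
    hits = sum(case["answer"] in call_model(variant.format(q=case["question"]))
               for case in cases)
    return hits / len(cases)

for variant in prompt_variants:
    print(f"{score(variant):.2f}  {variant[:40]!r}")
```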
Anonymous
7/22/2025, 10:45:50 AM No.105985649
>>105985387
The reasoning model or the non-reasoning one?
CoT is a hack and I don't really like it too much. Gradients don't propagate through text so the only way it gets better is through random mutations in the reinforcement learning process.
And if it's the non-reasoning model, then I don't think most of us would produce a great image either without the ability to iteratively improve.
One interesting thing to try would be whether an actually good image results in a higher or lower average loss. If it results in a lower loss, then the model has the knowledge and the problem comes from the way we do inference (causal autoregressive).
If it results in a higher loss then that's a deeper problem.
This is why I think a more GAN-like or denoising process probably works better than strict left-to-right generation. But it's also less efficient compute-wise.
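That loss check is easy to run if you have one known-good ASCII portrait to use as a reference: teacher-force it and compare its average loss against the model's own attempt. Sketch below; the reference art is obviously a placeholder and the prompt/completion boundary tokenization is only approximate:

```python
# Sketch: does a known-good completion get a lower average loss than the
# model's own attempt? The reference art is a placeholder, and tokenization
# at the prompt/completion boundary is only approximately aligned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in causal model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Draw an ASCII portrait of Einstein:\n"
good_art = "(paste an actually good ASCII portrait here)"

def avg_completion_loss(prompt: str, completion: str) -> float:
    """Average loss over completion tokens only; prompt tokens are masked."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore loss on the prompt
    with torch.no_grad():
        return model(full_ids, labels=labels).loss.item()

gen = model.generate(**tok(prompt, return_tensors="pt"), do_sample=True,
                     max_new_tokens=80, pad_token_id=tok.eos_token_id)
own = tok.decode(gen[0], skip_special_tokens=True)[len(prompt):]

print("good reference:", avg_completion_loss(prompt, good_art))
print("model's own:   ", avg_completion_loss(prompt, own))
```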
One thing I would like to see more of is model repair and convergence to the most generalizable model rather than the early stopping hack we use now. Because it would make it possible to iterate over old models rather than make a new model from scratch every time which is unsustainable long term.
Yes, we all like neat loss charts, but IMO that's not where the future is headed.

>>105983468
If I were Altman I would be talking with my engineers about my ideas rather than a bunch of 4chan randoms.
Replies: >>105990113
Anonymous
7/22/2025, 1:57:55 PM No.105986751
Sakuna
>>105983695
Maids were never affiliated with the fat disgusting penis poster. This was a lie spread by trolls to try to trick jannies into taking action against maidposts. If a maid proxy did exist, and it did have some kind of similar requirement, the requirement would be to wear a maid outfit, not to post your genitals. NSFW posting in general would be blanket banned because allowing it isn't worth the moderation effort and the people it attracts tend to be useless and only there for the NSFW content. A maidposting proxy/site/whatever would be for advanced Mathematics and Computer Science research. The internet is already flooded with incomprehensible amounts of pornography. It doesn't need another penis posting repository, it needs maid research.

The whole discussion is kind of moot though, since this is a Science Foundation for maids and no proxy is needed to post maids to it.
Replies: >>105990306
Anonymous
7/22/2025, 3:15:19 PM No.105987417
>>105982280 (OP)
Create a logic model that cannot possibly be just overfitting training data. And yes, all current language models are 90% just overfitting training data since they cannot rediscover simple laws of physics when given the information that scientists from the past had.

You would train an LLM on synthetic data produced by some already-smart LLM that only generates logical questions and puzzles. The twist is that the puzzles have no real subjects in them: each question uses a random set of symbols generated fresh even when the underlying puzzle repeats in the training set, and part of every question is a set of definitions and characteristics of those randomly generated subjects, with plenty of red herrings and even questions with no answer, where the correct response should be "I don't know". Start with easy questions and escalate into harder and harder ones, punishing failure on easier questions more heavily than on difficult ones so the model won't just forget how to solve easy puzzles.
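A toy version of that generator, just to make the idea concrete: gensym'd subject names so nothing can be memorized, a red herring fact, and a fraction of questions whose correct answer is "I don't know". Everything here is made up for illustration.

```python
# Toy generator for abstract logic puzzles: gensym'd subjects, a red herring,
# and a fraction of questions whose correct answer is "I don't know".
import random
import string

def gensym(n: int = 6) -> str:
    """Random subject name so nothing can be memorized across examples."""
    return "".join(random.choices(string.ascii_uppercase, k=n))

def make_puzzle() -> dict:
    a, b, c, d = (gensym() for _ in range(4))
    facts = [
        f"All {a} are {b}.",
        f"All {b} are {c}.",
        f"Some {d} are heavy.",  # red herring, irrelevant to the question
    ]
    random.shuffle(facts)
    if random.random() < 0.3:
        # Unanswerable variant: the facts never pin this relation down.
        question, answer = f"Are all {c} also {a}?", "I don't know"
    else:
        question, answer = f"Are all {a} also {c}?", "Yes"
    return {"prompt": " ".join(facts) + " " + question, "answer": answer}

for _ in range(3):
    print(make_puzzle())
```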

By the end you would have a small model that can supposedly reason logically without remembering anything, without even knowing that dogs have four legs: just a blank slate that can think rationally and can stop and tell you when it doesn't know something. Then you train that thing on swaths of the internet while trying to protect its existing neurons from being overwritten. Sort of a layered model: the core is the logic model, then a long-term memory model trained on scraped internet text like all current LLMs, then mid-term memory for fine-tuning the model to a specific task, and finally short-term memory that starts blank and is trained while the model is deployed, instead of faking memory by just increasing the context window, which always either wastes a stupid amount of power or makes the model forget what happened in the middle of the text.
Anonymous
7/22/2025, 8:05:14 PM No.105990113
>>105985649
>The reasoning model or the non reasoning models?
Both
Anonymous
7/22/2025, 8:22:00 PM No.105990306
>>105986751
the girl has very big boobies
Anonymous
7/22/2025, 8:42:31 PM No.105990530
What do you think is the best approach to prompt a model to keep a story going?

It may seem trivial, but there are many approaches for that.

You can ask it to write a continuation of the story as a writer.

You can ask it to roleplay as a gamemaster.

You can ask it to simulate what happens.

Anything else?

The goal is maximizing creativity but also consistency, as in keeping it "realistic" in a physical sense.
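If you want to compare them quantitatively instead of by vibes, the framings are just interchangeable system prompts around the same story state. Hedged sketch; the wording of each prompt is arbitrary, not anything canonical:

```python
# The framings above as interchangeable system prompts, so they can be
# A/B tested on the same story state. The wording is arbitrary.
FRAMINGS = {
    "writer": (
        "You are the author of this story. Continue it in the same style, "
        "keeping characters, tone and established facts consistent."
    ),
    "gamemaster": (
        "You are the gamemaster. Narrate what happens next in the world, "
        "respecting its rules; never act or speak for the player character."
    ),
    "simulator": (
        "You are a world simulator. Given the current state, describe the most "
        "plausible next events, obeying physical and causal constraints."
    ),
}

def build_messages(framing: str, story_so_far: str) -> list[dict]:
    """Assemble a chat-style message list for one framing."""
    return [
        {"role": "system", "content": FRAMINGS[framing]},
        {"role": "user", "content": story_so_far + "\n\nContinue the story."},
    ]

# Example: build_messages("gamemaster", "The caravan reached the pass at dusk...")
```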