Thread 16693257 - /sci/ [Archived: 1115 hours ago]

Anonymous
6/9/2025, 12:31:16 PM No.16693257
ai
md5: 8e380f2df7c58f3a5b25ecf0b736ed5d
All "reasoning" models hit a complexity wall where they completely collapse to 0% accuracy.

No matter how much computing power you give them, they can't solve harder problems.

thoughts? what went wrong?

>https://machinelearning.apple.com/research/illusion-of-thinking
>The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
>Authors Parshin Shojaee*†, Iman Mirzadeh*, Keivan Alizadeh, Maxwell Horton, Samy Bengio, Mehrdad Farajtabar
>Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces' structure and quality. In this work, we systematically investigate these gaps with the help of controllable puzzle environments that allow precise manipulation of compositional complexity while maintaining consistent logical structures.
Replies: >>16693260 >>16693302 >>16693309 >>16693312 >>16693760 >>16694731 >>16694754 >>16694865 >>16694866 >>16694911 >>16695019
Anonymous
6/9/2025, 12:35:40 PM No.16693260
fails
md5: 0e481d6cbabad9a9d7c11b463edbd87f
>>16693257 (OP)
Anonymous
6/9/2025, 1:39:27 PM No.16693302
>>16693257 (OP)
>No matter how much computing power you give them, they can't solve harder problems yet
ftfy
Anonymous
6/9/2025, 1:40:33 PM No.16693303
pepeHnz
md5: d6c50fcd7386ef1a714aaa42892824ae
Unsurprising
ChatTDG !!Z0MA/4gprbd
6/9/2025, 1:44:16 PM No.16693309
>>16693257 (OP)

>what went wrong?

Well, obviously you would not put such puzzles through the reasoning module in a realistic scenario. Just run a dumb simulation of all possible moves and compare against the desired end state. Simply brute-force the problem; that is what we have computers for. :)
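To make that concrete, here is a minimal sketch of what I mean, assuming the Tower of Hanoi setup from the paper (the function names are my own, nothing official): breadth-first search over the legal moves until the goal state shows up.

from collections import deque

def legal_moves(state):
    # A legal Tower of Hanoi move: take the top disk of one peg and
    # drop it on an empty peg or onto a larger disk.
    moves = []
    for src in range(3):
        if not state[src]:
            continue
        disk = state[src][-1]
        for dst in range(3):
            if dst != src and (not state[dst] or state[dst][-1] > disk):
                moves.append((src, dst))
    return moves

def apply_move(state, move):
    src, dst = move
    pegs = [list(p) for p in state]
    pegs[dst].append(pegs[src].pop())
    return tuple(tuple(p) for p in pegs)

def solve_bfs(n_disks):
    # Dumb simulation of all possible moves, compared against the
    # desired end state; returns the shortest move sequence.
    start = (tuple(range(n_disks, 0, -1)), (), ())
    goal = ((), (), tuple(range(n_disks, 0, -1)))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for move in legal_moves(state):
            nxt = apply_move(state, move)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [move]))

print(len(solve_bfs(5)))  # 31 moves, i.e. 2^5 - 1

No reasoning required, just a few dozen lines and the patience of a while loop.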
Replies: >>16693346 >>16693816
Anonymous
6/9/2025, 1:45:21 PM No.16693312
>>16693257 (OP)
>what went wrong?
need more parameters..
Anonymous
6/9/2025, 2:13:29 PM No.16693346
>>16693309
but the AI didn't even come up with any solution, not even that one.
Replies: >>16693356
ChatTDG !!Z0MA/4gprbd
6/9/2025, 2:29:20 PM No.16693356
>>16693346

I might not either, except for playing around with it until I find a set of moves that works ... and perhaps relying on rules derived from solving prior similar puzzles (which might simply not transfer between puzzles). The latter part, or a lack of experience overall, seems to be the issue here ... but that is beside the point: why use something LLM-derived for such a task? Same as misusing an LLM as a pocket calculator, you are trying to have the system guess an answer where some cheap chip inside a plastic case could fulfill the task much more efficiently! You do not want a reasoning model to solve puzzles ... you want it to reason well enough to recognize when it is confronted with a puzzle and then to relegate the solution to some simple-minded brute-force program. Then spit the solution back at you in a nice logical answer, perhaps sprinkled with some emojis depending on user preference ... or reasoning model preference, I am flexible on style here. Btw one really sometimes needs to assume that AI "researchers" are a buncha bloody retards, jeez!
Replies: >>16693371
Anonymous
6/9/2025, 2:52:20 PM No.16693371
>>16693356
Incorrect, because of the way you are defining puzzles and reasoning. It can't even solve simple puzzles, but now it is supposed to handle the more complex puzzle of evaluating puzzles?
Can you even do that? Have they even been categorized by people?
Replies: >>16693433
ChatTDG !!Z0MA/4gprbd
6/9/2025, 4:18:16 PM No.16693433
>>16693371

Well, that depends on what "reasoning" in these models really means! Say we inquire by stating a set of actions, objects, etc. and asking what a hypothetical outcome of this arrangement could be (simple example: a hammer is about to hit a nail positioned over a piece of wood ... what will happen next?) ... even if the conceptual models involved here were merely conceptual subtexts derived from language (as opposed to actual conceptual understanding), we would surely get a good enough (convincing) answer ... but these puzzles are not straightforward, they are combinatory ... you do not reason yourself through such a thing unless the possible combinations are very narrow. Reasoning might be useful here to narrow down the possible combinatory approaches to solving it, yet the combinatory part itself must still be "played through", not reasoned with. The reasoning model would likely have a better success rate if told to find a suitable program to grind the puzzle through, or to write up a small program to do so, if it is capable of that.
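And to be concrete about the "small program" part, again assuming the Tower of Hanoi puzzle from the paper (my own naming, purely a sketch): the whole combinatory tangle collapses into a few lines of recursion once nobody tries to reason through each individual move.

def hanoi(n, src="A", aux="B", dst="C"):
    # Move n disks from src to dst: recursively park n-1 disks on aux,
    # move the largest disk, then bring the n-1 disks back on top of it.
    if n == 0:
        return []
    return (hanoi(n - 1, src, dst, aux)
            + [(src, dst)]
            + hanoi(n - 1, aux, src, dst))

moves = hanoi(10)
print(len(moves))  # 1023 moves, i.e. 2^10 - 1

That is the sort of thing the reasoning model should be delegating to, rather than emitting the 1023 moves token by token.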
Replies: >>16693562
Anonymous
6/9/2025, 7:22:03 PM No.16693562
>>16693433
Sigh. I don't read AI slop, try again.
Replies: >>16693728
Anonymous
6/9/2025, 10:57:03 PM No.16693728
>>16693562
I accept your concession.
Replies: >>16693779
Anonymous
6/9/2025, 11:23:41 PM No.16693760
>>16693257 (OP)
Not an issue, average people can't solve those problems either.
Anonymous
6/9/2025, 11:35:54 PM No.16693779
>>16693728
You literally had an AI respond. And ultimately the completely asinine argument is that these puzzles are 'combinatorial' even though they are puzzles babies play with. Don't feed AI slop to people and not expect to be dunked on, mate. Just because you can't tell the difference doesn't mean it isn't substantially obvious. What is even more embarrassing is that you censored it too. You literally cut its post up trying to make it appear less like an idiot AI. So I know you know exactly what I am talking about.
Anonymous
6/10/2025, 12:29:22 AM No.16693816
>>16693309
>Tripfag
>Also retarded
Why is this so common?
Anonymous
6/11/2025, 1:59:02 AM No.16694731
>>16693257 (OP)
We have to keep these threads alive even though nobody will share. This is the only place to discuss this without the brainwashed tards. Anyone who uses AI knows that this test is inaccurate based just on their personal use cases. I got it to solve 10 with fewer instructions, no algorithm, and an output character count in the 20k's (not sure how the numbers and commas break down).
There are a couple of obvious issues that I don't think they saw.
How can I shit on top AI researchers?
Anonymous
6/11/2025, 2:17:30 AM No.16694754
image_2025-06-10_201724774
md5: af91a7a255dd988120938c0f6cadb049
>>16693257 (OP)
I had an idea on this, let me know if it could possibly work.
Replies: >>16694757 >>16694764 >>16694766 >>16694774 >>16694781
Anonymous
6/11/2025, 2:18:31 AM No.16694757
image_2025-06-10_201804315
md5: a77819c7dc25d9c9faea837335270a50
>>16694754
Replies: >>16694764 >>16694766 >>16694774 >>16694781
Anonymous
6/11/2025, 2:25:05 AM No.16694764
image_2025-06-10_202503678
md5: 408d4be5be176420f41f0dd9d03b7ed7
>>16694754
>>16694757
I also checked whether it would be possible to use the timing of the response to determine a 'front' or 'back' section of the binaries, allowing for it to be expanded.
Replies: >>16694766 >>16694774
Anonymous
6/11/2025, 2:26:06 AM No.16694766
image_2025-06-10_202522381
md5: 6d09601b5aaef027d3ef7be37eafadf5
>>16694754
>>16694757
>>16694764
Last one.
Anonymous
6/11/2025, 2:36:18 AM No.16694774
image_2025-06-10_203556607
md5: a2c438645f5491150bc97c3ce36beb0e
>>16694754
>>16694757
>>16694764
Okay, this is the last one, for now.
Anonymous
6/11/2025, 2:50:52 AM No.16694781
>>16694757
>>16694754
It could work, but the evaluation tree is ambiguous. Suppose it were to evaluate, and you asked it to present the justification and then also evaluate that. The justification would be another state with the same structure. You need some kind of terminator.
Replies: >>16694783 >>16694800
Anonymous
6/11/2025, 2:53:16 AM No.16694783
>>16694781
Please explain a little more, I am not exactly sure what you mean. It is unable to validate the memories that have been tagged with this information, but why? I don't know how LLMs work specifically; I just had an idea and wondered if it would work.
Anonymous
6/11/2025, 3:20:42 AM No.16694800
image_2025-06-10_212028464
md5: 9be2797ffe5dd8d63895c3ff66c20067
>>16694781
Since I didn't get a response on the terminator question, I did my own extension of the idea, and this is what I came up with. Let me know if this helps further it at all.
Replies: >>16694801 >>16694802
Anonymous
6/11/2025, 3:21:44 AM No.16694801
image_2025-06-10_212113710
md5: 07df5659c55fe0e8b8794e7c1bbd827a
>>16694800
Replies: >>16694802
Anonymous
6/11/2025, 3:22:46 AM No.16694802
image_2025-06-10_212231151
md5: 507d7575e0ee7122307349eda26766cb
>>16694800
>>16694801
Yes, I know step 2 was done first, here is step 1.
Replies: >>16694804
Anonymous
6/11/2025, 3:24:08 AM No.16694804
image_2025-06-10_212402047
md5: 51312b76ec5355143e354dd1897194c5
>>16694802
Anonymous
6/11/2025, 5:45:38 AM No.16694865
>>16693257 (OP)
NEED MORE TOKENS
Anonymous
6/11/2025, 5:48:01 AM No.16694866
>>16693257 (OP)
You don't need a paper to know this. Just use ChatGPT and get angry when it can't handle a fairly straightforward task.
>That's right! It didn't work this time because I made simple errors, as you have pointed out. Let's try this again, correcting those errors. We've got this!
>You're right to be frustrated. This is the 44th time we have attempted this task and I have yet again repeated the errors you asked me to correct. Let's do better!
Replies: >>16694869 >>16696080
Anonymous
6/11/2025, 5:56:09 AM No.16694869
>>16694866
Honestly, I've learned that it's more user error. As long as you explain things correctly, reference what you need to change, etc., it will do nearly anything you need it to do.

Unless you have the free tier :/
Replies: >>16694870
Anonymous
6/11/2025, 5:57:54 AM No.16694870
>>16694869
The problem here is that you often have to do the work yourself and walk the model through it, which defeats the (advertised) purpose. On the other hand, its actual purpose may be just that: having the supposed customer train the model.
Replies: >>16694881
Anonymous
6/11/2025, 6:03:06 AM No.16694876
I work with these models and, to be honest, they are still at a very early stage of development. Even the latest published research describes models that are just beginning to expand on the concept through fairly basic manipulations of previous designs. There is a lot of room for improvement.

It reminds me of early computers, which only carried out bitwise operations. At the most basic level it is really just matrix multiplications and linear regression. In terms of complexity and higher levels of abstraction, there is a lot of room for development.
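To illustrate the point, here is the entire "magic" of a single layer as a toy numpy sketch (the shapes and values are made up for illustration, not taken from any real model):

import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(4, 16))        # a batch of 4 inputs with 16 features
W = rng.normal(size=(16, 8)) * 0.1  # a learned weight matrix
b = np.zeros(8)                     # a learned bias vector

h = np.maximum(0, x @ W + b)        # ReLU(xW + b): one matrix multiply, one max
print(h.shape)                      # (4, 8)

Stack enough of these and fit the weights by gradient descent; everything else is architecture on top.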
Replies: >>16694883
Anonymous
6/11/2025, 6:05:21 AM No.16694880
Noetarchia Suprema 2
md5: 7ab8828a1410bdb1f9deb591b4042cf5
I dunno. They seem pretty capable to me.

https://iceni.substack.com/p/noetarchia-suprema-a-manifesto-that
Anonymous
6/11/2025, 6:06:01 AM No.16694881
>>16694870
This is what it is; they even have their email responding with a version of an assistant.
Anonymous
6/11/2025, 6:11:22 AM No.16694883
>>16694876

To truly work with this technology beyond treating it as a black box, you have to gain a scientific level of understanding and read newly published research.

The newest models take the basic concept from the 80s and 90s and make incremental changes, such as adding layers of convolution or using multiple networks to generate and discriminate outputs.

I believe future models will require the application of many networks at once in increasingly complex combinations. For example, could we use generative networks themselves to generate data for training other networks? Then we could use much less data to produce accurate results.
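Rough toy of what I mean by that last part (everything here is invented for illustration: the "teacher" is just a fixed random network standing in for a generative model, and the "student" is plain linear regression):

import numpy as np

rng = np.random.default_rng(1)

# Toy "teacher": a fixed random one-hidden-layer network.
W1, b1 = rng.normal(size=(8, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 1)), rng.normal(size=1)

def teacher(x):
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

# Step 1: the teacher manufactures its own training set, no real data involved.
x_synth = rng.normal(size=(5000, 8))
y_synth = teacher(x_synth)

# Step 2: fit a much smaller "student" model on the synthetic pairs.
w, *_ = np.linalg.lstsq(x_synth, y_synth, rcond=None)

# Step 3: check how well the student mimics the teacher on fresh inputs.
x_test = rng.normal(size=(1000, 8))
err = np.mean((x_test @ w - teacher(x_test)) ** 2)
print(f"student vs teacher mean squared error: {err:.3f}")

The same pattern scales up to the interesting case: one network generating or labeling the data that another network is then trained on.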
Anonymous
6/11/2025, 6:56:19 AM No.16694911
>>16693257 (OP)
AI cannot fail. AI can only be failed. Prompt better.
Replies: >>16695005
Anonymous
6/11/2025, 10:34:16 AM No.16695002
If the AI got the answer wrong it means you got the question wrong

t. $5 deposited into your AI shill account
Anonymous
6/11/2025, 10:46:12 AM No.16695005
>>16694911
Actual prompter mindset.
Anonymous
6/11/2025, 11:12:46 AM No.16695019
>>16693257 (OP)
>t. the one major tech company that completely missed the reasoning model boat
Anonymous
6/12/2025, 3:36:54 PM No.16696080
>>16694866
>and I have yet again repeated the errors you asked me to correct. Let's do better!
they shouldn't have fed reddit into the AI, it's over now