We aren’t anywhere close to AGI yet. At best, what we have now from OpenAI and the Chinese labs is a bunch of intelligent emulated toddler minds: great at some things like coding, but terrible at planning, thinking, and self-evaluation. And the agentic and multi-agent approach amounts to throwing multiple toddlers into a playground and hoping they get something done together if you set up a 20,000-line system prompt. Hahahaha, come on, really? We are going back to the days of simple conditional if-else statements with that. So the Anthropic CEO is full of shit, because there is no transformer-based model out today, or anything with a similar architecture, that can replace a software or machine learning engineer. Every single instance needs a human engineer in the loop, not so the human can hold the AI’s hand (that’s a waste of time), but so the AI can accelerate the engineer’s coding.
You’ve got models that can write code, but even Claude 3.7 or 4.0 from Anthropic will eventually plateau in their ability to incrementally improve with each new user prompt if you keep the prompt chain going long enough. I’ve seen models produce code that functions and runs without a problem, but doesn’t actually carry out the task it was meant to do.
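To make that concrete, here’s a hypothetical toy version of the failure mode (made up for illustration, not actual model output): say the task was to deduplicate a list *while preserving order*. The code below runs clean, even deduplicates, and still fails the actual task.

```python
# Hypothetical example of "runs fine, fails the task":
# the task was to dedupe while PRESERVING ORDER.
def dedupe_preserving_order(items):
    return list(set(items))  # executes without error, silently drops the ordering

print(dedupe_preserving_order(["b", "a", "b", "c"]))
# prints e.g. ['c', 'a', 'b'] -- valid Python, wrong answer for the stated task
```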
Basically, with multi-shot prompting, the higher the number of prompts in a single instance, the greater the probability of the model’s output collapsing and falling flat in terms of relevance and accuracy.
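If you want to see this for yourself, here’s a rough sketch of how you could measure it. It assumes the official `openai` Python client with an API key in the environment; the task string, the 30-turn loop, and the word-overlap scorer are all stand-ins I made up for illustration. In a real test you’d run the generated code’s tests or use a judge model instead.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TASK = "Refactor this CSV parser to stream rows instead of loading the whole file."

def relevance_to(task: str, reply: str) -> float:
    # Crude stand-in scorer: fraction of task words echoed in the reply.
    # A real harness would execute the generated code against its tests.
    task_words = set(task.lower().split())
    return len(task_words & set(reply.lower().split())) / len(task_words)

messages = [{"role": "user", "content": TASK}]
for turn in range(30):  # one long prompt chain, same instance
    reply = client.chat.completions.create(
        model="gpt-4o",  # illustrative; plug in whatever model you're testing
        messages=messages,
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print(f"turn {turn}: relevance={relevance_to(TASK, reply):.2f}")
    messages.append({"role": "user", "content": "Keep improving it."})
```

Track that score across turns and, in my experience, it drifts downward the longer the chain runs.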
This was evidenced recently by a paper that found that, on tougher questions, reasoning models completely gave up and provided gibberish answers. And most of the development work being done now is about squeezing the last 1% of performance out of the current paradigm of AI models, i.e., transformer and diffusion architectures.
It’s not going to get any better. We aren’t going to hit AGI or capable AI agents that can easily replace software engineers, developers, or any reasoning/thinking job, not even lawyers.