>>105655232
the attention mechanism is a lot like the pattern matching our brain does in real-time signal processing. Just like LLMs, we hallucinate constantly -- outside the center of our vision, for example, we are almost blind to detail, with the brain reconstructing everything else from short-term memory in a lossy fashion, which is why optical illusions exist. Similar phenomena show up in hearing etc.
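For reference, here is a minimal numpy sketch of scaled dot-product attention, the "pattern matching" being described: each query is softly compared against every key, and the values get blended by match strength. Shapes and names are purely illustrative, not any particular model's implementation.

[code]
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # each query is scored against every key (dot product),
    # scores are normalized into weights, and the output is a
    # blend of values proportional to how well their keys matched
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n_q, n_k) similarity table
    weights = softmax(scores)         # soft lookup, not an exact match
    return weights @ V                # weighted mix of values

# toy example: 3 queries against 4 key/value pairs, dim 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=s) for s in [(3, 8), (4, 8), (4, 8)])
out = attention(Q, K, V)  # shape (3, 8)
[/code]

Note the output is never an exact retrieval -- every value contributes a little, which is part of why "hallucination" is baked into the mechanism rather than a removable bug.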
Diffusion models and LLMs are simulacra of a tiny, unimportant part of our brains. You could do away with most of that sensory machinery and still be a conscious being (see e.g. Helen Keller), so all the people expecting models to improve after hitting some benchmark of embodiment or multimodality are not getting it. This technology simply does not have the right hardware to even reach the level of autonomy of an insect.