>>717380223 (OP)
>Interactive simulations
Guys, it's literally just text2video with camera movement and maybe a player character inserted in the middle. This will not teach any AI robot anything, and it won't replace gaming.
First, AIs are notoriously bad, even to this day, at following rigid rules. They can't play DnD with you because they start making up rules halfway through the game. Do you expect AI-generated video to follow the rigid animation and gameplay rules that keep a game fair? Precise FPS movement and controls, enemies that behave in predictable ways, levels that are balanced, iframes that are fair? Turn the weirdness parameter down and it makes generic slop; turn it up and it's completely unfair hallucinated bs. See the sketch below.
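To make that tradeoff concrete: the "weirdness parameter" is basically sampling temperature. A minimal sketch (illustrative function and numbers, not from any actual model):

```python
import numpy as np

def sample_action(logits, temperature):
    """Sample one action from model logits at a given temperature.

    temperature -> 0: nearly deterministic, always the top action
    (safe, but generic slop). temperature >> 1: nearly uniform,
    the rules go out the window (hallucinated nonsense).
    """
    # Scale logits by temperature before softmax.
    scaled = np.array(logits) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Model strongly prefers the one "legal" move (index 0).
logits = [5.0, 1.0, 0.5]
print(sample_action(logits, temperature=0.1))  # almost always 0
print(sample_action(logits, temperature=5.0))  # often an "illegal" move
```

There is no setting where you get both creative output and strict rule-following out of the same sampler.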
Second, the thing that separates this from previous models (which just erased anything behind you) is that it remembers everything that came before and generates accordingly. That is also why it can only do clips a few seconds long: it has to attend over the entire history before generating the next frame. We still have not solved this problem even in LLMs. Some LLMs claim massive context windows, like a million tokens, but from experience and a few papers on the "lost in the middle" effect, you find that unless you are willing to eat a roughly million-fold compute bill, only the words at the start and end of the window actually matter. We see this in current LLMs and it will probably still be a problem for these models in the near future.
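The "million times slower" part isn't an exaggeration if you scale naively. Back-of-envelope sketch (d_model is an assumed illustrative value):

```python
# Naive self-attention does O(n^2) work in sequence length n,
# so total cost explodes as the context grows.
def attention_flops(n_tokens, d_model=4096):
    # Rough FLOPs for QK^T plus the attention-weighted sum over V.
    return 2 * (2 * n_tokens * n_tokens * d_model)

short_ctx = attention_flops(1_000)
long_ctx = attention_flops(1_000_000)
print(f"1M-token attention costs ~{long_ctx / short_ctx:,.0f}x more")
# -> ~1,000,000x: square the context length, square the attention bill.
```

Every trick that dodges this (sparse attention, sliding windows, compression) is exactly what causes the model to forget the middle.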
Thirdly, and most importantly, this shit will not run on your local machine. Current video models that run locally either take far longer to generate a clip than real-time playback allows, or are significantly lower quality. Add in the 10+ minutes of context needed to keep everything behind the player from disappearing, and you end up with hardware the average person will never afford.
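Quick arithmetic on what "10 minutes of context" means (the tokens-per-frame figure is an assumption for illustration, not from any published model):

```python
# Illustrative numbers only: how much frame history a "world model"
# needs to keep 10 minutes of scenery from vanishing behind you.
FPS = 30
MINUTES_OF_MEMORY = 10
frames_in_context = FPS * 60 * MINUTES_OF_MEMORY
print(frames_in_context)  # 18,000 frames

# Assume (generously) each frame compresses to ~256 latent tokens.
tokens = frames_in_context * 256
print(f"~{tokens / 1e6:.1f}M tokens of context")  # ~4.6M tokens

# Real-time means a new frame every 1/30 s; a local model that takes
# even 1 s per frame is already 30x too slow before context costs hit.
```

That is datacenter territory, not a 4090 in your bedroom.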
However, it could be a fun gimmick.