>>106164884 (OP) It’s more overhyped generative AI tech, the same thing just repackaged. The impressive part is the speed at which it generates frames, but that’s presumably just because it has vast compute behind it (it’s a Google DeepMind research project, and Google literally has some of the world’s most powerful supercomputers). Imagine that compute powering a raytraced, max-settings traditional video game.
The object permanence seems amazing at first, but it’s basically just a context window in a visual sense. It’s like clicking around Google Street View, except it generates a new frame 24 times a second. Of course moving forward makes things get bigger. It’s not that magical once you see what they’ve done: it’s impressive and a clever trick, but you can see how it works. It basically retains every frame it generates in memory and can retrace its steps if needed. But it has all the same flaws as every other generative model: everything is baked into a single canvas, you can’t tweak it, it’s massively inefficient, there are hallucinations everywhere, the compute cost is huge, and the output is generic slop that never beats human engineering or artistry.
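To make the "visual context window" idea concrete, here’s a toy sketch of the mental model above. It is not how Genie actually works; `FrameMemory`, `render` and `walk_out_and_back` are made-up names, and a "frame" is just a string. The only point is that a remembered pose gives you the same frame back, and a pose that has fallen out of the window gets regenerated from scratch.

```python
from collections import OrderedDict
import itertools


class FrameMemory:
    """Toy 'visual context window': remembers the last `capacity` frames by camera pose."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()    # pose -> frame, least recently used first
        self._ids = itertools.count()  # stand-in for the generator's randomness

    def render(self, pose):
        if pose in self.frames:
            # Pose is still in the window: reuse the frame, so the scene looks unchanged.
            self.frames.move_to_end(pose)
            return self.frames[pose]
        # Pose is unseen or already forgotten: invent a fresh frame.
        frame = f"frame#{next(self._ids)}@{pose}"
        self.frames[pose] = frame
        if len(self.frames) > self.capacity:
            self.frames.popitem(last=False)  # evict the least recently used frame
        return frame


def walk_out_and_back(capacity, steps):
    """Walk `steps` poses forward, return to the start, and check the start frame still matches."""
    memory = FrameMemory(capacity)
    first = memory.render(0)
    for pose in range(1, steps + 1):
        memory.render(pose)
    return memory.render(0) == first
```

With a window bigger than the walk, `walk_out_and_back(10, 5)` finds the original frame again. Shrink it with `walk_out_and_back(3, 5)` and the starting view has been evicted, so you get a different frame for the same spot.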
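There was a version of Doom running on a model like this (GameNGen, I think, not Genie; the 30,000 hours figure is what the first Genie was trained on, and that was 2D platformer footage). They spent what feels like a trillion dollars getting Doom to run on a supercomputer and it looks like shit.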