Anonymous
11/2/2025, 3:05:58 PM
No.107081433
For image models to really improve, they need to understand layering, anatomy, perspective, and so on, right? Even if we limited ourselves to current training methods, that seems doable if the dataset existed. Like, imagine if instead of only finished images, we also conveniently had thousands and thousands of .psd files with layers intact to train on.

Or imagine the exact same models we have now, but integrated with controlnets: you press gen, but under the hood the program takes the messy base gen, segments it, fits 3D models to the segments, places them in a 3D environment, warps and stretches them, renders a controlnet map plus a comprehensive mask map from the actual camera position instead of estimating them from the image, then regens using the controlnets and masked regions (roughly the first sketch below). The inputs and outputs of each moving piece of that workflow are not all that different from what we are already doing; we just don't have the datasets.

Imagine if we had scans of every cel in the Disney vault, every Blender project file, etc. We just don't have that the way we have the finished product, in the same way an LLM trained on documents with revision history (with special tokens for backspace etc., roughly the second sketch below) would behave differently from LLMs trained only on finished products, despite both being LLMs.
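Concretely, the loop could look something like this. A minimal sketch only: every function name here (base_gen, segment, lift_to_3d, render_control_maps, regen) is a hypothetical stand-in for a real component, not an existing API, and the stubs return dummy arrays just so the skeleton runs end to end.

[code]
# Hypothetical skeleton of the proposed 3D-grounded regen pipeline.
# Every stage below is a placeholder for a real component (segmenter,
# image-to-3D fitter, renderer, ControlNet-conditioned diffusion model).
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneObject:
    mesh: np.ndarray        # placeholder geometry (e.g. vertex positions)
    transform: np.ndarray   # 4x4 world-space placement matrix
    mask_id: int            # which region of the final image this object owns

def base_gen(prompt: str) -> np.ndarray:
    """Stage 1: the messy first-pass generation (stub returns noise)."""
    return np.random.rand(512, 512, 3)

def segment(image: np.ndarray) -> list[np.ndarray]:
    """Stage 2: split the base gen into per-object masks (stub: one full-frame mask)."""
    return [np.ones(image.shape[:2], dtype=bool)]

def lift_to_3d(image: np.ndarray, mask: np.ndarray, idx: int) -> SceneObject:
    """Stage 3: fit a rough 3D model to each segmented region (stub geometry)."""
    return SceneObject(mesh=np.zeros((1, 3)), transform=np.eye(4), mask_id=idx)

def render_control_maps(scene: list[SceneObject], camera: np.ndarray) -> dict[str, np.ndarray]:
    """Stage 4: render depth/normal/segmentation maps from the actual camera
    position, instead of estimating them from the 2D image (stub: zeros)."""
    h, w = 512, 512
    return {"depth": np.zeros((h, w)),
            "normals": np.zeros((h, w, 3)),
            "masks": np.zeros((h, w), dtype=np.int32)}

def regen(prompt: str, control: dict[str, np.ndarray]) -> np.ndarray:
    """Stage 5: rerun the diffusion model conditioned on the rendered maps."""
    return np.random.rand(512, 512, 3)

def grounded_gen(prompt: str, camera: np.ndarray = np.eye(4)) -> np.ndarray:
    image = base_gen(prompt)
    scene = [lift_to_3d(image, m, i) for i, m in enumerate(segment(image))]
    control = render_control_maps(scene, camera)
    return regen(prompt, control)

if __name__ == "__main__":
    out = grounded_gen("1girl, standing, street at night")
    print(out.shape)  # (512, 512, 3)
[/code]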
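And for the revision-history idea, a minimal sketch of what the training data could look like under that assumption: flatten successive snapshots of a document into one stream with an invented <BS> backspace token, so the model sees the editing process rather than only the finished text.

[code]
# Sketch: flatten a document's revision history into a single training stream.
# <BS> is an invented special token, not from any real tokenizer.
BS = "<BS>"

def common_prefix_len(a: str, b: str) -> int:
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i

def revisions_to_stream(revisions: list[str]) -> list[str]:
    """Turn successive snapshots into: type chars, backspace, retype."""
    stream: list[str] = []
    current = ""
    for rev in revisions:
        keep = common_prefix_len(current, rev)
        stream += [BS] * (len(current) - keep)  # delete back to the shared prefix
        stream += list(rev[keep:])              # type the new suffix
        current = rev
    return stream

print(revisions_to_stream(["a cat", "a big cat", "a big dog"]))
# ['a', ' ', 'c', 'a', 't', '<BS>', '<BS>', '<BS>', 'b', 'i', 'g', ' ',
#  'c', 'a', 't', '<BS>', '<BS>', '<BS>', 'd', 'o', 'g']
[/code]

Both pieces of data already exist somewhere (layered source files, edit histories); they just never get published the way finished outputs do.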
Not a problem with the training techniques so much as the data and lack of will.