6/9/2025, 9:32:48 PM
Did you see this, /hdg/? Dual-Process Image Generation:
https://github.com/g-luo/dual_process
https://dual-process.github.io/
https://arxiv.org/abs/2506.01955
https://xcancel.com/graceluo_/status/1931069106356474030#m
You can finally invert the knowledge of any vision-language model (VLM) into an image generator, instead of only using it to caption a dataset and then training on those captions.
More precisely, you can use it to train a LoRA on top of any image gen model, such as SD or Flux, so that it matches arbitrarily complex prompts.
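As a rough illustration of how a frozen VLM can train a LoRA, here is a toy pure-Python sketch. Everything in it is a hypothetical stand-in, not code from the g-luo/dual_process repo: the "generator" is a frozen linear map W applied to a fixed latent z, the "VLM" is a frozen linear yes/no scorer v, and only the low-rank LoRA factors A and B are trained, by backpropagating the critic's -log P("yes") through the generated "image".

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def matvec(M, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in M]


def train_lora_with_critic(d=4, r=2, steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Frozen toy "generator" (hypothetical): image x = (W + B @ A) @ z for a fixed latent z.
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    z = [rng.gauss(0, 1) for _ in range(d)]
    # Frozen toy "VLM" (hypothetical): logit of answering "yes" is v . x + c.
    v = [rng.gauss(0, 1) for _ in range(d)]
    # Offset chosen so the critic initially answers "no" (logit -2).
    c = -sum(vi * xi for vi, xi in zip(v, matvec(W, z))) - 2.0
    # LoRA factors: A small random, B zero, so training starts at the base model.
    A = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d)]

    losses = []
    for _ in range(steps):
        # Forward pass through generator + LoRA, then through the critic.
        u = matvec(A, z)  # low-rank projection (r values)
        x = [b + dl for b, dl in zip(matvec(W, z), matvec(B, u))]
        s = sum(vi * xi for vi, xi in zip(v, x)) + c
        losses.append(-math.log(sigmoid(s)))  # loss = -log P("yes")

        # Backward pass: the critic is frozen but still passes gradients on.
        g_s = sigmoid(s) - 1.0                # dLoss/ds
        g_x = [g_s * vi for vi in v]          # dLoss/dx (gradient w.r.t. the image)
        g_u = [sum(g_x[i] * B[i][k] for i in range(d)) for k in range(r)]

        # Update only the LoRA factors; W, z, v and c never change.
        for i in range(d):
            for k in range(r):
                B[i][k] -= lr * g_x[i] * u[k]
        for k in range(r):
            for j in range(d):
                A[k][j] -= lr * g_u[k] * z[j]
    return losses


losses = train_lora_with_critic()
print(f"critic loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In the real setup the generator is a diffusion model like SD or Flux and the critic is an actual VLM answering a question about the image, but the gradient path (critic answer, then image, then LoRA weights, with everything else frozen) is, as far as I understand the paper, the same idea.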
I don't think it's implemented for SDXL yet, but since it's model-agnostic it shouldn't be too hard to add.
We can finally have arbitrarily complex, autistic, and specific natural-language prompts, even including sample pictures as input, and get a LoRA as output! You could have it generalize unique positions and more.
The catch? You need to run both the image gen model and the VLM at the same time, so it's 48 GB+ VRAM chads only, unless you want to wait hours on CPU.
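A back-of-envelope for why it needs that much VRAM, under stated assumptions: weights in bf16 (2 bytes per parameter), Flux's roughly 12B-parameter transformer, and a hypothetical 7B VLM. It counts weights only, so text encoders, the VAE, the activations kept for backprop, LoRA gradients and optimizer state all come on top, which is plausibly how you end up in 48 GB+ territory.

```python
def weights_vram_gb(params_billion, bytes_per_param=2):
    """GB needed just to hold the weights (1 GB = 1e9 bytes; bf16/fp16 = 2 bytes)."""
    return params_billion * bytes_per_param


FLUX_PARAMS_B = 12.0  # Flux.1 [dev] transformer, about 12B parameters
VLM_PARAMS_B = 7.0    # hypothetical 7B VLM, an assumption for illustration

total_gb = weights_vram_gb(FLUX_PARAMS_B) + weights_vram_gb(VLM_PARAMS_B)
print(f"weights alone: {total_gb:.1f} GB")  # weights alone: 38.0 GB
```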
Some old pic I had saved months ago.