https://civitai.com/models/1782437
Surprisingly, this project is not dead. A new rouwei gemma version has been released, alongside a new t5 gemma 2b.
Still, neither works well enough in practice to be useful for much. Both are essentially a bunch of pre-alpha experiments.
But the fact that one dude with three 5090s can train text encoders that produce coherent-ish images (although they don't really follow the prompt) is nothing to scoff at imo.
I don't think either of these models has much potential. (Maybe the t5 has more, being twice as large, and t5 models are naturally geared towards this task. At the moment, though, it is less trained than the other gemma, so it is similarly useless for now.) Hopefully the author eventually switches to qwen-vl or similar, as they say they will, so that we can get something useful one day.