>>103934885
It's because there is no such thing as a "letter" in the training data.
Just like there is no such thing as "Hand".
There's trillions of combinations of both, however, and they are all wildly incompatible for the most part.
The training system doesn't understand that your fingers don't normally bend backwards, or that a hand can't have 3, 4 or 6 fingers.
It just shit-fits weights together until the output feels right.
The issue is there's a lot of art out there, a lot of art is actually shit, and a lot of it clashes, averaging out to a fucking mess.
Think of it another way - if you were to merge every single face together in an average, you'd get something that looks like a face. You've likely seen these before.
If you merged every hand together, it would be a mess. That's the issue at hand, quite literally.
The same issue happens with limbs, where the right conditions can lead to limb duplication. With hands there are too many configurations that can lead to these glitches in generation.
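The face vs. hand thing can be sketched with a toy example. This is not how a diffusion model actually trains, just the averaging intuition: a 1-D "image" with a single bar in it, averaged over many samples. The names (bar, average, contrast) and all the sizes are made up for illustration.

```python
import random

def bar(width, start, length):
    """1-D 'image': a row of pixels with one solid bar in it."""
    return [1.0 if start <= i < start + length else 0.0 for i in range(width)]

def average(images):
    """Pixel-wise mean of equally sized 1-D images."""
    n = len(images)
    return [sum(px) / n for px in zip(*images)]

def contrast(img):
    """Difference between the brightest and darkest pixel."""
    return max(img) - min(img)

random.seed(0)
WIDTH, LENGTH, N = 40, 6, 500

# "Faces": the feature always sits in the same place.
aligned = [bar(WIDTH, 17, LENGTH) for _ in range(N)]
# "Hands": same feature, but it can land anywhere in the frame.
scattered = [
    bar(WIDTH, random.randrange(WIDTH - LENGTH + 1), LENGTH) for _ in range(N)
]

avg_aligned = average(aligned)
avg_scattered = average(scattered)

print("aligned contrast:  ", contrast(avg_aligned))
print("scattered contrast:", round(contrast(avg_scattered), 3))
```

The aligned average keeps a crisp bar, the scattered one smears into a low-contrast haze. Real hands vary in pose, finger count visible, occlusion and so on, so it's far worse than one bar sliding around.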
Same goes for letters. No such thing as a letter - you average every letter and you get something close to a solid rectangle.
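You can see the "solid rectangle" effect even with a handful of letters. Rough sketch below, assuming hand-drawn 5x5 bitmaps for six letters (the glyph shapes and helper names are invented for this example, not from any real font or dataset):

```python
# Hypothetical 5x5 glyphs, '#' = ink, '.' = blank.
GLYPHS = {
    "A": [".###.", "#...#", "#####", "#...#", "#...#"],
    "E": ["#####", "#....", "####.", "#....", "#####"],
    "H": ["#...#", "#...#", "#####", "#...#", "#...#"],
    "O": [".###.", "#...#", "#...#", "#...#", ".###."],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
    "X": ["#...#", ".#.#.", "..#..", ".#.#.", "#...#"],
}

def to_pixels(rows):
    """Turn a list of '#'/'.' strings into a grid of 1.0/0.0 values."""
    return [[1.0 if ch == "#" else 0.0 for ch in row] for row in rows]

def average(glyphs):
    """Pixel-wise mean over all glyph bitmaps."""
    imgs = [to_pixels(g) for g in glyphs]
    n = len(imgs)
    return [[sum(img[r][c] for img in imgs) / n for c in range(5)]
            for r in range(5)]

mean = average(GLYPHS.values())

# How many pixels got any ink at all, and how many got ink from every letter.
inked = sum(1 for row in mean for v in row if v > 0)
fully_inked = sum(1 for row in mean for v in row if v == 1.0)

for row in mean:
    print(" ".join(f"{v:.2f}" for v in row))
print("pixels with some ink:", inked, "of 25")
print("pixels inked in every letter:", fully_inked)
```

With just these six glyphs, every one of the 25 pixels ends up with some ink and none is inked in all six, so the "average letter" is a grey block with no strokes you could read. Add more letters, fonts, sizes and rotations and it only gets flatter.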
The complexity of doing text is several orders of magnitude WORSE than doing hands, which is why it is so difficult to do right.
You need to let the model run for a long ass time to generate anything decent, which quickly becomes expensive.
The only decent solution is going through the training data and adding extra tags to the problematic samples, e.g. giving the model OCR-tagged letters.
But that's also expensive as fuck to do.