>>8703020
>so the model doesn't really know to gen anything unless you describe every single object present in the scene
like i said, my point is that it's not just dataset distribution, it's about tag density and the really high quality of the danbooru dataset. the model was simply trained that way.
say a good danbooru post has about 40 tags. that number isn't exact, but let's go with it. even with 40% tag dropout, the model sees roughly 0.6 × 40 = 24 tags every time it trains on that image.
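to be clear about what "40% dropout" means here, a minimal sketch of that kind of caption-building step (this is not any particular trainer's actual code, just the scheme being described: each tag is independently dropped with probability 0.4):

```python
import random

def drop_tags(tags, p_drop=0.4):
    """randomly drop each tag with probability p_drop, join the rest into a caption."""
    kept = [t for t in tags if random.random() > p_drop]
    return ", ".join(kept)

random.seed(0)
post_tags = [f"tag{i}" for i in range(40)]  # a hypothetical 40-tag post
lengths = [len(drop_tags(post_tags).split(", ")) for _ in range(10000)]
print(sum(lengths) / len(lengths))  # averages around 24 tags per caption
```

so the model's training captions for a well-tagged post cluster tightly around that ~24-tag length.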
how long are your typical prompts? say, 15 tags. that's way too short: the model gets confused and doesn't know what to do, because it was almost always trained on roughly 24 tags at a time, so it just doesn't draw anything that's not mentioned directly, to avoid possible mistakes. by the way, do you know which images end up with fewer tags during training? the ones that have fewer tags attached to them in the first place (say, 20). and which images does no one bother to tag? right, the shitty or extremely plain ones. so by prompting short you are effectively prompting for the shitty/extremely simple part of the dataset.
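you can see which posts a 15-tag prompt actually resembles by checking the average caption length that different posts produce under 40% dropout (again just a sketch of the described scheme, not real trainer code):

```python
import random

def caption_len(n_tags, p_drop=0.4):
    """how many tags survive one dropout pass for a post with n_tags tags."""
    return sum(random.random() > p_drop for _ in range(n_tags))

random.seed(2)
for n in (20, 25, 40):
    mean = sum(caption_len(n) for _ in range(10000)) / 10000
    print(f"{n}-tag post -> ~{mean:.1f}-tag captions")
```

a 20-tag post trains on ~12-tag captions and a 25-tag post on ~15-tag captions, so a 15-tag prompt lands right in the territory of the poorly tagged posts, not the well-tagged 40-tag ones.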
hidden tag implications present in the dataset sometimes don't help either. the simplest example i know of (except it's not actually hidden) is character tags, because they are never dropped during training. for example, when you prompt for "very long hair, twintails, aqua hair, aqua eyes, number tattoo, hair ornament" but not "hatsune miku", you'll basically never get hatsune miku, even though these tags go along with "hatsune miku" on like 50% of all "hatsune miku" images, and by this combination you shouldn't get anything but miku. instead, you get a chinese ripoff.
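the mechanism is easy to see if you sketch a dropout step where character tags are exempt (again, a hypothetical caption-building step, not any real trainer's code; the tag list is the miku example from above):

```python
import random

CHARACTER_TAGS = {"hatsune miku"}  # character tags are exempt from dropout

def build_caption(tags, p_drop=0.4):
    """drop each tag with probability p_drop, but always keep character tags."""
    kept = [t for t in tags if t in CHARACTER_TAGS or random.random() > p_drop]
    return ", ".join(kept)

random.seed(1)
tags = ["hatsune miku", "very long hair", "twintails", "aqua hair",
        "aqua eyes", "number tattoo", "hair ornament"]
captions = [build_caption(tags) for _ in range(1000)]
print(all("hatsune miku" in c for c in captions))  # True: the name survives every step
```

since "hatsune miku" is present in every single training caption while the trait tags come and go, the name ends up carrying the character identity, and the trait combination alone never has to.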