>>941960410
i don't think it necessarily intuits what fits the scene (not very well anyway) except when the objects commonly show up in that kind of scene (like getting seashells in the sand on a beach without prompting "seashell").
think about how you aren't really giving it any positional information most of the time (SDXL basically assumes the subject is centered), so what you're really doing is giving it "concept soup" and letting it decide how all of those things relate. but if you, as the user, are better able to inform it about those relations, or use concepts with natural relationships it will probably find on its own, your results improve.
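
to make the "concept soup" vs. informed-relations difference concrete, here's a rough sketch. it only builds prompt strings, it doesn't call SDXL or any real library, and the helper names (`concept_soup`, `related_prompt`) and the example concepts are made up for illustration:

```python
# hypothetical helpers, not part of any SDXL tooling; they only build prompt strings


def concept_soup(concepts):
    """Join bare concepts with commas: nothing says how they relate."""
    return ", ".join(concepts)


def related_prompt(subject, relations):
    """Attach each concept to the subject through an explicit relation phrase."""
    parts = [subject] + [f"{relation} {concept}" for relation, concept in relations]
    return ", ".join(parts)


# same concepts, two ways of handing them to the model
soup = concept_soup(["woman", "red umbrella", "dog", "beach"])
informed = related_prompt(
    "woman standing on a beach",
    [("holding a", "red umbrella"), ("with a", "dog sitting at her feet")],
)

print(soup)
print(informed)
```

the first string leaves it up to the model who holds the umbrella and where the dog goes. the second spells the relations out. no guarantee the model honors them, but it has less to guess.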