Anonymous
11/7/2025, 6:44:58 PM
No.107134422
>>107133848
When inpainting, the model only sees whatever slice of the image you're passing it (your mask plus some context padding around it). It doesn't see anything else in your image, but it will still follow your tags regardless.
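If you want to picture the "mask plus padding" part concretely, here's a rough sketch (made-up helper, not any particular UI's actual code) of how the crop the model sees gets computed. The numpy mask and the crop_for_inpaint name are just for illustration.
```python
# Minimal sketch: take the bounding box of the painted mask, grow it by
# `padding` pixels on every side, and crop. Assumes `mask` is a boolean
# numpy array the same size as the image.
import numpy as np
from PIL import Image

def crop_for_inpaint(image: Image.Image, mask: np.ndarray, padding: int = 32):
    ys, xs = np.nonzero(mask)                      # pixels you painted over
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    # Grow the box by `padding` in every direction, clamped to the image edges.
    top = max(0, top - padding)
    left = max(0, left - padding)
    bottom = min(image.height - 1, bottom + padding)
    right = min(image.width - 1, right + padding)
    # This crop is all the context the model gets; everything outside it is invisible.
    return image.crop((left, top, right + 1, bottom + 1))
```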
So if your mask is just the hand but you have a prompt of 'big tiddy fox girl', the model thinks "The tags say I have to draw an entire woman here, but all I see is some weird fleshy blob, what the fuck is this. The tags say long_hair so I have to put hair somewhere. The tags say yellow_eyes so I have to put eyes somewhere. I'm going to transform this into an entire woman because the tags say 1girl and I don't see the 1girl anywhere."
If you increase your padding to also include the actual character, but keep the mask at only the hand, then it thinks "Ah, there already is a 1girl here. The tags say long_hair and yellow_eyes, but those already appear in the picture so I don't need to add them. The mask seems to be around a hand, I can tell because the arm connects to it and I can see its positioning with respect to the rest of the image, so I can focus on that."
The issue is that the bigger the padding, the less brainpower the model uses on the specific masked area, so you don't want to go too far. And padding is generated as a box in all directions, but sometimes you care more about some directions than others. Like maybe left of the hand is the character and you want the model to see that, but right of the hand is a fucking tree or something that would offer no information.
A trick here is to use two separate masks: the actual mask for the hand, and then a second 'mask' that is just a small pixel dot placed in the direction you want the context to grow. So you mask the hand, then drop a dot over by the character's head, so the context expands left towards the character but not right towards the tree.
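Rough sketch of that dot trick, reusing the made-up crop_for_inpaint helper from above: the bounding box is taken over the union of the hand mask and the dot, so the context stretches toward the dot, while the dot itself is only one extra repainted pixel and changes basically nothing.
```python
import numpy as np

def mask_with_context_dot(hand_mask: np.ndarray, dot_xy: tuple[int, int]) -> np.ndarray:
    """Union of the hand mask and a single-pixel dot used purely to steer the crop."""
    combined = hand_mask.copy()
    x, y = dot_xy                  # e.g. a point over the character's head, left of the hand
    combined[y, x] = True          # the bounding box now has to reach this point
    return combined

# usage sketch:
# combined = mask_with_context_dot(hand_mask, dot_xy=(head_x, head_y))
# crop = crop_for_inpaint(image, combined, padding=32)
```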
Alternatively, keep denoise low, which limits how much freedom the model has to change stuff.
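For example, in diffusers the inpaint pipeline's strength argument is effectively the denoise knob: 1.0 repaints the masked area from scratch, lower values stay closer to the original pixels. Checkpoint name and file paths below are placeholders.
```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",   # placeholder inpaint checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("girl.png").convert("RGB")       # placeholder inputs
mask = Image.open("hand_mask.png").convert("RGB")   # white = repaint

result = pipe(
    prompt="1girl, fox ears, long hair, yellow eyes, fixed hand",
    image=image,
    mask_image=mask,
    strength=0.4,   # low denoise: nudge the hand instead of reinventing the whole area
).images[0]
result.save("fixed.png")
```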