>>106925597
We also shouldn't forget that checkpoint training has some really weak links in the tagging models it depends on. Take WD Tagger, for example. To properly capture an entire image, you need multiple crops: the full image, a close-up of the character's face for expression tags, another of the hands if they're holding something or performing an action, and one of the background. In other words, properly tagging a single image takes multiple passes, and that process gets skipped during bulk tagging. I suspect that's where part of the hands and feet problem comes from: this weak link in the chain of peripheral models.
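Roughly, the multi-pass idea above would look like this. It's a minimal sketch under a few assumptions. `tag_fn` is a hypothetical stand-in for a real tagger such as WD Tagger (no actual WD Tagger API is used here). The crop boxes are assumed to come from somewhere else, like a face/hand detector or manual boxes. The 0.35 threshold is an arbitrary example value. Each crop gets tagged separately, and the results are merged by keeping the highest score per tag.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # toy grayscale image: a list of pixel rows
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive
Tagger = Callable[[Image], Dict[str, float]]  # hypothetical tagger: crop -> {tag: score}


def crop(image: Image, box: Box) -> Image:
    """Cut a rectangular region out of the toy image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def multi_pass_tags(image: Image, regions: Dict[str, Box],
                    tag_fn: Tagger, threshold: float = 0.35) -> Dict[str, float]:
    """Tag the full image plus each region crop; keep the max score per tag."""
    height = len(image)
    width = len(image[0]) if image else 0
    passes = {"full": (0, 0, width, height), **regions}
    merged: Dict[str, float] = {}
    for box in passes.values():
        for tag, score in tag_fn(crop(image, box)).items():
            if score >= threshold:
                merged[tag] = max(score, merged.get(tag, 0.0))
    return merged


def toy_tagger(c: Image) -> Dict[str, float]:
    """Hypothetical tagger: only 'sees' the expression on a small close-up crop."""
    area = sum(len(row) for row in c)
    return {"1girl": 0.9, "smile": 0.9 if area <= 4 else 0.2}


if __name__ == "__main__":
    img = [[0] * 4 for _ in range(4)]
    print(multi_pass_tags(img, {}, toy_tagger))                     # bulk-style single pass
    print(multi_pass_tags(img, {"face": (0, 0, 2, 2)}, toy_tagger))  # with a face close-up
```

With only the full-image pass, the toy tagger drops "smile" because it falls below the threshold. Adding the face crop recovers it. That's the kind of tag bulk tagging would miss.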