has anyone figured out a better way to caption 2girl training images than [girl that the base model recognizes] (matching outfit), [second girl the model recognizes] (matching outfit)? this gives pretty ok consistency, but not being able to mix and match anything sucks
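
for reference, the caption format i mean is roughly the one below. this is just a minimal sketch: the function name `two_girl_caption` and the character/outfit strings are made-up placeholders, not tags from any real model or dataset.

```python
def two_girl_caption(girl_a: str, outfit_a: str, girl_b: str, outfit_b: str) -> str:
    """Build a caption in the "[girl] (outfit), [girl] (outfit)" layout.

    All arguments are hypothetical placeholders; swap in whatever names
    and outfit descriptions your base model actually recognizes.
    """
    # Each girl is immediately followed by her outfit in parentheses,
    # and the two girl/outfit pairs are separated by a comma.
    return f"{girl_a} ({outfit_a}), {girl_b} ({outfit_b})"


# Placeholder names only, for illustration.
print(two_girl_caption("character_a", "outfit_a", "character_b", "outfit_b"))
```

the function only formats the string, so it does nothing about the mix-and-match problem; it just pins down the layout being asked about.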