>>105883614
I'm no expert but:
>a tag is just a token expressed without padding language
>padding words are also translated into tokens
>I have no idea what the captions in the Wan training data look like, but presumably there are captions
>prompt bleed still exists

Tags should work for terms that are obviously in the caption set, like "still" or "blocking shot" or "zoom". I also think there are tags like "criterion collection" or "perfect lighting" or "amazing cinematography" in there. A sentence or two is still the best way to describe the actual action, but you can use tags the way you would in 1.5, dialing in the output with enough quality-tag analogs. That's what I'm doing in my video gens, anyway, and it's working well.
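
Rough sketch of how I'd put a prompt like that together, in Python. To be clear, build_prompt is a made-up helper, not part of Wan or any UI, and the action sentence is just an example. All it does is put the sentence first and then append the camera and quality tags as a comma-separated list. Whether that ordering matters to the model is an assumption on my part.

```python
def build_prompt(action, camera_tags=(), quality_tags=()):
    """Hypothetical helper: action sentence first, then comma-separated tags."""
    sentence = action.strip()
    # Make sure the action description ends like a sentence.
    if not sentence.endswith((".", "!", "?")):
        sentence += "."

    # Drop empty tags and case-insensitive duplicates, keeping order.
    seen = set()
    tags = []
    for tag in (*camera_tags, *quality_tags):
        tag = tag.strip()
        if tag and tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)

    # strip() removes the trailing space when there are no tags.
    return " ".join([sentence, ", ".join(tags)]).strip()


prompt = build_prompt(
    "A woman walks across a rainy street at night",
    camera_tags=["blocking shot", "zoom"],
    quality_tags=["perfect lighting", "amazing cinematography"],
)
print(prompt)
```

Paste the result into whatever prompt box you use. Swap tags in and out one at a time if you want to see which ones actually do anything.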