I put some of my tools online:
https://github.com/quarterturn/ollama-video-captioner
https://github.com/quarterturn/ollama-captioner
I use them to make Wan 2.2 and Qwen-Image LoRAs, for example:
https://huggingface.co/quarterturn/qwen-image-20b-ruri-rocks

I can crank out a reasonably good LoRA starting from video files with very little effort; it's almost all automated. I did one i2v LoRA, and holy shit, that kind of training really calls for distributed compute, not just a single GPU.
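
For anyone wondering what the automated captioning step roughly looks like, here's a minimal sketch. It is not the code from the repos above, just the general idea: send each image to a local Ollama server over its `/api/generate` endpoint and write the reply to a `.txt` file next to the image, which is the sidecar layout most LoRA trainers accept. The model name (`llava`), the prompt, and the function names are all my assumptions for illustration; swap in whatever vision model you have pulled.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Ollama's default local endpoint; change it if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: any vision-capable model you have pulled works here.
MODEL = "llava"
# Hypothetical prompt, tune it for your dataset.
PROMPT = "Describe this image in one detailed sentence for use as a training caption."
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def build_payload(image_bytes: bytes, model: str = MODEL, prompt: str = PROMPT) -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def caption_image(path: Path, url: str = OLLAMA_URL) -> str:
    """Send one image to the Ollama server and return the caption text."""
    body = json.dumps(build_payload(path.read_bytes())).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()


def caption_folder(folder: Path, captioner=caption_image) -> int:
    """Write a .txt caption next to each image that lacks one; return how many were written."""
    written = 0
    for img in sorted(folder.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        sidecar = img.with_suffix(".txt")
        if sidecar.exists():
            # Skip images that are already captioned so reruns are cheap.
            continue
        sidecar.write_text(captioner(img) + "\n", encoding="utf-8")
        written += 1
    return written
```

With the Ollama server running, `caption_folder(Path("dataset"))` would fill in any missing captions in a (hypothetical) `dataset` folder. For video you'd pull frames out first and caption those, but how the actual repos handle that is in their READMEs, not here.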