8/11/2025, 9:44:52 PM
>>106227130
Florence-2 because it's fast. For videos you just run it on the frames.
The shot categorizer thing is a Florence-2 finetune and it works very well. The released dataset is actually the model's outputs. The model itself was trained on the real dataset, which is from Shotdeck and wasn't released for copyright reasons, but the outputs are accurate enough that you can't tell the difference. iirc it got like 90%+ accuracy, which is better than the ShotVL bullshit Qwen finetune for literally exactly the same kind of caption results, because they used the same dataset.
Shotdeck is owned by some Hollywood guy, so the backwards provenance, where we said the model was trained on the released dataset, was just in case. I actually scraped it as well when I was at Stability, working with Joe Penna. Joe knows the guy who owns Shotdeck, so he was cool with it though. I know from that time that Midjourney trained on Shotdeck: the owner said he tried some captions and got exact shots back.
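For the "just run it on the frames" part, here's a rough sketch of what I mean. It is not the actual Florence-2 API: `caption_frame` is a hypothetical stand-in for whatever captioner call you use, and `sample_indices`, `caption_video` and the one-frame-per-second default are made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_indices(total_frames: int, fps: float, every_seconds: float = 1.0) -> List[int]:
    """Pick one frame index per `every_seconds` of video (hypothetical helper)."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Never step by less than one frame, even for very short intervals.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def caption_video(
    frames: Sequence[Frame],
    fps: float,
    caption_frame: Callable[[Frame], str],
    every_seconds: float = 1.0,
) -> List[Tuple[float, str]]:
    """Caption sampled frames and return (timestamp_seconds, caption) pairs.

    `caption_frame` is an assumption: any function that maps one decoded
    frame to a caption string, e.g. a wrapper around an image captioner.
    """
    results = []
    for i in sample_indices(len(frames), fps, every_seconds):
        results.append((i / fps, caption_frame(frames[i])))
    return results


if __name__ == "__main__":
    # Dummy "frames" and a dummy captioner, just to show the call shape.
    fake_frames = list(range(10))
    print(caption_video(fake_frames, fps=2.0, caption_frame=lambda f: f"frame {f}"))
```

Sampling instead of captioning every frame is just to keep it fast. Drop `every_seconds` lower if the shots change quickly.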