>>107110302
i tried the 2.1 vae but got this error:
Given groups=1, weight of size [16, 16, 1, 1, 1], expected input[1, 48, 31, 22, 40] to have 16 channels, but got 48 channels instead
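fwiw, going only by the numbers in that message: the layer's weight is laid out as [out_channels, in_channels / groups, kernel dims...], so it wants 16 input channels, and the tensor it was handed is [batch, channels, ...] with 48 channels. that suggests the latents and this vae don't belong together, though that part is a guess. a minimal sketch of the same check in plain python (the function names are made up for illustration, not part of any library):

```python
def expected_in_channels(weight_shape, groups=1):
    # Conv weight layout: [out_channels, in_channels // groups, *kernel_dims]
    return weight_shape[1] * groups


def channel_mismatch(weight_shape, input_shape, groups=1):
    # Input layout: [batch, channels, *spatial_dims]
    expected = expected_in_channels(weight_shape, groups)
    got = input_shape[1]
    # Return None when the shapes agree, otherwise (expected, got)
    return None if expected == got else (expected, got)


# Shapes copied from the error message above
weight_shape = [16, 16, 1, 1, 1]
input_shape = [1, 48, 31, 22, 40]
print(channel_mismatch(weight_shape, input_shape))
```

if this prints a pair instead of None, the vae and the latents disagree on channel count, and swapping the vae for one that matches the model is the thing to try.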

>>107110317
yep doesn't work with 2.1

here is the video example. it's a photo of a girl with the prompt "girl dancing". i am super new at this so maybe i am screwing up something obvious?
my gpu is an rtx 3060 with 8GB vram so i don't think i can run any models with more parameters. or is there some better model i can use? i mainly need image to video, to make clips around 5 to 10 seconds long