>>106977526
>Yes! This is our top priority on the TODO list. We are actively working on upgrading SVI to support Wan 2.2 and higher resolutions including 720p.

>PS: We’re collecting TODO suggestions! We want to better meet real user needs—if you have ideas, please share them in our Issues. Also, if you have great reference images and text prompts but lack the resources to run inference, leave them in an Issue and we’ll run the inference for you!

>Long-form generation: We tested 10-minute talking videos with absolutely no drifting issues

>Potential Issue 1: Slight color shift. This issue has two main causes:
>VAE encoding-decoding
rip
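For anyone wondering why the VAE is a drift source: in autoregressive long-video generation, each new chunk is conditioned on frames that have already been through at least one VAE encode-decode pass, so any tiny systematic reconstruction error compounds chunk after chunk. A toy sketch of that accumulation (a hypothetical gain error stands in for the real VAE; this is not their pipeline):

```python
def toy_vae_roundtrip(value, gain=0.999):
    # Hypothetical stand-in for one VAE encode->decode pass:
    # a tiny systematic gain error models imperfect reconstruction.
    return value * gain

pixel = 0.5  # a mid-gray channel value in [0, 1]
shifts = []
for chunk in range(100):  # one roundtrip per autoregressive chunk
    pixel = toy_vae_roundtrip(pixel)
    shifts.append(0.5 - pixel)

# The per-pass error is negligible, but it compounds across chunks:
print(f"shift after 1 chunk:    {shifts[0]:.4f}")   # 0.0005
print(f"shift after 100 chunks: {shifts[-1]:.4f}")  # 0.0476
```

A 0.05 shift out of a [0, 1] range is a visible color/brightness change, which is why a "slight color shift" shows up on long generations even when single-pass VAE quality looks fine.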

>Q4: Did you consider building upon the Self-Forcing series of works?
>Initially, we did want to build upon Self-Forcing, but several critical issues led us to abandon this approach:

>Different objectives: The Self-Forcing series is better suited for scenarios prioritizing real-time interaction (e.g., gaming), where visual quality does not need to reach cinematic standards. In contrast, our work focuses on story content creation, requiring higher standards for both content and visual quality.

>Causal/Bidirectional considerations: Self-Forcing achieves frame-by-frame causality, whereas SVI operates at a clip-by-clip (chunk-by-chunk) level of causality, with bidirectional processing within each clip. Since SVI targets film and video production, our design mirrors a director's workflow: directors repeatedly review a clip in both forward and reverse to ensure quality, so SVI keeps bidirectionality within each clip to emulate this process. Directors then seamlessly connect different clips along the temporal axis while preserving causal consistency, which aligns with SVI's chunk-by-chunk causality. Intuitively, SVI's paradigm has unique advantages for end-to-end, high-quality video content creation.
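The chunk-level causality they describe boils down to an attention mask: frames attend bidirectionally inside their own chunk, but only causally across chunks. A minimal sketch (hypothetical illustration, not SVI's actual masking code):

```python
def chunk_causal_mask(num_frames, chunk_size):
    # mask[i][j] == True means frame i may attend to frame j.
    # Bidirectional within a chunk, causal across chunks: a frame
    # sees every frame in its own chunk and in all earlier chunks,
    # but nothing from later chunks.
    return [
        [j // chunk_size <= i // chunk_size for j in range(num_frames)]
        for i in range(num_frames)
    ]

m = chunk_causal_mask(num_frames=6, chunk_size=3)
# Frame 1 (chunk 0) sees frame 2 in the same chunk, but not frame 3
# in the next chunk; frame 4 (chunk 1) sees all of chunks 0 and 1.
```

Compare with Self-Forcing's frame-by-frame causality, which would be the special case `chunk_size=1`: there, the mask collapses to a standard lower-triangular causal mask and no bidirectional review within a clip is possible.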