Anonymous
10/11/2025, 2:43:35 AM
No.106852280
>mfw Research news
10/10/2025
>VideoVerse: How Far is Your T2V Generator from a World Model?
https://arxiv.org/abs/2510.08398
>MultiCOIN: Multi-Modal COntrollable Video INbetweening
https://multicoinx.github.io/multicoin
>Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
https://snap-research.github.io/kontinuouskontext
>FlexTraj: I2V Generation with Flexible Point Trajectory Control
https://bestzzhang.github.io/FlexTraj
>InstructX: Towards Unified Visual Editing with MLLM Guidance
https://arxiv.org/abs/2510.08485
>Reinforcing Diffusion Models by Direct Group Preference Optimization
https://arxiv.org/abs/2510.08425
>UniVideo: Unified Understanding, Generation, and Editing for Videos
https://congwei1230.github.io/UniVideo
>LinVideo: Post-Training Framework towards O(n) Attention in Efficient Video Generation
https://arxiv.org/abs/2510.08318
>InstructUDrag: Joint Text Instructions and Object Dragging for Interactive Image Editing
https://arxiv.org/abs/2510.08181
>Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing
https://arxiv.org/abs/2510.08157
>Improving Temporal Understanding Logic Consistency in VLMs via Attention Enhancement
https://arxiv.org/abs/2510.08138
>TTOM: Test-Time Optimization and Memorization for Compositional VidGen
https://ttom-t2v.github.io
>The Rise of the Knowledge Sculptor: New Archetype for Knowledge Work in the Age of GenAI
https://arxiv.org/abs/2510.07829
>Controllable Video Synthesis via Variational Inference
https://video-synthesis-variational.github.io
>DynamicEval: Rethinking Evaluation for Dynamic T2V Synthesis
https://nithincbabu7.github.io/DynamicEval
>LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
https://arxiv.org/abs/2509.23661
>HIVTP: Training-Free Method to Improve VLMs Efficiency via Hierarchical Visual Token Pruning Using Middle-Layer-Based Importance Score
https://arxiv.org/abs/2509.23663