Anonymous
7/10/2025, 8:24:39 AM
No.105855697
>mfw Research news
07/10/2025
>PromptTea: Let Prompts Tell TeaCache the Optimal Threshold
https://arxiv.org/abs/2507.06739
>Enhancing Diffusion Model Stability for Image Restoration via Gradient Management
https://arxiv.org/abs/2507.06656
>MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval
https://arxiv.org/abs/2507.06654
>Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor
https://vatsalag99.github.io/mustafar
>Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution
https://arxiv.org/abs/2507.06547
>Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis
https://arxiv.org/abs/2507.06689
>Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
https://arxiv.org/abs/2507.07104
>Democratizing High-Fidelity Co-Speech Gesture Video Generation
https://arxiv.org/abs/2507.06812
>Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation
https://arxiv.org/abs/2507.06613
>Evaluating Attribute Confusion in Fashion Text-to-Image Generation
https://intelligolabs.github.io/L-VQAScore
>Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation
https://arxiv.org/abs/2507.06830
>3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds
https://ai.stanford.edu/~sunfanyun/3d-generalist
>Concept Unlearning by Modeling Key Steps of Diffusion Process
https://arxiv.org/abs/2507.06526
>FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation
https://arxiv.org/abs/2507.06523
>DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
https://arxiv.org/abs/2503.06505