Anonymous
7/12/2025, 9:13:51 PM
No.105884391
>mfw Research news
07/12/2025
>ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
https://arxiv.org/abs/2507.07317
>TalkFashion: Intelligent Virtual Try-On Assistant Based on Multimodal Large Language Model
https://arxiv.org/abs/2507.05790
>Fair Domain Generalization: An Information-Theoretic View
https://arxiv.org/abs/2507.05823
>Rethinking Layered Graphic Design Generation with a Top-Down Approach
https://arxiv.org/abs/2507.05601
>ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
https://arxiv.org/abs/2507.05568
>OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
https://arxiv.org/abs/2507.05427
>Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration
https://arxiv.org/abs/2507.05604
>Critiques of World Models
https://arxiv.org/abs/2507.05169
>MoDiT: Learning Highly Consistent 3D Motion Coefficients with Diffusion Transformer for Talking Head Generation
https://arxiv.org/abs/2507.05092
>EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation
https://arxiv.org/abs/2507.04955
>ZERO: Multi-modal Prompt-based Visual Grounding
https://arxiv.org/abs/2507.04270
>SeqTex: Generate Mesh Textures in Video Sequence
https://arxiv.org/abs/2507.04285
>PromptSR: Cascade Prompting for Lightweight Image Super-Resolution
https://arxiv.org/abs/2507.04118
>Exploring Kolmogorov-Arnold Network Expansions in Vision Transformers for Mitigating Catastrophic Forgetting in Continual Learning
https://arxiv.org/abs/2507.04020
>EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
https://arxiv.org/abs/2507.03905