>mfw Research news

06/11/2025

>CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA
https://arxiv.org/abs/2506.08496

>Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
https://arxiv.org/abs/2506.08480

>Enhancing Motion Dynamics of Image-to-Video Models via Adaptive Low-Pass Guidance
http://choi403.github.io/ALG

>LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4×RTX 4090s
https://kopperx.github.io/projects/liftvsr

>SakugaFlow: A Stagewise Illustration Framework Emulating the Human Drawing Process and Providing Interactive Tutoring for Novice Drawing Skills
https://arxiv.org/abs/2506.08443

>Diffuse and Disperse: Image Generation with Representation Regularization
https://arxiv.org/abs/2506.09027

>Do Concept Replacement Techniques Really Erase Unacceptable Concepts?
https://arxiv.org/abs/2506.08991

>Inherently Faithful Attention Maps for Vision Transformers
https://arxiv.org/abs/2506.08915

>Product of Experts for Visual Generation
https://product-of-experts.github.io

>CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics
https://arxiv.org/abs/2506.08835

>HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
https://anonymous.4open.science/w/homa-page-0FBE

>A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
https://arxiv.org/abs/2506.08210

>SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
https://arxiv.org/abs/2506.08391

>How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
https://arxiv.org/abs/2506.08351

>CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
https://aniketrege.github.io/cure