Anonymous
6/11/2025, 11:20:47 PM
No.105564873
>mfw Research news
06/11/2025
>CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA
https://arxiv.org/abs/2506.08496
>Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models
https://arxiv.org/abs/2506.08480
>Enhancing Motion Dynamics of Image-to-Video Models via Adaptive Low-Pass Guidance
http://choi403.github.io/ALG
>LiftVSR: Lifting Image Diffusion to Video Super-Resolution via Hybrid Temporal Modeling with Only 4×RTX 4090s
https://kopperx.github.io/projects/liftvsr
>SakugaFlow: A Stagewise Illustration Framework Emulating the Human Drawing Process and Providing Interactive Tutoring for Novice Drawing Skills
https://arxiv.org/abs/2506.08443
>Diffuse and Disperse: Image Generation with Representation Regularization
https://arxiv.org/abs/2506.09027
>Do Concept Replacement Techniques Really Erase Unacceptable Concepts?
https://arxiv.org/abs/2506.08991
>Inherently Faithful Attention Maps for Vision Transformers
https://arxiv.org/abs/2506.08915
>Product of Experts for Visual Generation
https://product-of-experts.github.io
>CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics
https://arxiv.org/abs/2506.08835
>HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
https://anonymous.4open.science/w/homa-page-0FBE
>A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
https://arxiv.org/abs/2506.08210
>SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
https://arxiv.org/abs/2506.08391
>How Much To Guide: Revisiting Adaptive Guidance in Classifier-Free Guidance Text-to-Vision Diffusion Models
https://arxiv.org/abs/2506.08351
>CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
https://aniketrege.github.io/cure