Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 5 days ago • 30
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders Paper • 2601.10332 • Published 5 days ago • 26
Changelog • Team & Enterprise Articles Now Featured on the Hugging Face Blog • Dec 8, 2025 • 90
UM-Text: A Unified Multimodal Model for Image Understanding Paper • 2601.08321 • Published 7 days ago • 7
Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering Paper • 2601.09697 • Published 6 days ago • 7
Article • How We Built a Semantic Highlight Model To Save Token Cost for RAG • 5 days ago • 48
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization Paper • 2601.04582 • Published 12 days ago • 9
User-Oriented Multi-Turn Dialogue Generation with Tool Use at Scale Paper • 2601.08225 • Published 7 days ago • 48
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published 7 days ago • 15
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published 8 days ago • 46
Can We Predict Before Executing Machine Learning Agents? Paper • 2601.05930 • Published 11 days ago • 25
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published 12 days ago • 45
MMFormalizer: Multimodal Autoformalization in the Wild Paper • 2601.03017 • Published 14 days ago • 102