Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13, 2025 • 14
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published 18 days ago • 100
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models Paper • 2601.05329 • Published 14 days ago • 1
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published 21 days ago • 54
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation Paper • 2512.21734 • Published 28 days ago • 5
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published 24 days ago • 65
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published Dec 19, 2025 • 37
Scaling Behavior of Discrete Diffusion Language Models Paper • 2512.10858 • Published Dec 11, 2025 • 8
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis Paper • 2411.19509 • Published Nov 29, 2024 • 3
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion Paper • 2512.04926 • Published Dec 4, 2025 • 42
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows Paper • 2512.05150 • Published Dec 3, 2025 • 75
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published Dec 4, 2025 • 169
view article Article SARLO-80: Worldwide Slant SAR Language Optic Dataset at 80 cm Resolution Dec 1, 2025 • 4
Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions Paper • 2510.23772 • Published Oct 27, 2025 • 2