Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 8 days ago • 182
A Foundation Model for Zero-Shot Logical Rule Induction Paper • 2605.04916 • Published 9 days ago • 4
Instruction-Guided Poetry Generation in Arabic and Its Dialects Paper • 2604.27766 • Published 15 days ago • 4
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 15 days ago • 213
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 121
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published Apr 6 • 35
CARLA-Air: Fly Drones Inside a CARLA World – A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Paper • 2603.19235 • Published Mar 19 • 95
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published Feb 26 • 151