Representation Forcing for Bottleneck-Free Unified Multimodal Models Paper • 2605.31604 • Published 9 days ago
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 11 days ago
Self-Improving Language Models with Bidirectional Evolutionary Search Paper • 2605.28814 • Published 11 days ago
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 13 days ago
Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini Paper • 2605.27295 • Published 12 days ago
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 12 days ago
PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion Paper • 2605.23902 • Published 16 days ago
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 25 days ago
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization Paper • 2605.10780 • Published 26 days ago
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 26 days ago
δ-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published 26 days ago
Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions Paper • 2604.23774 • Published Apr 29
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons Paper • 2604.28130 • Published Apr 30
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation Paper • 2604.19636 • Published Apr 21
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published Apr 20
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping Paper • 2604.11297 • Published Apr 13