From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain Paper • 2605.23895 • Published 22 days ago • 52
World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning Paper • 2606.03603 • Published 10 days ago • 29
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 12 days ago • 56
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents Paper • 2606.02031 • Published 12 days ago • 20
Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? Paper • 2606.01247 • Published 13 days ago • 30
Joint Agent Memory and Exploration Learning via Novelty Signals Paper • 2606.01528 • Published 12 days ago • 15
SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories Paper • 2606.01311 • Published 13 days ago • 36
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)? Paper • 2605.30557 • Published 16 days ago • 12
WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction Paper • 2605.29341 • Published 16 days ago • 18
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning Paper • 2605.28424 • Published 17 days ago • 32
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models Paper • 2605.30161 • Published 16 days ago • 60
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control Paper • 2604.27711 • Published Apr 30 • 41 • 5
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control Paper • 2604.27711 • Published Apr 30 • 41
Map2World: Segment Map Conditioned Text to 3D World Generation Paper • 2605.00781 • Published May 1 • 25
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments Paper • 2604.26067 • Published Apr 28 • 74