Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search • Paper 2604.08124 • Published 4 days ago • 4
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models • Paper 2509.19803 • Published Sep 24, 2025 • 122
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning • Paper 2508.21104 • Published Aug 28, 2025 • 37