ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 7 days ago • 62
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21, 2025 • 84
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents Paper • 2510.14967 • Published Oct 16, 2025 • 34
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 67