Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning Paper • 2510.10959 • Published Oct 13, 2025 • 2
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 193
Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy Paper • 2507.01327 • Published Jul 2, 2025 • 1
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas Paper • 2603.16448 • Published 14 days ago • 58
WebGuard: Building a Generalizable Guardrail for Web Agents Paper • 2507.14293 • Published Jul 18, 2025 • 1
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization Paper • 2602.23008 • Published Feb 26 • 36
World Models with Hints of Large Language Models for Goal Achieving Paper • 2406.07381 • Published Jun 11, 2024 • 1
ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning Paper • 2505.23871 • Published May 29, 2025 • 1
Multi-Agent Coordination via Multi-Level Communication Paper • 2209.12713 • Published Sep 26, 2022 • 2
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Paper • 2602.10560 • Published Feb 11 • 30
ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training Paper • 2602.06820 • Published Feb 6 • 13