DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published 10 days ago • 123
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 16 days ago • 204
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-500 Text Generation • 2B • Updated Nov 25, 2025 • 3
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-450 Text Generation • 2B • Updated Nov 25, 2025
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-400 Text Generation • 2B • Updated Nov 25, 2025
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-350 Text Generation • 2B • Updated Nov 25, 2025
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-300 Text Generation • 2B • Updated Nov 25, 2025
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-250 Text Generation • 2B • Updated Nov 25, 2025
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-200 Text Generation • 2B • Updated Nov 25, 2025
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-150 Text Generation • 2B • Updated Nov 25, 2025
reinforce-flow/Reinforce-Ada-Est-1-p-Qwen2.5-Math-1.5B-100 Text Generation • 2B • Updated Nov 25, 2025