Collections
Discover the best community collections!
Collections including paper arxiv:2504.20571
-
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 62 -
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper • 2503.16219 • Published • 52 -
Performance Trade-offs of Optimizing Small Language Models for E-Commerce
Paper • 2510.21970 • Published • 3
-
AdaptThink: Reasoning Models Can Learn When to Think
Paper • 2505.13417 • Published • 83 -
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 222
-
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper • 2508.08221 • Published • 50 -
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • 2506.18254 • Published • 32
-
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1
Text Generation • 2B • Updated • 2 -
ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi13
Text Generation • 2B • Updated • 1 -
ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1209
2B • Updated • 1
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper • 2504.01990 • Published • 301 -
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper • 2504.10479 • Published • 306 -
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 54 -
Seedream 3.0 Technical Report
Paper • 2504.11346 • Published • 70
-
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper • 2508.08221 • Published • 50 -
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • 2506.18254 • Published • 32
-
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 62 -
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper • 2503.16219 • Published • 52 -
Performance Trade-offs of Optimizing Small Language Models for E-Commerce
Paper • 2510.21970 • Published • 3
-
AdaptThink: Reasoning Models Can Learn When to Think
Paper • 2505.13417 • Published • 83 -
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper • 2601.05242 • Published • 222
-
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper • 2504.20571 • Published • 98 -
ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1
Text Generation • 2B • Updated • 2 -
ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi13
Text Generation • 2B • Updated • 1 -
ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1209
2B • Updated • 1
-
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper • 2504.01990 • Published • 301 -
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper • 2504.10479 • Published • 306 -
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 54 -
Seedream 3.0 Technical Report
Paper • 2504.11346 • Published • 70