-
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 19 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
Collections
Discover the best community collections!
Collections including paper arxiv:2508.06471
-
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
Paper • 2505.10597 • Published -
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Paper • 2504.05535 • Published • 44 -
nvidia/HelpSteer3
Viewer • Updated • 133k • 2.48k • 95 -
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 131 • 9
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 206 -
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper • 2501.11425 • Published • 109 -
Agent Laboratory: Using LLM Agents as Research Assistants
Paper • 2501.04227 • Published • 95 -
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
Paper • 2507.06229 • Published • 76
-
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 506 -
zai-org/GLM-4.6
Text Generation • 357B • Updated • 40.4k • • 1.21k -
deepseek-ai/DeepSeek-R1
Text Generation • 685B • Updated • 455k • • 13k -
deepseek-ai/DeepSeek-V3.2-Exp
Text Generation • 685B • Updated • 41.4k • • 949
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 277 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 263 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 251 -
A Survey of Context Engineering for Large Language Models
Paper • 2507.13334 • Published • 261
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 206 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 42 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 67 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 273
-
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 19 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 8 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
-
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
Paper • 2505.10597 • Published -
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Paper • 2504.05535 • Published • 44 -
nvidia/HelpSteer3
Viewer • Updated • 133k • 2.48k • 95 -
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 131 • 9
-
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 506 -
zai-org/GLM-4.6
Text Generation • 357B • Updated • 40.4k • • 1.21k -
deepseek-ai/DeepSeek-R1
Text Generation • 685B • Updated • 455k • • 13k -
deepseek-ai/DeepSeek-V3.2-Exp
Text Generation • 685B • Updated • 41.4k • • 949
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 277 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 263 -
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • 2507.01006 • Published • 251 -
A Survey of Context Engineering for Large Language Models
Paper • 2507.13334 • Published • 261
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 206 -
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper • 2501.11425 • Published • 109 -
Agent Laboratory: Using LLM Agents as Research Assistants
Paper • 2501.04227 • Published • 95 -
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
Paper • 2507.06229 • Published • 76
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 206 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 42 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 67 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 273