Models used in CHARM: Calibrating Reward Models With Chatbot Arena Scores.
shawnxzhu
Recent Activity
upvoted a paper 22 days ago
Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
authored a paper about 1 month ago
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL