Mistral Medium 3.5 Collection Our first flagship models handling instruction-following, reasoning, and coding in a single set of open weights. • 2 items • Updated 3 days ago • 13
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling Paper • 2506.20512 • Published Jun 25, 2025 • 48
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14, 2025 • 68
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems Paper • 2402.14008 • Published Feb 21, 2024 • 1
HellaSwag: Can a Machine Really Finish Your Sentence? Paper • 1905.07830 • Published May 19, 2019 • 8
Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability Paper • 2506.08300 • Published Jun 10, 2025 • 10
AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR Paper • 2604.27543 • Published 3 days ago • 2
Step-level Optimization for Efficient Computer-use Agents Paper • 2604.27151 • Published 4 days ago • 9
Synthetic Computers at Scale for Long-Horizon Productivity Simulation Paper • 2604.28181 • Published 3 days ago • 14
The Last Human-Written Paper: Agent-Native Research Artifacts Paper • 2604.24658 • Published 4 days ago • 10
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows Paper • 2604.28139 • Published 3 days ago • 28
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control Paper • 2604.27711 • Published 3 days ago • 35
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published 4 days ago • 30
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 3 days ago • 187
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora Paper • 2604.24819 • Published 6 days ago • 82
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments Paper • 2604.26067 • Published 5 days ago • 64
Video Analysis and Generation via a Semantic Progress Function Paper • 2604.22554 • Published 9 days ago • 63
ClawGym: A Scalable Framework for Building Effective Claw Agents Paper • 2604.26904 • Published 4 days ago • 46
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models Paper • 2604.26951 • Published 4 days ago • 42