MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 8 days ago • 89
ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics Paper • 2606.10479 • Published 10 days ago • 19
SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents Paper • 2606.05761 • Published 15 days ago • 19
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects Paper • 2605.19587 • Published May 19 • 10
Draft-OPD: On-Policy Distillation for Speculative Draft Models Paper • 2605.29343 • Published 22 days ago • 35
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention Paper • 2605.22791 • Published 29 days ago • 32
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Paper • 2605.16928 • Published May 16 • 96