QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs • Paper 2602.20629 • Published Feb 24
ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution • Paper 2603.02510 • Published Mar 3