SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published 4 days ago • 57
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models Paper • 2602.20309 • Published Feb 23 • 16
DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning Paper • 2602.19895 • Published Feb 23 • 13
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published Jan 21 • 75
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published Jan 18 • 50
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning Paper • 2508.03501 • Published Aug 5, 2025 • 59
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics Paper • 2508.18124 • Published Aug 25, 2025 • 49
Electrocardiogram Instruction Tuning for Report Generation Paper • 2403.04945 • Published Mar 7, 2024 • 2
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression Paper • 2403.07378 • Published Mar 12, 2024 • 4
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent Paper • 2507.02259 • Published Jul 3, 2025 • 5
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20, 2025 • 29
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier Paper • 2506.10406 • Published Jun 12, 2025 • 2
DyCodeEval Collection DyCodeEval (ICML 2025) enables dynamic benchmarking for code LLMs. This collection features dynamic HumanEval and MBPP sets generated with Claude 3.5. • 3 items • Updated Jun 27, 2025 • 4
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26, 2025 • 15