DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference • Paper • 2602.21548 • Published Feb 25 • 53
TAPS: Task Aware Proposal Distributions for Speculative Sampling • Paper • 2603.27027 • Published Mar 27 • 144
Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding • Paper • 2605.20104 • Published 29 days ago • 7
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding • Paper • 2605.29707 • Published 20 days ago • 145
Skip a Layer or Loop It? Learning Program-of-Layers in LLMs • Paper • 2606.06574 • Published 13 days ago • 18