SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training Paper • 2605.08738 • Published May 9 • 13
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published Mar 12 • 65
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory Paper • 2603.04257 • Published Mar 4 • 19
The Diffusion Duality, Chapter II: $Ψ$-Samplers and Efficient Curriculum Paper • 2602.21185 • Published Feb 24 • 4
Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths Paper • 2601.06463 • Published Jan 10 • 2
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning Paper • 2512.05325 • Published Dec 5, 2025 • 5
Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 20, 2025 • 124
Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models Paper • 2307.11224 • Published Jul 20, 2023 • 7
Boomerang Distillation Enables Zero-Shot Model Size Interpolation Paper • 2510.05064 • Published Oct 6, 2025 • 1