CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 97
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 190
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution Paper • 2401.00935 • Published Jan 1, 2024 • 18