-
Textbooks Are All You Need
Paper • 2306.11644 • Published • 153 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 439 -
Muon is Scalable for LLM Training
Paper • 2502.16982 • Published • 10 -
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Paper • 2406.17557 • Published • 100
Collections
Discover the best community collections!
Collections including paper arxiv:2406.17557
-
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
Paper • 2509.05739 • Published • 2 -
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper • 2509.03059 • Published • 25 -
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 14 -
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
Paper • 2509.08358 • Published • 13
-
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 77 -
SmolVLM: Redefining small and efficient multimodal models
Paper • 2504.05299 • Published • 205 -
YourBench: Easy Custom Evaluation Sets for Everyone
Paper • 2504.01833 • Published • 22 -
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 254
-
Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face
Paper • 2401.13822 • Published • 1 -
Attention Is All You Need
Paper • 1706.03762 • Published • 110 -
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Paper • 1910.03771 • Published • 21 -
Model Cards for Model Reporting
Paper • 1810.03993 • Published • 7
-
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 54 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 81 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 40 -
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 17
-
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Paper • 2406.17557 • Published • 100 -
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Paper • 2406.16860 • Published • 63 -
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity
Paper • 2406.17720 • Published • 8 -
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper • 2406.20094 • Published • 104
-
Textbooks Are All You Need
Paper • 2306.11644 • Published • 153 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 439 -
Muon is Scalable for LLM Training
Paper • 2502.16982 • Published • 10 -
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Paper • 2406.17557 • Published • 100
-
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
Paper • 2509.05739 • Published • 2 -
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
Paper • 2509.03059 • Published • 25 -
Universal Deep Research: Bring Your Own Model and Strategy
Paper • 2509.00244 • Published • 14 -
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
Paper • 2509.08358 • Published • 13
-
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper • 2506.20920 • Published • 77 -
SmolVLM: Redefining small and efficient multimodal models
Paper • 2504.05299 • Published • 205 -
YourBench: Easy Custom Evaluation Sets for Everyone
Paper • 2504.01833 • Published • 22 -
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 254
-
Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face
Paper • 2401.13822 • Published • 1 -
Attention Is All You Need
Paper • 1706.03762 • Published • 110 -
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Paper • 1910.03771 • Published • 21 -
Model Cards for Model Reporting
Paper • 1810.03993 • Published • 7
-
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 54 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 81 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 40 -
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 17
-
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Paper • 2406.17557 • Published • 100 -
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Paper • 2406.16860 • Published • 63 -
Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity
Paper • 2406.17720 • Published • 8 -
Scaling Synthetic Data Creation with 1,000,000,000 Personas
Paper • 2406.20094 • Published • 104