MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU • Paper • 2604.05091 • Published 12 days ago • 45
MoT-PL • Collection • Polish translation of Mixture-of-Thoughts subset and its variants. • 6 items • Updated Jan 14