@TheZachMueller
Distributed training unlocks your ability to train your own SOTA models (given good data and enough compute for long enough). We're seeing this today with the open release of SmolLM3: 3B parameters trained on 11 trillion tokens on 384 H100s for 24 days. Come learn… https://t.co/cjiRJ86paI
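A quick back-of-the-envelope from the numbers in the post, assuming the 24 days is pure training wall-clock (real runs also spend time on restarts, checkpointing, and evals, so the true throughput was likely a bit higher):

```python
# Throughput implied by the SmolLM3 figures quoted above:
# 11 trillion tokens on 384 H100s over 24 days. Rough arithmetic only.

TOKENS = 11e12   # total training tokens
GPUS = 384       # H100 count
DAYS = 24        # wall-clock training time

seconds = DAYS * 86_400
cluster_tokens_per_sec = TOKENS / seconds
per_gpu_tokens_per_sec = cluster_tokens_per_sec / GPUS

print(f"cluster: ~{cluster_tokens_per_sec:,.0f} tokens/s")
print(f"per GPU: ~{per_gpu_tokens_per_sec:,.0f} tokens/s")
```

That works out to roughly 5.3M tokens/s across the cluster, or on the order of 14K tokens/s per GPU, which is the kind of sustained rate distributed data parallelism has to deliver for a run like this to finish in under a month.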