@Muennighoff
How to keep scaling Large Language Models when data runs out? 🎢 We train 400 models with up to 9B params & 900B tokens to extend the Chinchilla scaling laws to repeated data. Results are interesting… 🧐 (rough sketch of the idea below) 📜: https://t.co/586bWwvpba 1/7 https://t.co/eTqX1reaey
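A minimal sketch of the kind of extension the thread describes, assuming the paper's effective-data parameterization D' = U_D + U_D·R*·(1 − e^(−R_D/R*)), where U_D is unique tokens, R_D is extra epochs of repetition, and R* is a fitted decay constant. The `r_star` value below is a placeholder for illustration, not the paper's fitted number:

```python
import math

def effective_tokens(unique_tokens: float, repeats: float, r_star: float = 15.0) -> float:
    """Effective data D' under repetition: each extra epoch of the same
    unique tokens contributes exponentially less than fresh data.
    r_star is a placeholder decay constant, not the paper's fitted value.
    """
    return unique_tokens + unique_tokens * r_star * (1.0 - math.exp(-repeats / r_star))

# Example: 100B unique tokens repeated for 4 extra epochs
# behave like ~451B fresh tokens, not the naive 500B.
print(f"{effective_tokens(100e9, repeats=4):.3e}")  # ~4.51e11
```

With this form, a few repetitions are almost as good as new data, but the marginal value of each additional epoch decays toward zero, which is what makes the law an extension of Chinchilla to the data-constrained regime.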