@PavloMolchanov
What if one training run could produce multiple high-quality LLMs for free? 🤔 Turns out it can.

❗️We're releasing Nemotron-Elastic-12B: a single model that gives you 6B/9B/12B variants at no extra training cost.

✨ Highlights:
- Many-in-one model: Zero-shot slicing gives you 6B, 9B, and 12B from the same checkpoint. No retraining. No extra runs.
- Constant training cost: Traditional pipelines pay linearly for each size. Elastic cuts this to roughly constant, a 7.2× token savings for the 6B/9B/12B family.
- Constant deployment memory: All variants fit in 24GB (just the 12B footprint), a 2.25× reduction vs. storing separate checkpoints (quick check below).
- Great reasoning: Hybrid Mamba-2 + Transformer architecture, competitive with same-size models on MATH-500, AIME, GPQA, LCB, etc.
- Perfect for edge: Pick the right model size on-device without juggling multiple checkpoints or retraining.

Elastic models = less compute, less memory, higher accuracy, all from a single model.

📖 Read the full technical paper: https://t.co/DrxZCyvvjX
🤗 Explore the model: https://t.co/3PnZudn5PW
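A back-of-envelope check of the deployment-memory numbers above, assuming weights stored in BF16 at 2 bytes per parameter (the precision is an assumption, not stated in the post):

```python
# Back-of-envelope check of the deployment-memory claim.
# Assumption (not from the post): weights in BF16, i.e. 2 bytes per parameter.
params = {"6B": 6e9, "9B": 9e9, "12B": 12e9}
bytes_per_param = 2

separate_gb = sum(n * bytes_per_param for n in params.values()) / 1e9  # store all three checkpoints
elastic_gb = params["12B"] * bytes_per_param / 1e9                     # store only the elastic 12B checkpoint

print(f"separate checkpoints: {separate_gb:.0f} GB")             # ~54 GB
print(f"elastic checkpoint:   {elastic_gb:.0f} GB")               # ~24 GB
print(f"reduction:            {separate_gb / elastic_gb:.2f}x")   # 2.25x
```

The 2.25× figure is just the parameter ratio, (6 + 9 + 12) / 12 = 27 / 12, so it holds regardless of the storage precision assumed here.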