@PyTorch
Training massive Mixture-of-Experts (MoE) models like DeepSeek-V3 and Llama 4 Scout efficiently is one of the defining challenges in modern AI. These models push GPUs, networks, and compilers to their limits. To tackle this, AMD and Meta’s PyTorch teams joined forces to tune TorchTitan and Primus-Turbo, AMD’s open-source kernel library, for the new Instinct MI325X GPUs. Together, they achieved near-ideal scaling across 1,024 GPUs, showing that efficiency and scale don’t have to be a trade-off. 📎 Read our latest blog: https://t.co/xcpdQpy8da #PyTorchFoundation #OpenSourceAI #TorchTitan #MoE