@PyTorch
MXFP8 training for MoEs on GB200s delivers a 1.3x speedup with convergence equivalent to BF16. This #PyTorch post, built on TorchAO and TorchTitan and benchmarked on Crusoe Cloud, details the gains from dynamically quantized grouped GEMMs. #AI #OpenSource https://t.co/CifP9fQBEC ✍️ @vega_myhre, @_xmfan, @drisspg, Chinmay Baikar
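The linked post covers the full TorchAO/TorchTitan integration; as a rough illustration of what "dynamically quantized" means here, below is a minimal sketch (not the TorchAO API) of MXFP8-style block quantization: each group of 32 contiguous elements shares a power-of-two scale, and the scaled elements are stored in float8_e4m3. `mxfp8_quantize` is a hypothetical helper written for this example only.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def mxfp8_quantize(x: torch.Tensor, block_size: int = 32):
    """Conceptual MXFP8 quantization sketch (not the TorchAO implementation):
    every block of `block_size` contiguous elements shares one power-of-two
    (E8M0-style) scale, and the scaled elements are cast to float8_e4m3."""
    assert x.numel() % block_size == 0, "tensor size must be a multiple of the block size"
    blocks = x.float().reshape(-1, block_size)
    # Per-block max magnitude, clamped so log2 stays finite for all-zero blocks.
    amax = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=2.0**-126)
    # Power-of-two scale chosen so the largest element in each block
    # fits within the E4M3 representable range after scaling.
    scale = torch.exp2(torch.ceil(torch.log2(amax / FP8_E4M3_MAX)))
    q = (blocks / scale).to(torch.float8_e4m3fn)
    return q.reshape(x.shape), scale  # scale: one value per 32-element block


# Round-trip to inspect the quantization error.
x = torch.randn(64, 128, dtype=torch.bfloat16)
q, scale = mxfp8_quantize(x)
x_hat = (q.float().reshape(-1, 32) * scale).reshape(x.shape)
print((x.float() - x_hat).abs().max())
```

In MoE training, per the post's framing, the operands of each expert's GEMM are quantized on the fly in this fashion before the grouped GEMM runs in MXFP8, which is where the speedup over BF16 comes from.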