@_berlinchen
We cooked up Gram Newton-Schulz: a drop-in replacement for Muon’s Newton-Schulz that is up to 2x faster. Building it required synthesizing ideas from linear algebra, numerical analysis, and kernel design. This makes for a great story and an even better optimizer!
Amazing collaborators: @jcz42, @noahamsel, @tri_dao
Blog: https://t.co/EyysR6p4Bs
Code: https://t.co/5DjwIg7SAc