@leloykun
The Case for Muon
1) We can descend 'faster' in non-Euclidean spaces
2) Adam/Shampoo/SOAP/etc. dynamically learn the preconditioner and, equivalently, the norm & space to descend in
3) Muon saves a lot of compute by simply letting the norm vary within a fixed range
https://t.co/PKpXrKSYpT
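
A rough sketch of the mechanics behind point 3, assuming the commonly described form of Muon, where the gradient (or momentum) matrix is swapped for an approximately orthogonal one via Newton-Schulz iteration instead of a learned preconditioner. This uses the textbook cubic iteration, not Muon's tuned coefficients; `orthogonalize`, the step count, and the toy matrix are illustrative choices, not from the post.

```python
import numpy as np

def orthogonalize(G, steps=50):
    """Approximate U @ V.T, where G = U @ diag(S) @ V.T is the reduced SVD.

    Hypothetical helper: classical cubic Newton-Schulz iteration,
    a simplified stand-in for the iteration used in Muon.
    """
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region (0, sqrt(3)).
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        # Each singular value s is mapped to 1.5*s - 0.5*s**3, pushing it toward 1.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

# Toy "gradient" matrix (assumption: any full-rank matrix works here).
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 6))

O = orthogonalize(G)

# Reference answer from an explicit SVD, which the iteration avoids computing.
U, _, Vt = np.linalg.svd(G, full_matrices=False)
```

The output `O` should match `U @ Vt` up to numerical error: same singular vectors as `G`, but with every singular value pushed to roughly 1. The update direction is obtained with matrix multiplies only, with no preconditioner statistics to accumulate.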