@mishig25
Ran autoresearch on HF to see whether anything can beat the MuonAdamW baseline. Biggest takeaway: NS (Newton-Schulz) orthogonalization is a very strong attractor that absorbs most gradient modifications you throw at it. See all the artifacts at https://t.co/S5DY7MezUp https://t.co/XyIEMeZ4Ft
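
A rough sketch of one way the "attractor" effect can show up, not taken from the linked artifacts. It assumes the classic cubic Newton-Schulz iteration, which may differ from the variant used in the actual runs, and the `newton_schulz` helper and toy matrices are hypothetical. The iteration drives every singular value toward 1, so any modification that changes only the singular values of a gradient (and not its singular vectors) maps to the same output.

```python
import numpy as np

def newton_schulz(G, steps=20):
    """Approximate the orthogonal polar factor U @ V.T of G = U @ diag(s) @ V.T.

    Illustrative helper using the textbook cubic iteration (an assumption).
    """
    # Frobenius-normalize so every singular value lies in (0, 1],
    # which is inside the convergence region of the cubic iteration.
    X = G / (np.linalg.norm(G) + 1e-12)
    for _ in range(steps):
        # Each singular value s is mapped to 1.5*s - 0.5*s**3, which pushes it toward 1.
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

# Toy "gradient" with known singular vectors U, V (made-up data).
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
G = U @ np.diag([4.0, 3.0, 2.0, 1.0]) @ V.T

# A "modified gradient": same singular vectors, reshuffled and rescaled singular values.
G_mod = U @ np.diag([1.0, 5.0, 2.0, 3.0]) @ V.T

O = newton_schulz(G)
O_mod = newton_schulz(G_mod)

# Both inputs collapse to (numerically) the same orthogonal matrix U @ V.T.
print(np.abs(O - O_mod).max())
print(np.abs(O - U @ V.T).max())
```

Modifications that rotate the singular vectors are not absorbed this way, so this only illustrates part of the picture.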