@weights_biases
Impressive work from @wen_kaiyue, @dlwh, @tengyuma & @percyliang! They ran over 4.6k wandb runs to fairly benchmark 10 optimizers on models from 0.1B to 1.2B parameters. Matrix-based optimizers (Muon/SOAP) lead, but their gains shrink with scale. Check out their wandb workspace here: https://t.co/hn1WsqfGkS