@kellerjordan0
It's a new day, and here's a new NanoGPT speedrun record:
3.28 FineWeb val loss in 8.2 minutes on 8xH100
Previous record: 10.8 minutes
Changelog:
- architectural shortcuts
- momentum warmup
- tanh logit capping
By @Grad62304977 and myself
1/6 https://t.co/YAFcuLrLou
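
For readers who don't know the last two changelog items, here's a rough sketch of what "tanh logit capping" and "momentum warmup" usually mean. It's not taken from the record code: the function names, the cap of 30, and the 0.85 to 0.95 schedule over 500 steps are all assumptions picked for illustration.

```python
import math

def softcap(logits, cap=30.0):
    """Squash logits smoothly into [-cap, cap] with tanh.

    `cap` is an assumed value for illustration. Small logits pass through
    almost unchanged (tanh(x) ~ x near 0); large ones saturate at +/- cap.
    """
    return [cap * math.tanh(x / cap) for x in logits]

def warmup_momentum(step, start=0.85, end=0.95, warmup_steps=500):
    """Linearly ramp the optimizer momentum from `start` to `end`.

    All three defaults are hypothetical. After `warmup_steps` the
    momentum stays at `end`.
    """
    frac = min(step / warmup_steps, 1.0)
    return (1 - frac) * start + frac * end

if __name__ == "__main__":
    print(softcap([-1000.0, 0.0, 5.0, 1000.0]))
    print([warmup_momentum(s) for s in (0, 250, 500, 1000)])
```

Capping bounds the logits without the hard cutoff of clipping, so gradients still flow for large values, and the momentum ramp makes the optimizer rely less on its gradient history during the first steps.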