@percyliang
GPT-5 and GPT-5 mini added to HELM Capabilities v1.12.0. Interestingly, GPT-5 mini tops the leaderboard ahead of GPT-5: on Omni-MATH, GPT-5 uses more reasoning tokens (and is hard to control), so it hits our reasoning token budget of 14096. Doing fair evals is tricky! https://t.co/hSmyQgke4S