@s_batzoglou
@ThinkDi92468945 5.5 pro is way too expensive. With 128k tokens per problem and an estimated 2 retries on average to get an answer (a conservative estimate, in my experience of how these models behave), I expect a cost of >$40 per problem. I don’t want to spend $2000+ on GPT 5.5 pro results for just this tiny benchmark, because I am running tens of benchmarks on tens of models and don’t have a huge budget.