@emollick
So, Claude 4.5 came in far above trend in the much-watched METR measure of the task duration that AI can accomplish autonomously at 4 hours 49 minutes. Interestingly, at the harder 80% success threshold, it is GPT-5.1 Codex Max that breaks the trend. In 2023, GPT-4 was a minute. https://t.co/kRulXyNEEI