@emollick
I think there is likely too much emphasis on the METR long-task measurement as a sign of AI progress... but it doesn't matter. With a little help from GPT-5.2 Pro, I calculated the correlations between log(METR) & other key benchmarks, and they basically all correlate highly https://t.co/diIOBJ8w49
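
For anyone who wants to run this kind of check themselves, here is a rough Python sketch of correlating log(METR time horizon) with another benchmark's scores. Everything in it is an assumption for illustration: the function names are hypothetical, Pearson correlation is just one reasonable choice (the post doesn't say which method was used), and the numbers are made-up placeholders, not real METR or benchmark results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log_horizon_correlation(horizons, scores):
    """Correlate log(time horizon) with benchmark scores, one pair per model."""
    return pearson([math.log(h) for h in horizons], scores)


# Placeholder data only (NOT real measurements): one entry per model.
placeholder_horizons = [1, 2, 4, 8, 16]       # hypothetical task horizons
placeholder_scores = [10, 20, 30, 40, 50]     # hypothetical benchmark scores

if __name__ == "__main__":
    print("log-scale r:", log_horizon_correlation(placeholder_horizons, placeholder_scores))
    print("raw-scale r:", pearson(placeholder_horizons, placeholder_scores))
```

The log matters because horizons that grow roughly exponentially only line up linearly with a benchmark score after taking the logarithm; in the placeholder data the log-scale correlation is higher than the raw one.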