@emollick
Since OpenAI didn't update Figure 7 of GDPval to reflect GPT-5.2's success rate on long-form tasks, I used GPT-5.2 Pro to do it. The chart assumes this process: delegate a long task to the AI, spend an hour evaluating the output, then decide whether to try again or give up and do it yourself. https://t.co/vFtMZrturL
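Here is a rough sketch of the expected-time arithmetic behind that delegate, review, retry-or-give-up loop. It assumes that each AI attempt succeeds independently with probability `p`, that every attempt costs a fixed AI time plus a fixed review time (an hour, per the chart), and that there is a cap on retries before you do the task yourself. The function and parameter names are hypothetical, and this is not presented as GDPval's or OpenAI's exact method.

```python
def expected_hours(p, ai_hours, review_hours, human_hours, max_attempts):
    """Expected hours for a hypothetical loop: run the AI, review its output,
    retry on failure up to max_attempts times, then do the task yourself.
    Assumes attempts succeed independently with the same probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    total = 0.0
    reach = 1.0  # probability that all earlier attempts failed, so this one happens
    for _ in range(max_attempts):
        total += reach * (ai_hours + review_hours)  # each attempt costs AI time plus review
        reach *= 1.0 - p  # only failures lead to another attempt
    total += reach * human_hours  # every attempt failed, so you do it yourself
    return total


# Illustrative numbers only: 50% success, 1 hour of review, 8 hours to do it by hand, one try.
print(expected_hours(p=0.5, ai_hours=0.0, review_hours=1.0, human_hours=8.0, max_attempts=1))  # 5.0
```

Comparing `human_hours` with this expected value gives a crude speedup estimate. Raising `p`, which is what a better model does, shrinks the fallback term fastest.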