@fchollet
If you haven't read the ARC Prize 2024 technical report, check it out (link in next tweet). One important bit: we'll be releasing a v2 of the benchmark early next year (human testing is currently being finalized).

Why? Because AGI progress in 2025 is going to need a better compass than v1. v1 fulfilled its mission well over the past 5 years, but what we've learned from it enables us to ship something better.

In 2020, an ensemble of all Kaggle submissions from that year's competition scored 49% -- and those were all crude program-enumeration approaches using relatively little compute. This suggests that about half of the benchmark was not a strong signal towards AGI. Today, an ensemble of all Kaggle submissions from the 2024 competition scores 81%. This indicates the benchmark is saturating, and that enough compute / brute force will get you over the finish line.

v2 will fix these issues and increase the "signal strength" of the benchmark.