@emollick
The Graduate-Level Google-Proof Q&A test (GPQA) is a series of hard multiple-choice problems designed to test advanced knowledge. Non-experts with access to the internet get 34% right, while PhDs with internet access get 65-70% inside their specialty. We are probably near saturation.