@emollick
One thing to watch in AI benchmarking is how success is measured. A model can be very fragile, getting the right answer only rarely (though measurably more often than chance), and still look very good on a benchmark scored with pass@10, where a problem counts as solved if any of 10 attempts is correct. In real use, where you usually get one try, it will often fail. https://t.co/ifO6cKwyt2 https://t.co/IshWWsXkGp
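
To put rough numbers on the gap, here is a minimal sketch. It assumes every attempt succeeds independently with the same probability p, which is a simplification, and the name `pass_at_k` and the sample probabilities are illustrative rather than taken from the linked benchmark.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds.

    Assumption: each attempt succeeds with the same probability p,
    independently of the others. Real model samples are often correlated.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Complement of "all k attempts fail".
    return 1.0 - (1.0 - p) ** k


# Illustrative per-attempt success rates (hypothetical values).
for p in (0.05, 0.2, 0.5):
    print(
        f"per-attempt {p:.0%}: "
        f"pass@1 = {pass_at_k(p, 1):.1%}, pass@10 = {pass_at_k(p, 10):.1%}"
    )
```

Under these assumptions, a model that is right only 20% of the time on a single try scores about 89% on pass@10, so the same model can look strong on the benchmark while failing four times out of five in one-shot use.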