@HelloSurgeAI
Imagine measuring a personal assistant’s ability by asking them to write emails without using the letter "c" or any commas. Crazy, right? But that’s exactly what many instruction-following benchmarks are designed to test. Here’s a real example: https://t.co/zt1xe7kZt9