@forward_future_
FROM THE LIVE SHOW: @MatthewBerman and Hiten Shah (@hnshah) discuss the gap between AGI benchmarks and real user needs.

“Everyone’s chasing AGI benchmarks, but most users aren’t thinking about AGI. They’re asking: can I get what I want, whether that’s a good conversation, a feel-good moment, or useful output?”

“The SWE-bench team said: if an LLM can do it, we can test it. That means not just math and science benchmarks (the raw intelligence) but also personality, tone, and response style, the less tangible aspects of how people actually use these models.”