@_NathanCalvin
This work on messy real-world evals from Sayash et al. is wild and surprised me (and Sayash isn't known to over-hype). "App store operators should prepare for and police spam submissions, as they might soon see thousands of applications submitted autonomously using agents." https://t.co/owv8WFmsdp