@mntruell
@jasonkneen We A/B test many parts of Cursor: model checkpoints, UX, and the agent harness. In this case, we ran a test on less than 1% of traffic to compare how Claude behaves with the CC harness versus our default harness (something we often do with offline evals). Our team does a lot to improve the speed, feel, and accuracy of our harness on the queries users ask and care about. Hope to share more about this work soon.