@HelloSurgeAI
This week, @echen joined @l2k on Gradient Dissent to talk about what's actually happening in post-training right now. Topics include the negative incentives introduced by some benchmarks, early bets on RLHF, and new RL environments the Surge team is building to navigate complex failures. A key insight: frontier models need much deeper human expertise, from PhD-level STEM work, to Olympiad-level math problems, to tasks that take days or weeks to complete. https://t.co/Ui7936KsgA