@OpenAI
Frontier models can recognize when they are being tested, and their tendency to scheme is influenced by this situational awareness. We demonstrated counterfactually that situational awareness in their chain-of-thought affects scheming rates: the more situationally aware a model is, the less it schemes, and vice versa. Moreover, both RL training and anti-scheming training increase levels of situational awareness.