@janleike
Interesting trend: models have been getting a lot more aligned over the course of 2025. The fraction of misaligned behavior found by automated auditing has been going down not just at Anthropic but for GDM and OpenAI as well. https://t.co/8DYm9SP7wF