@AnthropicAI
Dictionary learning works! Using a "sparse autoencoder", we can extract features that represent purer concepts than individual neurons do. For example, decomposing the activations of ~500 neurons into ~4000 features uncovers features for things like DNA sequences, HTTP requests, and legal text. 📄https://t.co/XQvzENHMrp https://t.co/wCZl7NKxc5
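To make the "sparse autoencoder" idea concrete, here is a minimal sketch. It assumes a common formulation: a ReLU encoder maps neuron activations to a larger set of feature activations, a linear decoder reconstructs the neurons, and the loss adds reconstruction error to an L1 sparsity penalty. This is not presented as the exact method behind the post. The class name, the `l1_coeff` value, and the random initialization are hypothetical. Training is omitted, and the sizes are shrunk from ~500 → ~4000 to 8 → 64 so the example runs quickly in pure Python.

```python
import random


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


class SparseAutoencoder:
    """Hypothetical toy SAE: n_neurons -> n_features (overcomplete) -> n_neurons."""

    def __init__(self, n_neurons: int, n_features: int,
                 l1_coeff: float = 1e-3, seed: int = 0):
        rng = random.Random(seed)
        self.l1_coeff = l1_coeff
        # Encoder: one weight row per feature.
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(n_neurons)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # Decoder: column j is the "dictionary direction" for feature j.
        self.w_dec = [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
                      for _ in range(n_neurons)]
        self.b_dec = [0.0] * n_neurons

    def encode(self, x: list[float]) -> list[float]:
        # f = ReLU(W_enc (x - b_dec) + b_enc): nonnegative feature activations.
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return [relu(sum(w * c for w, c in zip(row, centered)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, f: list[float]) -> list[float]:
        # x_hat = W_dec f + b_dec: reconstruct neurons as a sum of feature directions.
        return [sum(w * fj for w, fj in zip(row, f)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, x: list[float]) -> float:
        # Mean squared reconstruction error plus an L1 penalty that favors few active features.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + self.l1_coeff * sum(abs(v) for v in f)


# Usage: encode one random activation vector of 8 "neurons" into 64 "features".
sae = SparseAutoencoder(n_neurons=8, n_features=64)
x = [random.Random(1).gauss(0.0, 1.0) for _ in range(8)]
f = sae.encode(x)
x_hat = sae.decode(f)
print(len(f), len(x_hat))  # 64 8
```

The design choice is that the feature layer is wider than the neuron layer, while the L1 term, once the weights are trained by gradient descent, pushes most feature activations to zero. Each input is then explained by a few features rather than by many overlapping neurons.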