@_igorshilov
New Anthropic research! We study how to train models so that high-risk capabilities live in a small, separate set of parameters, allowing clean capability removal when needed, for example in CBRN or cybersecurity domains. https://t.co/jX7ThUf0SF