@AnthropicAI
When language models “reason out loud,” it’s hard to know if their stated reasoning is faithful to the process the model actually used to make its prediction. In two new papers, we measure and improve the faithfulness of language models’ stated reasoning. https://t.co/eumrl2gxk1