@dair_ai
New research from Google.

LLMs hallucinate with high confidence, miss their own knowledge boundaries, and misreport uncertainty. Most fixes bolt calibration on from the outside.

RLMF turns the model's own metacognition into the training signal. During preference optimization, it refines completion rankings according to how accurate the model's self-judgments of its own performance are, and it uses those same self-judgments to select high-value training data.

The approach is two-stage: first calibrate the faithfulness of self-reported confidence, then map it to natural linguistic uncertainty through targeted output editing.

RLMF reaches state-of-the-art faithful calibration across diverse tasks while preserving accuracy, and surpasses standard RL by up to 63%.

Paper: https://t.co/tBzuIYXAmf

Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
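
To make the ranking idea above concrete, here is a minimal Python sketch of how self-judgments could refine completion rankings before preference optimization. This is not the paper's algorithm. The `Completion` type, the `metacognitive_score` formula, and the `weight` and `min_margin` parameters are all hypothetical, chosen only to illustrate the idea: a completion ranks higher when it is correct and when its self-reported confidence matches its actual outcome.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Completion:
    text: str
    correct: bool      # outcome from an external checker (assumed to be available)
    confidence: float  # model's self-reported chance of being correct, in [0, 1]


def metacognitive_score(c: Completion, weight: float = 0.5) -> float:
    """Hypothetical score: task reward minus a penalty for a wrong self-judgment."""
    task_reward = 1.0 if c.correct else 0.0
    self_judgment_error = abs(c.confidence - task_reward)
    return task_reward - weight * self_judgment_error


def preference_pairs(completions, weight: float = 0.5, min_margin: float = 0.1):
    """Build (chosen, rejected) pairs, skipping pairs whose scores are nearly tied."""
    pairs = []
    for a, b in combinations(completions, 2):
        score_a = metacognitive_score(a, weight)
        score_b = metacognitive_score(b, weight)
        if abs(score_a - score_b) < min_margin:
            continue  # too close to carry a useful preference signal
        pairs.append((a, b) if score_a > score_b else (b, a))
    return pairs


if __name__ == "__main__":
    samples = [
        Completion("confident and right", correct=True, confidence=0.9),
        Completion("hesitant but right", correct=True, confidence=0.3),
        Completion("confident and wrong", correct=False, confidence=0.9),
        Completion("hesitant and wrong", correct=False, confidence=0.1),
    ]
    for chosen, rejected in preference_pairs(samples):
        print(f"{chosen.text!r} preferred over {rejected.text!r}")
```

Under this assumed scoring, a wrong answer given with low confidence outranks a wrong answer given with high confidence, which is the kind of ordering a calibration-aware preference signal would need. The resulting pairs could then feed any standard preference-optimization trainer.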