@iScienceLuvr
Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

"This paper asks a simple question: Can inference compute substitute for missing supervision?"

"the current policy produces a group of rollouts; a frozen anchor (the initial policy) reconciles omissions and contradictions to estimate a reference, turning extra inference-time compute into a teacher signal."

"With training, CaT-RL delivers up to 33% relative improvement on MATH-500 and 30% on HealthBench with Llama 3.1 8B, and large gains across two other model families without human annotations"
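
A rough sketch of the loop described in the second quote, to make the data flow concrete. Everything here is an assumption for illustration, not the paper's code: `estimate_reference`, `exact_match_rewards`, and the toy stand-ins are hypothetical names. The toy anchor uses a plain majority vote only so the example runs. The quote describes the anchor as reconciling omissions and contradictions across rollouts, which is not the same as voting. The exact-match reward is likewise a placeholder, since the post does not say how rollouts are scored against the estimated reference.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Tuple


def estimate_reference(
    prompt: str,
    policy_sample: Callable[[str], str],
    anchor_synthesize: Callable[[str, List[str]], str],
    group_size: int = 4,
) -> Tuple[List[str], str]:
    """Sample a group of rollouts, then let a frozen anchor estimate a reference."""
    # The current policy produces the group of rollouts.
    rollouts = [policy_sample(prompt) for _ in range(group_size)]
    # The frozen anchor sees all rollouts and returns one estimated reference.
    reference = anchor_synthesize(prompt, rollouts)
    return rollouts, reference


def exact_match_rewards(rollouts: List[str], reference: str) -> List[float]:
    """Placeholder scoring (assumption): 1.0 if a rollout equals the reference."""
    target = reference.strip()
    return [1.0 if r.strip() == target else 0.0 for r in rollouts]


# Toy stand-ins so the sketch runs without any model (hypothetical).
_toy_answers = cycle(["42", "41", "42", "42"])


def toy_policy(prompt: str) -> str:
    """Pretend policy: returns canned answers in a fixed order."""
    return next(_toy_answers)


def toy_anchor(prompt: str, rollouts: List[str]) -> str:
    """Pretend anchor: majority vote, a stand-in for real synthesis."""
    return Counter(rollouts).most_common(1)[0][0]
```

Calling `estimate_reference("What is 6 * 7?", toy_policy, toy_anchor)` and passing the result to `exact_match_rewards` gives one reward per rollout, which is the teacher signal an RL update would then consume.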