@percyliang
Assignment 5 (alignment): implement supervised fine-tuning, expert iteration, GRPO and variants, run RL on Qwen 2.5 Math 1.5B to improve MATH because it's 2025. We thought about having students implement inference, but decided (probably wisely) to let people use vllm instead. https://t.co/mQOG46z2Eh