@rasbt
@_xpn_ Hope you are enjoying it! Re: distillation, Chapter 7 on supervised fine-tuning (SFT) is essentially DeepSeek-style distillation. Coincidentally, I am currently wrapping up the distillation chapter for the sequel book (Build a reasoning model from scratch). In the meantime, you might like my tools for generating distillation datasets: https://t.co/PD7s7O9ri8