@_lewtun
We've rebuilt TRL's on-policy distillation trainer from the ground up to:

🐳 support huge teachers with 100B+ params
✔️ train >40x faster thanks to some nifty buffer and payload optimisations

This means you can now distill models in the Llama, Qwen and Gemma families at any scale!

Technical deep dive with all the optimisations and pretty animations ⬇️
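
For context, here is a minimal sketch of what on-policy distillation looks like with TRL's existing GKDTrainer API. The post doesn't show the rebuilt trainer's interface, so the model IDs, dataset, and config values below are illustrative assumptions, not details confirmed by the announcement:

```python
# Minimal sketch of on-policy distillation with TRL's GKDTrainer.
# Model/dataset choices and hyperparameters here are assumptions for
# illustration; the rebuilt trainer's exact API isn't shown in the post.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

student_id = "Qwen/Qwen2.5-0.5B-Instruct"  # small student (example choice)
teacher_id = "Qwen/Qwen2.5-7B-Instruct"    # larger teacher (example choice)

tokenizer = AutoTokenizer.from_pretrained(student_id)
student = AutoModelForCausalLM.from_pretrained(student_id)
teacher = AutoModelForCausalLM.from_pretrained(teacher_id, torch_dtype="auto")

# Chat-style dataset with a "messages" column (example dataset).
train_dataset = load_dataset("trl-lib/chatbot_arena_completions", split="train")

training_args = GKDConfig(
    output_dir="qwen-distilled-student",
    lmbda=1.0,          # fraction of batches trained on on-policy student generations
    beta=0.5,           # interpolation coefficient of the generalized JSD loss
    max_new_tokens=128, # length of sampled completions
    per_device_train_batch_size=1,
)

trainer = GKDTrainer(
    model=student,
    teacher_model=teacher,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```

With lmbda=1.0 every batch is generated by the student itself and scored against the teacher's token distribution, which is what makes the distillation "on-policy" rather than purely supervised.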