@ben_burtenshaw
before model distillation was an attack vector, it was a pretty handy way of improving model performance on a task you care about. especially if you want to take a small, local, or cheap model and improve it on tasks typically reserved for large models.

in the next live stream, we're going to break down knowledge distillation in post-training and show you how to implement it.

going out next week: July 7th, 8am PST / 5pm CEST
live on: @huggingface X, YT, LI