info snack
Distillation
Teaching a cheaper model to imitate useful behavior from a stronger one.
companion card
Distillation trains a smaller or cheaper model to imitate useful behavior from a stronger model.
What it means
A smaller model learns from teacher outputs, traces, labels, or preferences. The goal is useful behavior on a defined workload, not general intelligence.
Why product teams care
Distillation can cut cost and latency, but it needs enough examples to cover the workload and a held-out eval to confirm that the student actually learned the task rather than memorizing the training set.
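A minimal sketch of the held-out check, assuming teacher-labeled examples are stored as (prompt, teacher_output) pairs and that exact match is an acceptable score. The names split_examples, agreement_rate, and toy_student are hypothetical and exist only for illustration; real workloads often need a task-specific scorer instead of exact match.

```python
import random


def split_examples(examples, holdout_fraction=0.2, seed=0):
    """Shuffle teacher-labeled examples and split them into train and held-out sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def agreement_rate(student, heldout):
    """Fraction of held-out prompts where the student output equals the teacher output."""
    if not heldout:
        raise ValueError("held-out set is empty")
    matches = sum(1 for prompt, teacher_output in heldout
                  if student(prompt) == teacher_output)
    return matches / len(heldout)


# Toy teacher-labeled data (assumption: a two-class ticket-routing workload).
examples = [(f"ticket {i}", "refund" if i % 2 == 0 else "other") for i in range(10)]
train, heldout = split_examples(examples)


def toy_student(prompt):
    # Stand-in for a trained student model; a real student would be fit on `train` only.
    number = int(prompt.split()[1])
    return "refund" if number % 2 == 0 else "other"


score = agreement_rate(toy_student, heldout)
```

The student never sees the held-out pairs during training, so the agreement rate estimates how well it imitates the teacher on new inputs from the same workload.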
Understudy angle
Understudy helps decide when prompt optimization has plateaued and there is enough evidence to justify training and serving a specialist model.
take this with you
Distillation is compelling when the task repeats and the teacher behavior is measurable.
Only distill after you can name the workload and score the teacher.