info snack

Distillation

Teaching a cheaper model to imitate useful behavior from a stronger one.

companion card

Distillation trains a smaller or cheaper model to imitate useful behavior from a stronger model.

What it means

A smaller model learns from teacher outputs, traces, labels, or preferences. The goal is useful behavior on a defined workload, not general intelligence.
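
As a rough illustration of learning from teacher outputs, the sketch below pairs each workload prompt with the teacher's response so the pair can serve as a training target for the student. It is a minimal sketch, not a prescribed pipeline: `build_distillation_set` and `toy_teacher` are hypothetical names, and the toy teacher is only a stand-in for a call to a stronger model.

```python
from typing import Callable, Dict, List


def build_distillation_set(
    prompts: List[str],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each workload prompt with the teacher's output as the training target."""
    examples = []
    for prompt in prompts:
        target = teacher(prompt)
        if target.strip():  # skip prompts where the teacher returned nothing usable
            examples.append({"prompt": prompt, "target": target})
    return examples


# Hypothetical stand-in for a stronger model; a real teacher would be a model call.
def toy_teacher(prompt: str) -> str:
    return prompt.upper()


dataset = build_distillation_set(["refund policy?", "reset password"], toy_teacher)
```

Restricting `prompts` to the defined workload is what keeps the student focused on that workload rather than on general ability.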

Why product teams care

Distillation can cut serving cost and latency, but it needs enough training examples and a held-out eval to confirm that the student actually learned the task.
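
One way to picture the held-out check: set aside a slice of examples the student never trains on, score both teacher and student on that slice, and accept the student only if it stays close to the teacher. The sketch below assumes per-example scores in a shared range (for instance 0 to 1); `split_holdout`, `student_is_good_enough`, and the `max_drop` tolerance are illustrative names and values, not a standard.

```python
import random
from typing import List, Sequence, Tuple


def split_holdout(
    examples: Sequence, holdout_fraction: float = 0.2, seed: int = 0
) -> Tuple[list, list]:
    """Shuffle deterministically and reserve a fraction that is never trained on."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (train, holdout)


def student_is_good_enough(
    teacher_scores: List[float],
    student_scores: List[float],
    max_drop: float = 0.02,  # assumed tolerance; choose per workload
) -> bool:
    """Compare mean held-out scores; pass if the student trails by at most max_drop."""
    if not teacher_scores or len(teacher_scores) != len(student_scores):
        raise ValueError("need one teacher and one student score per held-out example")
    teacher_mean = sum(teacher_scores) / len(teacher_scores)
    student_mean = sum(student_scores) / len(student_scores)
    return teacher_mean - student_mean <= max_drop


train, holdout = split_holdout(list(range(10)))
```

The fixed seed keeps the split reproducible, so the same held-out examples can be reused to compare successive student versions.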

Understudy angle

Understudy helps decide when prompt optimization has plateaued and there is enough evidence to justify training and serving a specialist model.

take this with you

Distillation is compelling when the task repeats and the teacher behavior is measurable.

Only distill after you can name the workload and score the teacher.