info snack

Time to first token

How long the user waits before streaming begins.

companion card

Time to first token is how long the user waits before streaming begins; retrieval, queueing, and model startup all affect it.

What it means

Time to first token (TTFT) measures the delay between the start of a request and the first streamed output token. It includes all the work that happens before generation begins.
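One way to capture that delay is to start a timer before the request is issued and stop it when the first token arrives. The sketch below is illustrative only: `measure_stream`, `StreamTiming`, and `fake_stream` are hypothetical names, and the stream is assumed to be any Python iterable of text chunks, not a specific vendor API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class StreamTiming:
    ttft_seconds: Optional[float]  # None if the stream produced no tokens
    total_seconds: float
    text: str


def measure_stream(
    start_request: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Time a streamed response (hypothetical helper, not a real API).

    The clock starts before the request is issued, so retrieval, queueing,
    and any other pre-generation work counts toward TTFT.
    """
    start = clock()
    first_token_at: Optional[float] = None
    parts: List[str] = []
    for token in start_request():
        if first_token_at is None:
            first_token_at = clock()  # first streamed token has arrived
        parts.append(token)
    end = clock()
    ttft = None if first_token_at is None else first_token_at - start
    return StreamTiming(ttft, end - start, "".join(parts))


def fake_stream(pre_delay: float, token_delay: float, tokens: List[str]) -> Iterator[str]:
    """Stand-in for a model stream: waits, then yields tokens one by one."""
    time.sleep(pre_delay)  # simulates retrieval, queueing, and startup
    for token in tokens:
        yield token
        time.sleep(token_delay)  # simulates per-token generation time


if __name__ == "__main__":
    timing = measure_stream(lambda: fake_stream(0.05, 0.01, ["Hel", "lo", "!"]))
    print(f"TTFT: {timing.ttft_seconds:.3f}s, total: {timing.total_seconds:.3f}s")
```

Passing the request as a callable keeps the setup cost inside the measured window. The injectable `clock` is a design choice for testing, so timings can be checked without real sleeps.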

Why product teams care

Retrieval, queueing, cold starts, long prompts, model size, and safety checks can all increase TTFT. Streaming makes the rest of the response feel faster, but it cannot hide this first wait.

Understudy angle

Understudy can track TTFT by workload so routing choices account for perceived responsiveness, not only final quality.
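As a rough illustration of tracking TTFT by workload, the sketch below groups samples under a workload label and reports nearest-rank percentiles. `TTFTTracker` and the workload names are hypothetical, made up for this example; this is not Understudy's actual API or storage model.

```python
import math
from collections import defaultdict
from typing import Dict, List


class TTFTTracker:
    """Hypothetical in-memory tracker: TTFT samples grouped by workload."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, workload: str, ttft_seconds: float) -> None:
        """Store one TTFT sample under a workload label."""
        self._samples[workload].append(ttft_seconds)

    def percentile(self, workload: str, pct: float) -> float:
        """Nearest-rank percentile of the samples for one workload."""
        if not 0 < pct <= 100:
            raise ValueError("pct must be in (0, 100]")
        samples = sorted(self._samples.get(workload, []))
        if not samples:
            raise ValueError(f"no samples recorded for workload {workload!r}")
        rank = math.ceil(pct / 100 * len(samples))  # 1-based rank
        return samples[rank - 1]

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Sample count, p50, and p95 for every workload seen so far."""
        return {
            workload: {
                "count": len(samples),
                "p50": self.percentile(workload, 50),
                "p95": self.percentile(workload, 95),
            }
            for workload, samples in self._samples.items()
        }


if __name__ == "__main__":
    tracker = TTFTTracker()
    for value in [0.2, 0.4, 0.3, 1.5]:
        tracker.record("search", value)  # "search" is an example label
    tracker.record("chat", 0.1)
    print(tracker.summary())
```

Percentiles are used here instead of an average because a few slow first tokens can dominate how responsive a workload feels. A router could compare these per-workload figures against a latency budget alongside its quality scores.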

take this with you

TTFT largely determines whether a system feels responsive before the final answer is ready.

Measure TTFT separately from total response time for one agent workflow.