info snack

Latency

The time between request and useful output.

companion card

Latency is the time from request to useful output; good AI systems optimize both quality and wait time.

What it means

Latency covers every stage between the request and the response: queueing, retrieval, prompt processing, generation, tool calls, retries, and streaming. The number that matters is the time until the user receives something useful.
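One way to make the stages concrete is to record a timing for each and add them up. The sketch below is illustrative only: the StageTimings fields and both function names are hypothetical, and it assumes tool calls and retries finish before the first streamed token arrives, which will not hold for every system.

```python
from dataclasses import dataclass


@dataclass
class StageTimings:
    """Hypothetical per-request stage timings, all in milliseconds."""
    queue_ms: float
    retrieval_ms: float
    prompt_ms: float                 # prompt processing before generation starts
    first_token_ms: float            # generation time until the first streamed token
    remaining_generation_ms: float   # generation time after the first token
    tools_ms: float = 0.0
    retries_ms: float = 0.0


def time_to_first_useful_output(t: StageTimings) -> float:
    """Time until the user sees something, assuming streaming starts
    only after queueing, retrieval, tools, retries, and prompt processing."""
    return (
        t.queue_ms
        + t.retrieval_ms
        + t.tools_ms
        + t.retries_ms
        + t.prompt_ms
        + t.first_token_ms
    )


def total_latency(t: StageTimings) -> float:
    """Time until the full response has been delivered."""
    return time_to_first_useful_output(t) + t.remaining_generation_ms
```

With streaming, the two numbers can differ a lot: a request may show its first output in under half a second and still take over a second to finish.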

Why product teams care

A route that is cheaper or smarter still has to satisfy the workflow's speed requirement. Users feel latency as product quality.
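As a rough sketch of that constraint, a route can be filtered by the workflow's latency budget and quality floor before cost is compared. Everything here is an assumption for illustration: the Route fields, the pick_route function, and the choice of p95 latency as the speed measure are hypothetical and do not describe any particular product's routing logic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Route:
    """Hypothetical description of one model route."""
    name: str
    quality: float         # e.g. an evaluation score between 0 and 1
    cost_usd: float        # average cost per request
    p95_latency_ms: float  # 95th-percentile latency for this workload


def pick_route(
    routes: Sequence[Route],
    max_latency_ms: float,
    min_quality: float,
) -> Optional[Route]:
    """Return the cheapest route that meets both the latency budget and
    the quality floor, or None if no route qualifies."""
    eligible = [
        r for r in routes
        if r.p95_latency_ms <= max_latency_ms and r.quality >= min_quality
    ]
    if not eligible:
        return None
    # Cheapest first; break cost ties in favor of higher quality.
    return min(eligible, key=lambda r: (r.cost_usd, -r.quality))
```

Treating latency as a hard filter rather than one term in a weighted score reflects the point above: a route that misses the speed requirement is not a candidate, however cheap or capable it is.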

Understudy angle

Understudy treats latency alongside quality and cost so route choices match the real workflow, not just a benchmark score.

take this with you

Latency is product behavior, not just infrastructure trivia.

Measure latency per workload, not as a single average across all calls; a blended average hides the workloads that are too slow.
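A minimal sketch of per-workload measurement, assuming each sample is a (workload name, latency in milliseconds) pair. The function names and the nearest-rank percentile method are choices made for this example, not a prescribed approach.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least
    pct percent of the samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_by_workload(
    samples: Iterable[Tuple[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Group (workload, latency_ms) samples and summarize each group."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for workload, latency_ms in samples:
        grouped[workload].append(latency_ms)
    return {
        workload: {
            "count": len(values),
            "mean_ms": sum(values) / len(values),
            "p50_ms": percentile(values, 50),
            "p95_ms": percentile(values, 95),
        }
        for workload, values in grouped.items()
    }
```

For example, three autocomplete calls at 100, 120, and 140 ms and two report calls at 4000 and 6000 ms average 2072 ms overall, a figure that describes neither workload. Grouped, the autocomplete median is 120 ms and the report median is 4000 ms.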