info snack
Latency
The time between request and useful output.
companion card
Latency is the time from request to useful output; good AI systems optimize both quality and wait time.
What it means
Latency covers every stage between request and useful output: queueing, retrieval, prompt processing, generation, tool calls, and retries. With streaming, the wait that matters most is the time until the user receives something useful.
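One way to make "time until something useful" concrete is to record two numbers per call: time to the first streamed chunk and time to the full response. The sketch below is illustrative only. `timed_stream` and `fake_model_stream` are hypothetical names, and the fake stream stands in for whatever streaming client a real system would use.

```python
import time
from typing import Iterable, Iterator, Tuple


def timed_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Consume a stream and return (text, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first_chunk_s = None
    parts = []
    for chunk in chunks:
        if first_chunk_s is None:
            # The first chunk is the earliest moment the user sees anything.
            first_chunk_s = time.perf_counter() - start
        parts.append(chunk)
    total_s = time.perf_counter() - start
    if first_chunk_s is None:
        # Empty stream: nothing useful arrived before the end.
        first_chunk_s = total_s
    return "".join(parts), first_chunk_s, total_s


def fake_model_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming model call (assumption, not a real API)."""
    time.sleep(0.05)  # simulated queueing and prompt processing
    yield "Hello"
    time.sleep(0.05)  # simulated generation of the remaining output
    yield ", world"


if __name__ == "__main__":
    text, first_s, total_s = timed_stream(fake_model_stream())
    print(f"first chunk after {first_s:.3f}s, full response after {total_s:.3f}s")
```

Tracking both numbers separates "how long until the user sees progress" from "how long until the task is done", which can call for different fixes.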
Why product teams care
A route that is cheaper or smarter still has to satisfy the workflow's speed requirement. Users feel latency as product quality.
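As a rough sketch of that constraint, a router could first drop every route that misses the workflow's latency budget and only then compare quality and cost. Everything here is assumed for illustration: the `Route` fields, the route names, and the numbers are hypothetical, not measurements or any product's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    name: str
    quality: float        # higher is better (hypothetical score)
    cost: float           # cost per call (hypothetical units)
    p95_latency_s: float  # observed 95th-percentile latency in seconds


def pick_route(routes: List[Route], budget_s: float) -> Optional[Route]:
    """Return the best route that fits the latency budget, or None if none fits."""
    eligible = [r for r in routes if r.p95_latency_s <= budget_s]
    if not eligible:
        return None
    # Among routes that are fast enough, prefer quality, then lower cost.
    return max(eligible, key=lambda r: (r.quality, -r.cost))


# Hypothetical example routes with made-up numbers.
ROUTES = [
    Route("large-slow", quality=0.92, cost=0.030, p95_latency_s=6.0),
    Route("medium", quality=0.85, cost=0.010, p95_latency_s=1.8),
    Route("small-fast", quality=0.70, cost=0.002, p95_latency_s=0.6),
]

if __name__ == "__main__":
    for budget in (0.5, 1.0, 2.0, 10.0):
        chosen = pick_route(ROUTES, budget)
        print(budget, chosen.name if chosen else "no route fits")
```

Treating latency as a hard filter rather than one more weighted score is a design choice in this sketch: a route that misses the speed requirement is excluded no matter how strong it is otherwise.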
Understudy angle
Understudy treats latency alongside quality and cost so route choices match the real workflow, not just a benchmark score.
take this with you
Latency is product behavior, not just infrastructure trivia.
Measure latency by workload, not just as one average across all calls.
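A minimal sketch of per-workload measurement, assuming latency samples arrive as (workload, seconds) pairs. The workload names and timings are invented for illustration, and the percentile uses the simple nearest-rank method; a real system might prefer its metrics library's own percentile support.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group latency samples by workload and report mean, p50, and p95 for each."""
    by_workload = defaultdict(list)
    for workload, seconds in samples:
        by_workload[workload].append(seconds)
    return {
        workload: {
            "count": len(values),
            "mean": mean(values),
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
        }
        for workload, values in by_workload.items()
    }


# Hypothetical samples: an interactive chat workload and a slow batch workload.
SAMPLES = [
    ("chat", 0.4), ("chat", 0.5), ("chat", 0.6), ("chat", 0.5),
    ("batch", 8.0), ("batch", 12.0), ("batch", 10.0), ("batch", 30.0),
]

if __name__ == "__main__":
    overall = mean(seconds for _, seconds in SAMPLES)
    print(f"overall mean: {overall:.2f}s")
    for workload, stats in summarize(SAMPLES).items():
        print(workload, stats)
```

In this made-up data the overall mean is 7.75 seconds, a figure that describes neither the sub-second chat calls nor the batch job whose slowest call took 30 seconds.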