info snack
Prompt caching
Reusing computation for repeated prompt prefixes.
companion card
Prompt caching reuses computation for repeated prefix tokens, lowering latency and cost when many calls share the same context.
What it means
If many calls begin with the same system prompt, policy, examples, or documents, the serving system can cache the computation for that shared prefix instead of recomputing it on every call.
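As a rough illustration of the idea, here is a toy Python sketch of a prefix cache. Everything in it is hypothetical: `PrefixCache`, `get_or_compute`, and `expensive_prefix_pass` are made-up names, and the "computation" is a placeholder rather than what a real serving system stores. It only shows the shape of the reuse: the first call pays for the shared prefix, later calls look it up.

```python
import hashlib


class PrefixCache:
    """Toy stand-in for a serving-side prefix cache (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prefix_tokens):
        # Identical token sequences map to the same key; any change yields a new key.
        joined = "\x1f".join(prefix_tokens)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def get_or_compute(self, prefix_tokens, compute):
        key = self._key(prefix_tokens)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(prefix_tokens)
        return self._store[key]


def expensive_prefix_pass(tokens):
    # Placeholder for the real work a model server would do on the prefix.
    return {"processed_tokens": len(tokens)}


cache = PrefixCache()
system_prompt = "You are a support agent . Follow the refund policy .".split()

questions = ["Where is my order ?", "Can I get a refund ?", "How do I reset my password ?"]
for question in questions:
    # The shared prefix is computed once, then reused.
    state = cache.get_or_compute(system_prompt, expensive_prefix_pass)
    # Only the per-request suffix would still need fresh computation.
    suffix_tokens = question.split()
```

In this sketch, three requests with the same prefix produce one miss and two hits. A production cache would also need eviction and expiry, which are left out here.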
Why product teams care
Prompt caching can reduce time to first token (TTFT) and cost, but only if the prompt is structured so that the shared prefix stays identical across requests.
Understudy angle
Understudy can identify repeated prompt prefixes and help agents structure workloads to take advantage of caching.
take this with you
Caching works best when the stable prefix is large and many requests share it.
Move stable instructions before per-request user content and measure cache hit rate.
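A minimal Python sketch of both takeaways, under stated assumptions: prompts are modeled as lists of segments rather than real tokens, and `build_prompt`, `shared_prefix_len`, and `prefix_reuse_rate` are hypothetical helpers. The reuse rate here is a rough proxy computed offline against the previous request only; an actual cache hit rate should come from whatever usage metrics your serving system reports.

```python
def build_prompt(system, examples, user_message):
    # Stable content first, per-request content last.
    return [system, *examples, user_message]


def shared_prefix_len(a, b):
    # Number of leading segments that are identical in both prompts.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_reuse_rate(prompts):
    """Fraction of segments that repeat the previous request's prefix (rough proxy)."""
    reused = 0
    total = 0
    previous = None
    for prompt in prompts:
        total += len(prompt)
        if previous is not None:
            reused += shared_prefix_len(previous, prompt)
        previous = prompt
    return reused / total if total else 0.0


system = "SYSTEM INSTRUCTIONS"
examples = ["EXAMPLE 1", "EXAMPLE 2"]
user_messages = ["question one", "question two", "question three"]

# Ordering A: stable prefix first, user content last.
stable_first = [build_prompt(system, examples, m) for m in user_messages]

# Ordering B: user content first, so every prompt diverges at segment 0.
user_first = [[m, system, *examples] for m in user_messages]

stable_first_rate = prefix_reuse_rate(stable_first)
user_first_rate = prefix_reuse_rate(user_first)
```

With these inputs, `stable_first_rate` is 0.5 and `user_first_rate` is 0.0: the same content yields no reusable prefix once the per-request part comes first. The same effect applies to anything that varies per call, such as a timestamp or request ID placed near the top of the prompt.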