info snack

Prompt caching

Reusing computation for repeated prompt prefixes.

companion card

Prompt caching reuses computation for repeated prefix tokens, lowering latency and cost when many calls share the same context.

What it means

If many calls reuse the same system prompt, policy, examples, or documents, the serving system can cache earlier computation instead of recomputing it every time.

Why product teams care

Prompt caching can lower time to first token (TTFT) and cost, but only if the prompt is structured so that the shared prefix tokens stay identical across requests.
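
A minimal sketch of why ordering matters, assuming a hypothetical support-assistant prompt. It compares how much leading text two requests share when the stable instructions come first versus last. Real caches match on tokens, not characters, so the character counts here are only a rough proxy, and the names `STABLE_INSTRUCTIONS`, `prefix_first`, and `user_first` are illustrative.

```python
from os.path import commonprefix

# Hypothetical stable instructions shared by every request.
STABLE_INSTRUCTIONS = "You are a support assistant. Follow the refund policy below.\n"


def prefix_first(user_message: str) -> str:
    """Stable text first, per-request text last (cache-friendly)."""
    return STABLE_INSTRUCTIONS + "User: " + user_message


def user_first(user_message: str) -> str:
    """Per-request text first, so the prompts diverge almost immediately."""
    return "User: " + user_message + "\n" + STABLE_INSTRUCTIONS


def shared_prefix_chars(prompts: list[str]) -> int:
    """Length of the leading text that all prompts have in common."""
    return len(commonprefix(prompts))


messages = ["Where is my order?", "Can I get a refund?"]
good = shared_prefix_chars([prefix_first(m) for m in messages])
bad = shared_prefix_chars([user_first(m) for m in messages])

print(f"stable-first shares {good} chars, user-first shares {bad} chars")
```

With the stable text first, the whole instruction block is reusable. With the user message first, only the short "User: " label is shared.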

Understudy angle

Understudy can identify repeated prompt prefixes and help agents structure workloads to take advantage of caching.

take this with you

Caching works best when the stable prefix is large and many requests share it.

Put stable instructions before per-request user content, then measure the cache hit rate to confirm the prefix is actually being reused.
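
One way to measure it, sketched under the assumption that each request log records the total input tokens and how many of them were served from cache. The `RequestLog` type and its field names are hypothetical. Check what your provider or serving stack actually reports.

```python
from dataclasses import dataclass


@dataclass
class RequestLog:
    prompt_tokens: int  # total input tokens in the request
    cached_tokens: int  # input tokens served from cache (hypothetical field)


def cache_hit_rate(logs: list[RequestLog]) -> float:
    """Fraction of all input tokens that were served from cache."""
    total = sum(r.prompt_tokens for r in logs)
    if total == 0:
        return 0.0
    return sum(r.cached_tokens for r in logs) / total


# Illustrative numbers only: the first call misses, later calls reuse the prefix.
logs = [
    RequestLog(prompt_tokens=1200, cached_tokens=0),
    RequestLog(prompt_tokens=1200, cached_tokens=1000),
    RequestLog(prompt_tokens=1300, cached_tokens=1000),
]
rate = cache_hit_rate(logs)
print(f"token-level cache hit rate: {rate:.1%}")
```

Weighting by tokens rather than counting requests reflects how much computation was actually reused, which is what drives latency and cost.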