Anthropic's prompt-caching pattern for long-running agents
What it does
A drop-in pattern for using Anthropic's prompt caching so long-running agents (tool loops, doc QA, code review bots) stop re-paying for the same context every turn.
When to use
- The same large system prompt or document is sent on every turn
- An agent runs > 5 turns and re-sends prior context
- API spend is being pulled up by repeat-reads of the same files
- Latency budget matters
The skill
Mark the static, reusable parts of your prompt with cache_control: { type: "ephemeral" }. The first request creates the cache (5-minute TTL by default). Subsequent requests within the window hit it and are billed at the discounted cached-read rate.
Rule of thumb at Planes:
- System prompt: always cache if it's > 1k tokens.
- Large documents (transcripts, design specs, code files) you'll reference more than once in a session: cache them.
- Tool definitions: cache if you have more than ~3 tools or rich JSON schemas.
- The user's current turn: never cache. It changes every time.
Place cache_control at the last block of the cacheable prefix, not on every block. The cache key is the full prefix up to that marker.
messages = [
{
"role": "system",
"content": [
{"type": "text", "text": LARGE_SYSTEM_PROMPT},
{"type": "text", "text": PROJECT_CONTEXT,
"cache_control": {"type": "ephemeral"}},
],
},
{"role": "user", "content": user_turn},
]
Watch the response usage object for cache_creation_input_tokens (cache miss, you paid full freight) and cache_read_input_tokens (cache hit, you paid the discount). If the hit rate is low, your prefix isn't stable. Something dynamic snuck in.
Notes
Cache TTL is 5 minutes. If your sessions span longer gaps, the cache will rebuild. That's fine and often still worth it. Don't bother with caching for one-shot prompts. The overhead isn't recouped. This is the single highest-leverage change on most agent loops we ship at Planes.
Comments arrive once we point Giscus at the repo. Threads in GitHub Discussions. Reactions count as votes. No new logins.