Anthropic's prompt-caching pattern for long-running agents

Stableby @julian2026-05-15

[claude-api][caching][performance][cost]

Copies

—

loading…

Views

—

loading…

Spice

API + code chops required.

paste into Claude, or drop the file into a Claude Code skills dir

What it does

A drop-in pattern for using Anthropic's prompt caching so long-running agents (tool loops, doc QA, code review bots) stop re-paying for the same context every turn.

When to use

The same large system prompt or document is sent on every turn
An agent runs > 5 turns and re-sends prior context
API spend is being pulled up by repeat-reads of the same files
Latency budget matters

The skill

Mark the static, reusable parts of your prompt with cache_control: { type: "ephemeral" }. The first request creates the cache (5-minute TTL by default). Subsequent requests within the window hit it and are billed at the discounted cached-read rate.

Rule of thumb at Planes:

System prompt: always cache if it's > 1k tokens.
Large documents (transcripts, design specs, code files) you'll reference more than once in a session: cache them.
Tool definitions: cache if you have more than ~3 tools or rich JSON schemas.
The user's current turn: never cache. It changes every time.

Place cache_control at the last block of the cacheable prefix, not on every block. The cache key is the full prefix up to that marker.

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": LARGE_SYSTEM_PROMPT},
            {"type": "text", "text": PROJECT_CONTEXT,
             "cache_control": {"type": "ephemeral"}},
        ],
    },
    {"role": "user", "content": user_turn},
]

Watch the response usage object for cache_creation_input_tokens (cache miss, you paid full freight) and cache_read_input_tokens (cache hit, you paid the discount). If the hit rate is low, your prefix isn't stable. Something dynamic snuck in.

Notes

Cache TTL is 5 minutes. If your sessions span longer gaps, the cache will rebuild. That's fine and often still worth it. Don't bother with caching for one-shot prompts. The overhead isn't recouped. This is the single highest-leverage change on most agent loops we ship at Planes.

Comments & reactions

Comments arrive once we point Giscus at the repo. Threads in GitHub Discussions. Reactions count as votes. No new logins.