4/30/2026 · Brian Thomas

Response Caching: Zero Cost for Identical Requests

You can now add X-OpenRouter-Cache: true to your chat completions, responses, messages, or embeddings requests to start caching identical calls. The first call hits the provider and gets billed normally. Every identical call after that returns the same response in a tiny fraction of the time, with zero tokens billed.

View the response caching docs.

What it does

Response caching sits in front of the model provider. When you send a request with caching enabled, OpenRouter hashes the request body, model, API key, and streaming mode into a cache key. If an identical request was made before and hasn't expired, the cached response comes back immediately. No provider call, no token consumption, no charge.
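For intuition, here's a minimal sketch of how such a key could be derived from those four inputs. This is not OpenRouter's actual implementation; the hash function and serialization scheme are assumptions for illustration:

```python
import hashlib
import json

def cache_key(body: dict, model: str, api_key: str, streaming: bool) -> str:
    # Canonical JSON serialization so identical requests always produce
    # the same digest regardless of key order in the original payload.
    material = json.dumps(
        {"body": body, "model": model, "api_key": api_key, "stream": streaming},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the key covers the full request body, any change (a different temperature, an extra message, a reordered tool definition) produces a new key and therefore a cache miss.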

Both streaming and non-streaming requests work. Cached streaming responses replay through the same pipeline, so your client code doesn't need to change. Text, images, audio, documents, and tool calls all cache normally. Multimodal inputs (base64 images, audio clips, file attachments) are included in the cache key hash. One caveat: very large multimodal payloads that get offloaded internally for processing aren't eligible for caching. Standard-sized requests cache fine.
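As a sketch (using the OpenAI-compatible Python SDK pointed at OpenRouter's base URL; the model slug and prompt are placeholders), a cached streaming call is consumed exactly like an uncached one:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)

# The consuming loop is identical whether this replays from cache or
# streams live from the provider; only the extra header opts it in.
stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{"role": "user", "content": "Explain HTTP caching in one paragraph."}],
    stream=True,
    extra_headers={"X-OpenRouter-Cache": "true"},
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```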

Response caching is separate from prompt caching. Prompt caching (which many providers offer natively) reduces the cost of the prompt portion when messages share a common prefix. Response caching skips the provider entirely and returns the full response from OpenRouter's edge cache.

Reduces response times from seconds to milliseconds

Cached responses come back in 80-300ms, most of which is serialization and network. The cache lookup itself averages 4ms. For comparison, a typical uncached request to Gemini 2.5 Flash takes about 1.3 seconds, Kimi K2.6 takes 4.6 seconds, and GPT-5.5 takes 9.1 seconds. Cache hits are billed at zero: no prompt tokens, no completion tokens, no charge.

Enable it with a request header or with presets

Add the X-OpenRouter-Cache: true header to each API call you want to make cache-eligible:
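A minimal example with Python's requests library (the model slug and prompt are placeholders):

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={
        "Authorization": "Bearer sk-or-...",  # your OpenRouter API key
        "X-OpenRouter-Cache": "true",         # opt this request into response caching
    },
    json={
        "model": "google/gemini-2.5-flash",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```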


Presets. Enable caching for all requests that use a specific preset by setting cache_enabled: true in the preset config. No header is needed on individual requests.
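If the preset has cache_enabled: true, any request that references it is cached automatically. A sketch, assuming a preset is referenced via the @preset/ model syntax (the slug here is made up; substitute your own):

```python
import requests

# No X-OpenRouter-Cache header: caching comes from the preset's config.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer sk-or-..."},
    json={
        "model": "@preset/my-cached-preset",  # hypothetical preset slug
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
```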

You can control how long responses stay cached with X-OpenRouter-Cache-TTL (1 second to 24 hours, default 5 minutes). Need a fresh response? Send X-OpenRouter-Cache-Clear: true to bust the cache for that specific request.
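For instance, assuming the TTL header takes a value in seconds (check the docs for the exact format), a one-hour cache with an on-demand refresh might look like:

```python
import requests

common = {
    "Authorization": "Bearer sk-or-...",
    "X-OpenRouter-Cache": "true",
    "X-OpenRouter-Cache-TTL": "3600",  # one hour; seconds are an assumed unit
}
body = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Normal call: served from cache whenever a fresh entry exists.
requests.post("https://openrouter.ai/api/v1/chat/completions", headers=common, json=body)

# Force a fresh response for this one call, replacing the cached entry.
requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={**common, "X-OpenRouter-Cache-Clear": "true"},
    json=body,
)
```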

Response headers tell you what happened: X-OpenRouter-Cache-Status: HIT or MISS, plus X-OpenRouter-Cache-Age and X-OpenRouter-Cache-TTL so you can see exactly how the cache is performing.
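Putting it together, the sketch below sends the same request twice and inspects those headers; the second call should report a HIT and come back in milliseconds rather than seconds:

```python
import time
import requests

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {"Authorization": "Bearer sk-or-...", "X-OpenRouter-Cache": "true"}
body = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}],
}

for attempt in ("first", "second"):
    start = time.perf_counter()
    resp = requests.post(url, headers=headers, json=body)
    elapsed = time.perf_counter() - start
    print(
        f"{attempt}: {resp.headers.get('X-OpenRouter-Cache-Status')} "
        f"in {elapsed:.2f}s, "
        f"age={resp.headers.get('X-OpenRouter-Cache-Age')}, "
        f"ttl={resp.headers.get('X-OpenRouter-Cache-TTL')}"
    )
```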

Where it helps most

Agent retries. When an agent workflow fails partway through, you can retry from the top. Cached steps return instantly and for free, so you only pay for the new work.

Test suites. Run your LLM-backed tests repeatedly without burning tokens. After the first run populates the cache, subsequent runs are deterministic and free.

Repeated context processing. If your app sends the same prompt to the same model (same system prompt, same user input, same parameters), only the first call costs anything.

Available now across most generation endpoints

The cache is scoped to your API key. Different keys (even under the same account) don't share cache entries.

The feature works across /chat/completions, /responses, /messages, and /embeddings. Not yet supported: legacy /completions, /audio/speech (TTS), /audio/transcriptions (STT), /rerank, and video generation. It's currently in beta, and we're watching how it performs before locking down the API surface.

Cache hits don't count toward provider rate limits (since the request never reaches the provider), and they're visible in your Activity log with a cache indicator for easy monitoring.

Full details in the docs.
