Prompt Caching

To save on inference costs, you can enable prompt caching on supported providers and models. Note that prompt caching does not work when switching between providers: to cache a prompt, an LLM engine must store a memory snapshot of the processed prompt, and that snapshot is not shared with other providers.

The prompt caching request syntax is provider-specific at this time. As more providers support prompt caching, we will explore normalizing the syntax into a consistent request format.

Anthropic Claude

Caching price changes:

  • Cache writes: charged at 1.25x the original input price

  • Cache reads: charged at 0.1x the original input price
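
To make the multipliers above concrete, here is a minimal sketch that compares the input cost of sending the same large prefix several times with and without caching. The function name, the per-token price, and the assumption that every follow-up request hits the cache before it expires are all illustrative, not part of any provider API.

```python
# Multipliers taken from the list above.
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.1


def prefix_input_cost(tokens, price_per_token, requests, cached):
    """Input cost of sending the same `tokens`-long prefix in `requests` calls.

    Assumption: when `cached` is True, the first call writes the cache and
    every later call is a cache read (no expiry in between).
    """
    if requests <= 0:
        return 0.0
    base = tokens * price_per_token  # uncached cost of one request
    if not cached:
        return base * requests
    reads = requests - 1
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * reads)


if __name__ == "__main__":
    price = 3e-6  # hypothetical price per input token, for illustration only
    for n in (1, 2, 5):
        plain = prefix_input_cost(100_000, price, n, cached=False)
        cached = prefix_input_cost(100_000, price, n, cached=True)
        print(f"{n} request(s): uncached={plain:.4f} cached={cached:.4f}")
```

Under these assumptions, a single request costs more with caching (1.25x instead of 1x), and caching pays off from the second request onward (1.35x instead of 2x).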

Supported models:

  • anthropic/claude-3.5-sonnet
  • anthropic/claude-3-haiku
  • anthropic/claude-3-opus

Prompt caching with Anthropic requires the use of cache_control breakpoints. There is a limit of four breakpoints, and the cache will expire within five minutes. Because of both limits, reserve the cache breakpoints for large bodies of text that are reused across requests, such as character cards, CSV data, RAG data, or book chapters.

See Anthropic's documentation to read more about prompt caching and its limitations.

The cache_control breakpoint can only be inserted into the text part of a multipart message.

System message caching example:

{
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a historian studying the fall of the Roman Empire. You know the following book very well:"
        },
        {
          "type": "text",
          "text": "HUGE TEXT BODY",
          "cache_control": {
            "type": "ephemeral"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What triggered the collapse?"
        }
      ]
    }
  ]
}

User message caching example:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Given the book below:"
        },
        {
          "type": "text",
          "text": "HUGE TEXT BODY",
          "cache_control": {
            "type": "ephemeral"
          }
        },
        {
          "type": "text",
          "text": "Name all the characters in the above book"
        }
      ]
    }
  ]
}
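
The request bodies above can also be assembled programmatically. The following is a minimal sketch, not an official client: the helper names (text_part, count_breakpoints, build_request) are hypothetical, and the top-level "model" field is an assumption, since the examples above show only "messages". It builds multipart text content, marks selected parts with cache_control, and rejects payloads that exceed the four-breakpoint limit.

```python
import json

MAX_BREAKPOINTS = 4  # limit stated in the Anthropic section


def text_part(text, cache=False):
    """Build one text part; optionally mark it as a cache breakpoint."""
    part = {"type": "text", "text": text}
    if cache:
        part["cache_control"] = {"type": "ephemeral"}
    return part


def count_breakpoints(messages):
    """Count parts carrying cache_control across all multipart messages."""
    count = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            count += sum(
                1 for part in content
                if isinstance(part, dict) and "cache_control" in part
            )
    return count


def build_request(model, messages):
    """Return a request body, refusing more than MAX_BREAKPOINTS breakpoints."""
    used = count_breakpoints(messages)
    if used > MAX_BREAKPOINTS:
        raise ValueError(f"{used} cache breakpoints used, limit is {MAX_BREAKPOINTS}")
    # Assumption: the model slug is sent in a top-level "model" field.
    return {"model": model, "messages": messages}


messages = [
    {
        "role": "user",
        "content": [
            text_part("Given the book below:"),
            text_part("HUGE TEXT BODY", cache=True),
            text_part("Name all the characters in the above book"),
        ],
    }
]
payload = build_request("anthropic/claude-3.5-sonnet", messages)

if __name__ == "__main__":
    print(json.dumps(payload, indent=2))
```

Checking the limit on the client side is a design choice made for this sketch; it simply surfaces the mistake before a request is sent.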

DeepSeek

Caching price changes:

  • Cache writes: charged at the same price as the original input

  • Cache reads: charged at 0.1x the original input price

Prompt caching with DeepSeek is automated and does not require any additional configuration.
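
Because caching here is automatic, the only thing left to reason about is the bill. The sketch below estimates the input cost of one request from the pricing listed above, given how many input tokens were served from the cache and how many were not. The function name, the token counts, and the per-token price are hypothetical and chosen only for illustration.

```python
DEEPSEEK_CACHE_READ_MULTIPLIER = 0.1  # cache reads, from the list above
DEEPSEEK_CACHE_WRITE_MULTIPLIER = 1.0  # cache writes cost the same as normal input


def deepseek_input_cost(hit_tokens, miss_tokens, price_per_token):
    """Input cost of one request.

    hit_tokens:  input tokens read from the cache
    miss_tokens: input tokens processed (and written to the cache) at full price
    """
    weighted_tokens = (
        hit_tokens * DEEPSEEK_CACHE_READ_MULTIPLIER
        + miss_tokens * DEEPSEEK_CACHE_WRITE_MULTIPLIER
    )
    return weighted_tokens * price_per_token


if __name__ == "__main__":
    price = 1e-6  # hypothetical price per input token
    first = deepseek_input_cost(hit_tokens=0, miss_tokens=10_000, price_per_token=price)
    later = deepseek_input_cost(hit_tokens=9_000, miss_tokens=1_000, price_per_token=price)
    print(f"first request: {first:.6f}, later request: {later:.6f}")
```

Since cache writes carry no surcharge in this pricing, a request with no cache hits costs the same as it would without caching.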