Service Tiers

Control cost and latency tradeoffs with service tier selection

The service_tier parameter lets you control cost and latency tradeoffs when sending requests through OpenRouter. You can pass it in your request to select a specific processing tier, and the response will indicate which tier was actually used.
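As a sketch, the parameter rides along in the normal request body. The model name and message below are placeholder values, not recommendations:

```python
import json

# Hypothetical chat completion request body. Only service_tier is the
# subject here; the other fields are the usual required ones.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    # Request a specific processing tier; for OpenAI models the accepted
    # values are "auto", "default", "flex", and "priority".
    "service_tier": "flex",
}

# This body would be POSTed to /api/v1/chat/completions with an
# Authorization: Bearer <API key> header.
print(json.dumps(payload, indent=2))
```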

Supported Providers

OpenAI

Accepted request values: auto, default, flex, priority

Learn more in OpenAI’s Chat Completions and Responses API documentation. See OpenAI’s pricing page for details on cost differences between tiers.

API Response Differences

The API response includes a service_tier field that indicates which service tier was actually used to serve your request. The placement of this field varies by API format:

  • Chat Completions API (/api/v1/chat/completions): service_tier is returned at the top level of the response object, matching OpenAI’s native format.
  • Responses API (/api/v1/responses): service_tier is returned at the top level of the response object, matching OpenAI’s native format.
  • Messages API (/api/v1/messages): service_tier is returned inside the usage object, matching Anthropic’s native format.
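The placement rules above can be sketched with a small helper. The function and the abbreviated response bodies are illustrations for this page, not part of any SDK:

```python
from typing import Optional


def extract_service_tier(response: dict, api: str) -> Optional[str]:
    """Return the tier that served a request, given a parsed response body.

    `api` is one of "chat_completions", "responses", or "messages".
    """
    if api in ("chat_completions", "responses"):
        # OpenAI-style formats: service_tier sits at the top level.
        return response.get("service_tier")
    if api == "messages":
        # Anthropic-style Messages format: service_tier is nested in usage.
        return response.get("usage", {}).get("service_tier")
    raise ValueError(f"unknown API format: {api}")


# Abbreviated sample bodies (fields unrelated to service_tier omitted):
chat_resp = {"id": "gen-123", "service_tier": "flex"}
messages_resp = {"usage": {"input_tokens": 10, "service_tier": "priority"}}

print(extract_service_tier(chat_resp, "chat_completions"))
print(extract_service_tier(messages_resp, "messages"))
```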