API Reference
An overview of OpenRouter’s API
OpenRouter’s request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, OpenRouter normalizes the schema across models and providers so you only need to learn one.
Requests
Request Format
Here is the request schema as a TypeScript type. This will be the body of your POST request to the /api/v1/chat/completions endpoint (see the quick start above for an example).
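The full schema is long, so below is a hedged, abridged sketch of its shape, reconstructed from the parameters discussed in this section. Field names follow the OpenAI Chat format; consult the Parameters page for the authoritative list, types, and defaults.

```typescript
// Abridged sketch of the request body; not the exhaustive schema.
type Request = {
  // Either "messages" or "prompt" is required.
  messages?: Message[];
  prompt?: string;

  // Model ID with organization prefix, e.g. "openai/gpt-4o".
  // If omitted, the user or payer's default model is used.
  model?: string;

  // Structured outputs; only supported by some models/providers (see note below).
  response_format?: { type: 'json_object' };

  stop?: string | string[];
  stream?: boolean; // enable Server-Sent Events streaming

  // Common sampling parameters (supported ranges depend on the model).
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  top_k?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
  repetition_penalty?: number;
  seed?: number;

  // Tool calling (OpenAI-compatible).
  tools?: Tool[];
  tool_choice?: 'none' | 'auto' | { type: 'function'; function: { name: string } };

  // Provider routing preferences (see Provider Routing).
  provider?: { require_parameters?: boolean };
};

type TextContent = { type: 'text'; text: string };
type ImageContentPart = {
  type: 'image_url';
  image_url: { url: string; detail?: string }; // URL or base64-encoded data URL
};

type Message =
  | {
      role: 'user' | 'assistant' | 'system';
      content: string | (TextContent | ImageContentPart)[];
      name?: string;
    }
  | { role: 'tool'; content: string; tool_call_id: string; name?: string };

type Tool = {
  type: 'function';
  function: { name: string; description?: string; parameters: object };
};
```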
For a complete list of parameters, see the Parameters page.
The response_format parameter ensures you receive a structured response from the LLM. The parameter is only supported by OpenAI models, Nitro models, and some others - check the providers on the model page on openrouter.ai/models to see if it’s supported, and set require_parameters to true in your Provider Preferences. See Provider Routing.
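For example, a request that asks for a JSON-object response and restricts routing to providers that support every supplied parameter might look like this (a minimal sketch; the model ID and prompt are illustrative):

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "user", "content": "List three primary colors as a JSON object." }
  ],
  "response_format": { "type": "json_object" },
  "provider": { "require_parameters": true }
}
```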
Headers
OpenRouter allows you to specify some optional headers to identify your app and make it discoverable to users on our site.
- HTTP-Referer: Identifies your app on openrouter.ai
- X-Title: Sets/modifies your app’s title
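A hedged sketch of a request that sends both headers; the API key, referer URL, and title are placeholders, not required values:

```typescript
// Sketch: a chat completion request that also sends the optional attribution headers.
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'HTTP-Referer': 'https://your-app.example.com', // identifies your app on openrouter.ai
    'X-Title': 'Your App Name',                     // sets/modifies your app's title
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

console.log(await response.json());
```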
Model routing: If the model parameter is omitted, the user or payer’s default is used. Otherwise, remember to select a value for model from the supported models or API, and include the organization prefix. OpenRouter will select the least expensive and best GPUs available to serve the request, and fall back to other providers or GPUs if it receives a 5xx response code or if you are rate-limited.
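For instance, a minimal body that pins the request to a specific model (the model ID shown is illustrative; note the organization prefix before the slash):

```json
{
  "model": "anthropic/claude-3.5-sonnet",
  "messages": [{ "role": "user", "content": "Hello" }]
}
```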
Streaming: Server-Sent Events (SSE) are supported as well, to enable streaming for all models. Simply send stream: true in your request body. The SSE stream will occasionally contain a “comment” payload, which you should ignore (noted below).
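A hedged sketch of a streaming request, assuming a Node-style runtime with global fetch; the minimal line parsing below is for illustration only, and a dedicated SSE client library is more robust:

```typescript
// Sketch: stream a completion over SSE and print content deltas as they arrive.
// "<OPENROUTER_API_KEY>" is a placeholder.
const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Tell me a short story.' }],
    stream: true,
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep any partial line for the next chunk

  for (const line of lines) {
    if (line.startsWith(':')) continue;       // SSE comment payload: safe to ignore
    if (!line.startsWith('data: ')) continue; // skip anything that is not a data event
    const data = line.slice('data: '.length);
    if (data === '[DONE]') continue;          // OpenAI-style end-of-stream sentinel
    const chunk = JSON.parse(data);
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}
```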
Non-standard parameters: If the chosen model doesn’t support a request parameter (such as logit_bias in non-OpenAI models, or top_k for OpenAI), then the parameter is ignored. The rest are forwarded to the underlying model API.
Assistant Prefill: OpenRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.
To use this feature, simply include a message with role: "assistant" at the end of your messages array.
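For example, a messages array that prefills the beginning of the assistant’s reply (the model ID and wording are illustrative):

```json
{
  "model": "anthropic/claude-3.5-sonnet",
  "messages": [
    { "role": "user", "content": "What is the capital of France?" },
    { "role": "assistant", "content": "The capital of France is" }
  ]
}
```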
Images & Multimodal
Multimodal requests are only available via the /api/v1/chat/completions API with a multi-part messages parameter. The image_url can be either a URL or a base64-encoded data URL.
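A hedged sketch of a multi-part user message with both text and an image URL (the model ID and URL are illustrative):

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/photo.jpg" }
        }
      ]
    }
  ]
}
```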
A sample LLM response:
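An illustrative reply; the structure follows the normalized response format described below, and the content itself will vary by model and image:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The image shows a wooden boardwalk crossing a grassy wetland under a blue sky."
      }
    }
  ]
}
```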
Uploading base64 encoded images
For locally stored images, you can send them to the model using base64 encoding. Here’s an example:
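A hedged sketch, assuming a Node runtime and a local file named image.png (both assumptions; adjust the path and content type to match your image):

```typescript
import { readFile } from 'node:fs/promises';

// Read a local image and wrap it in a data URL that includes the content type.
const imageBuffer = await readFile('image.png');
const dataUrl = `data:image/png;base64,${imageBuffer.toString('base64')}`;

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>', // placeholder
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Describe this image.' },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ],
  }),
});

console.log(await response.json());
```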
When sending a data-base64 string, ensure it contains the content type of the image. Example:
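The general shape is as follows, where <base64-image-data> is a placeholder for your encoded image:

```text
data:image/jpeg;base64,<base64-image-data>
```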
Supported content types are:
- image/png
- image/jpeg
- image/webp
Tool Calls
Tool calls (also known as function calling) allow you to give an LLM access to external tools. The LLM does not call the tools directly. Instead, it suggests the tool to call. The user then calls the tool separately and provides the results back to the LLM. Finally, the LLM formats the response into an answer to the user’s original question.
An example of the five-turn sequence (a sketch of the corresponding request and message bodies follows this list):
- The user asks a question, while supplying a list of available tools in a JSON schema format.
- The LLM responds with a tool suggestion, together with appropriate arguments.
- The user calls the tool separately.
- The user provides the tool results back to the LLM.
- The LLM formats the tool result into a natural language response.
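A hedged sketch of that exchange, using a hypothetical get_current_weather tool; the tool name, arguments, and model ID are illustrative, not part of OpenRouter’s API. The initial request supplies the question and the available tools:

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "user", "content": "What is the weather like in Boston?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string", "description": "City name" }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```

The model replies with a tool_calls entry on the assistant message. After running the tool yourself, you append that assistant message plus a tool message carrying the result, and resend the conversation so the model can answer in natural language (the same tools array from the first request is sent again, omitted here for brevity):

```json
{
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "user", "content": "What is the weather like in Boston?" },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"location\": \"Boston\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "name": "get_current_weather",
      "content": "{\"temperature\": \"22C\", \"conditions\": \"sunny\"}"
    }
  ]
}
```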
OpenRouter standardizes the tool calling interface. However, different providers and models may support fewer tool calling features and arguments (e.g. tool_choice, tool_use, tool_result).
Responses
Response Format
At a high level, OpenRouter normalizes the schema across models and providers to comply with the OpenAI Chat API.
This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models.
Here’s the response schema as a TypeScript type:
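Below is a hedged, abridged sketch of that shape; field names follow the OpenAI Chat format, and optional fields and unions are simplified:

```typescript
// Sketch of the normalized response shape; simplified, not the exhaustive schema.
type Response = {
  id: string;
  // Each choice carries a "delta" (streaming) or a "message" (non-streaming).
  choices: (NonStreamingChoice | StreamingChoice)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    total_tokens: number;
  };
};

type NonStreamingChoice = {
  // Normalized: 'tool_calls' | 'stop' | 'length' | 'content_filter' | 'error'
  finish_reason: string | null;
  // The provider's raw finish reason
  native_finish_reason: string | null;
  message: {
    role: string;
    content: string | null;
  };
};

type StreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  delta: {
    role?: string;
    content: string | null;
  };
};
```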
Here’s an example:
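An illustrative non-streaming response (all values are made up):

```json
{
  "id": "gen-xxxxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop",
      "native_finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "created": 1735689600,
  "model": "openai/gpt-4o",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 4,
    "total_tokens": 13
  }
}
```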
Finish Reason
OpenRouter also normalizes finish_reason to one of the following values: ‘tool_calls’, ‘stop’, ‘length’, ‘content_filter’, ‘error’.
Some models and providers may have additional stop reasons. The raw finish_reason string returned by the model is available via the native_finish_reason property.
Querying Cost and Stats
The token counts that are returned in the completions API response are NOT counted with the model’s native tokenizer. Instead, a normalized, model-agnostic count is used.
For precise token accounting using the model’s native tokenizer, use the /api/v1/generation endpoint. Your credits are deducted based on the native token counts (not the ‘normalized’ token counts returned in the API response).
You can use the returned id to query for the generation stats (including token counts and cost) after the request is complete. This is how you can get the cost and tokens for all models and requests, streaming and non-streaming.
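A hedged sketch of that lookup, assuming the generation is queried by passing its id as a query parameter; the id and API key below are placeholders:

```typescript
// Sketch: look up native token counts and cost for a finished generation.
// "<GENERATION_ID>" is the id returned in the completion response.
const statsResponse = await fetch(
  'https://openrouter.ai/api/v1/generation?id=<GENERATION_ID>',
  { headers: { Authorization: 'Bearer <OPENROUTER_API_KEY>' } },
);

console.log(await statsResponse.json()); // includes native token counts and cost
```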
Note that token counts are also available in the usage field of the response body for non-streaming completions.
Streaming Support
For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to prevent connection timeouts. These comments look like:
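They are plain SSE comment lines, roughly of this shape:

```text
: OPENROUTER PROCESSING
```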
The comment payload can be safely ignored per the SSE spec. However, you can leverage it to improve UX as needed, for example by showing a dynamic loading indicator.
Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.stringify the non-JSON payloads. We recommend the following clients: