API Reference

An overview of OpenRouter’s API

OpenRouter’s request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, OpenRouter normalizes the schema across models and providers so you only need to learn one.

Requests

Request Format

Here is the request schema as a TypeScript type. This will be the body of your POST request to the /api/v1/chat/completions endpoint (see the quick start above for an example).

For a complete list of parameters, see the Parameters page.

Request Schema
// Definitions of subtypes are below
type Request = {
  // Either "messages" or "prompt" is required
  messages?: Message[];
  prompt?: string;

  // If "model" is unspecified, uses the user's default
  model?: string; // See "Supported Models" section

  // Forces the model to produce a specific output format.
  // See the models page and the note on this docs page for which models support it.
  response_format?: { type: 'json_object' };

  stop?: string | string[];
  stream?: boolean; // Enable streaming

  // See LLM Parameters (openrouter.ai/docs/api-reference/parameters)
  max_tokens?: number; // Range: [1, context_length)
  temperature?: number; // Range: [0, 2]

  // Tool calling
  // Will be passed down as-is for providers implementing OpenAI's interface.
  // For providers with custom interfaces, we transform and map the properties.
  // Otherwise, we transform the tools into a YAML template. The model responds with an assistant message.
  // See models supporting tool calling: openrouter.ai/models?supported_parameters=tools
  tools?: Tool[];
  tool_choice?: ToolChoice;

  // Advanced optional parameters
  seed?: number; // Integer only
  top_p?: number; // Range: (0, 1]
  top_k?: number; // Range: [1, Infinity) Not available for OpenAI models
  frequency_penalty?: number; // Range: [-2, 2]
  presence_penalty?: number; // Range: [-2, 2]
  repetition_penalty?: number; // Range: (0, 2]
  logit_bias?: { [key: number]: number };
  top_logprobs?: number; // Integer only
  min_p?: number; // Range: [0, 1]
  top_a?: number; // Range: [0, 1]

  // Reduce latency by providing the model with a predicted output
  // https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs
  prediction?: { type: 'content'; content: string };

  // OpenRouter-only parameters
  // See "Prompt Transforms" section: openrouter.ai/docs/transforms
  transforms?: string[];
  // See "Model Routing" section: openrouter.ai/docs/model-routing
  models?: string[];
  route?: 'fallback';
  // See "Provider Routing" section: openrouter.ai/docs/provider-routing
  provider?: ProviderPreferences;
};

// Subtypes:

type TextContent = {
  type: 'text';
  text: string;
};

type ImageContentPart = {
  type: 'image_url';
  image_url: {
    url: string; // URL or base64 encoded image data
    detail?: string; // Optional, defaults to "auto"
  };
};

type ContentPart = TextContent | ImageContentPart;

type Message =
  | {
      role: 'user' | 'assistant' | 'system';
      // ContentParts are only for the "user" role:
      content: string | ContentPart[];
      // If "name" is included, it will be prepended like this
      // for non-OpenAI models: `{name}: {content}`
      name?: string;
    }
  | {
      role: 'tool';
      content: string;
      tool_call_id: string;
      name?: string;
    };

type FunctionDescription = {
  description?: string;
  name: string;
  parameters: object; // JSON Schema object
};

type Tool = {
  type: 'function';
  function: FunctionDescription;
};

type ToolChoice =
  | 'none'
  | 'auto'
  | {
      type: 'function';
      function: {
        name: string;
      };
    };

The response_format parameter ensures you receive a structured response from the LLM. The parameter is only supported by OpenAI models, Nitro models, and some others - check the providers on the model page on openrouter.ai/models to see if it’s supported, and set require_parameters to true in your Provider Preferences. See the Provider Routing section for details.
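For example, a minimal request combining response_format with require_parameters might look like the sketch below. The model name and prompt are placeholders; check the model page for actual JSON-output support.

TypeScript
// A minimal sketch, assuming the chosen model supports structured output.
fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o', // placeholder; verify JSON support on the model page
    messages: [{ role: 'user', content: 'List three colors as a JSON object.' }],
    response_format: { type: 'json_object' },
    // Only route to providers that support every parameter in this request:
    provider: { require_parameters: true },
  }),
});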

Headers

OpenRouter allows you to specify some optional headers to identify your app and make it discoverable to users on our site.

  • HTTP-Referer: Identifies your app on openrouter.ai
  • X-Title: Sets/modifies your app’s title
TypeScript
fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'HTTP-Referer': '<YOUR_SITE_URL>', // Optional. Site URL for rankings on openrouter.ai.
    'X-Title': '<YOUR_SITE_NAME>', // Optional. Site title for rankings on openrouter.ai.
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  }),
});

Model routing: If the model parameter is omitted, the user or payer’s default model is used. Otherwise, remember to select a value for model from the supported models or the models API, and include the organization prefix. OpenRouter will select the least expensive and best GPUs available to serve the request, and fall back to other providers or GPUs if it receives a 5xx response code or if you are rate-limited.
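As a sketch, the OpenRouter-only models and route parameters from the schema above can be combined to declare an explicit fallback order (the model IDs here are only illustrative):

TypeScript
// A minimal sketch of model fallback routing; model IDs are illustrative.
fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    // Try the first model; fall back to the next if it fails or is unavailable.
    models: ['openai/gpt-4o', 'anthropic/claude-2.1'],
    route: 'fallback',
    messages: [{ role: 'user', content: 'What is the meaning of life?' }],
  }),
});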

Streaming: Server-Sent Events (SSE) are supported as well, to enable streaming for all models. Simply send stream: true in your request body. The SSE stream will occasionally contain a “comment” payload, which you should ignore (noted below).

Non-standard parameters: If the chosen model doesn’t support a request parameter (such as logit_bias in non-OpenAI models, or top_k for OpenAI), then the parameter is ignored. The rest are forwarded to the underlying model API.

Assistant Prefill: OpenRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.

To use this feature, simply include a message with role: "assistant" at the end of your messages array.

TypeScript
fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [
      { role: 'user', content: 'What is the meaning of life?' },
      { role: 'assistant', content: "I'm not sure, but my best guess is" },
    ],
  }),
});

Images & Multimodal

Multimodal requests are only available via the /api/v1/chat/completions API with a multi-part messages parameter. The image_url can either be a URL or a base64-encoded data URL.

1"messages": [
2 {
3 "role": "user",
4 "content": [
5 {
6 "type": "text",
7 "text": "What's in this image?"
8 },
9 {
10 "type": "image_url",
11 "image_url": {
12 "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
13 }
14 }
15 ]
16 }
17]

Sample LLM response:

{
  "choices": [
    {
      "role": "assistant",
      "content": "This image depicts a scenic natural landscape featuring a long wooden boardwalk that stretches out through an expansive field of green grass. The boardwalk provides a clear path and invites exploration through the lush environment. The scene is surrounded by a variety of shrubbery and trees in the background, indicating a diverse plant life in the area."
    }
  ]
}

Uploading base64 encoded images

For locally stored images, you can send them to the model using base64 encoding. Here’s an example:

import { readFile } from "fs/promises";

const getFlowerImage = async (): Promise<string> => {
  const imagePath = new URL("flower.jpg", import.meta.url);
  const imageBuffer = await readFile(imagePath);
  const base64Image = imageBuffer.toString("base64");
  return `data:image/jpeg;base64,${base64Image}`;
};

...

"messages": [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: "What's in this image?",
      },
      {
        type: "image_url",
        image_url: {
          url: `${await getFlowerImage()}`,
        },
      },
    ],
  },
];

When sending a base64-encoded data URL, ensure it contains the content type of the image. Example:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII

Supported content types are:

  • image/png
  • image/jpeg
  • image/webp

Tool Calls

Tool calls (also known as function calling) allow you to give an LLM access to external tools. The LLM does not call the tools directly. Instead, it suggests the tool to call. The user then calls the tool separately and provides the results back to the LLM. Finally, the LLM formats the response into an answer to the user’s original question.

An example of the five-turn sequence:

  1. The user asks a question, while supplying a list of available tools in a JSON schema format:
...
"messages": [{
  "role": "user",
  "content": "What is the weather like in Boston?"
}],
"tools": [{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        },
        "unit": {
          "type": "string",
          "enum": [
            "celsius",
            "fahrenheit"
          ]
        }
      },
      "required": [
        "location"
      ]
    }
  }
}]
  2. The LLM responds with a tool suggestion, together with appropriate arguments:
// Some models might include their reasoning in content
"message": {
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_9pw1qnYScqvGrCH58HWCvFH6",
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "arguments": "{ \"location\": \"Boston, MA\"}"
      }
    }
  ]
},
  3. The user calls the tool separately:
const weather = await getWeather({ location: 'Boston, MA' });
console.log(weather); // { "temperature": "22", "unit": "celsius", "description": "Sunny"}
  4. The user provides the tool results back to the LLM:
...
"messages": [
  {
    "role": "user",
    "content": "What is the weather like in Boston?"
  },
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_9pw1qnYScqvGrCH58HWCvFH6",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{ \"location\": \"Boston, MA\"}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "name": "get_current_weather",
    "tool_call_id": "call_9pw1qnYScqvGrCH58HWCvFH6",
    "content": "{\"temperature\": \"22\", \"unit\": \"celsius\", \"description\": \"Sunny\"}"
  }
]
  5. The LLM formats the tool result into a natural language response:
...
"message": {
  "role": "assistant",
  "content": "The current weather in Boston, MA is sunny with a temperature of 22°C."
}
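
Putting the turns together, a simple client-side loop might look like the following sketch. The getWeather helper is the hypothetical local function from step 3; only the request and response shapes above come from the API, and error handling is omitted.

TypeScript
// A minimal sketch of the tool-calling round trip. `getWeather` is a
// hypothetical local function; error handling is omitted for brevity.
async function chatWithTools(messages: any[], tools: any[]): Promise<string> {
  const callApi = (msgs: any[]) =>
    fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: 'Bearer <OPENROUTER_API_KEY>',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model: 'openai/gpt-4o', messages: msgs, tools }),
    }).then((r) => r.json());

  // Steps 1-2: ask the question and let the model suggest a tool call.
  let response = await callApi(messages);
  const message = response.choices[0].message;

  if (message.tool_calls) {
    // Echo the assistant message that contains the tool_calls.
    messages.push(message);
    for (const toolCall of message.tool_calls) {
      // Step 3: the client executes the suggested tool itself.
      const args = JSON.parse(toolCall.function.arguments);
      const result = await getWeather(args); // hypothetical helper
      // Step 4: feed the tool result back with role "tool".
      messages.push({
        role: 'tool',
        name: toolCall.function.name,
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }
    // Step 5: the model turns the tool result into a natural-language answer.
    response = await callApi(messages);
  }

  return response.choices[0].message.content;
}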

OpenRouter standardizes the tool calling interface. However, different providers and models may support fewer tool calling features and arguments (e.g. tool_choice, tool_use, tool_result).

Responses

Response Format

At a high level, OpenRouter normalizes the schema across models and providers to comply with the OpenAI Chat API.

This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models.
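
For instance, a small helper can read the generated text regardless of whether the request was streamed; this is just a sketch over the schema below:

TypeScript
// A minimal sketch: read the generated text from a choice whether or not
// the request was streamed. Assumes the Response subtypes defined below.
function choiceText(choice: NonStreamingChoice | StreamingChoice): string {
  if ('message' in choice) {
    return choice.message.content ?? ''; // non-streaming response
  }
  return choice.delta.content ?? ''; // streaming chunk
}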

Here’s the response schema as a TypeScript type:

// Definitions of subtypes are below
type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';

  system_fingerprint?: string; // Only present if the provider supports it

  // Usage data is always returned for non-streaming.
  // When streaming, you will get one usage object at
  // the end accompanied by an empty choices array.
  usage?: ResponseUsage;
};

// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4 tokenizer.

type ResponseUsage = {
  /** Including images and tools if any */
  prompt_tokens: number;
  /** The tokens generated */
  completion_tokens: number;
  /** Sum of the above two fields */
  total_tokens: number;
};

// Subtypes:
type NonChatChoice = {
  finish_reason: string | null;
  text: string;
  error?: ErrorResponse;
};

type NonStreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type StreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type ErrorResponse = {
  code: number; // See "Error Handling" section
  message: string;
  metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};

Here’s an example:

{
  "id": "gen-xxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop", // Normalized finish_reason
      "native_finish_reason": "stop", // The raw finish_reason from the provider
      "message": {
        // will be "delta" if streaming
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 4,
    "total_tokens": 4
  },
  "model": "openai/gpt-3.5-turbo" // Could also be "anthropic/claude-2.1", etc, depending on the "model" that ends up being used
}

Finish Reason

OpenRouter also normalizes finish_reason to one of the following values: ‘tool_calls’, ‘stop’, ‘length’, ‘content_filter’, ‘error’.

Some models and providers may have additional stop reasons. The raw finish_reason string returned by the model is available via the native_finish_reason property.
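
A sketch of branching on the normalized values (the logging and warnings here are only illustrative):

TypeScript
// A minimal sketch of handling the normalized finish_reason.
function handleFinish(choice: NonStreamingChoice): void {
  switch (choice.finish_reason) {
    case 'stop':
      break; // the model finished naturally
    case 'length':
      console.warn('Output truncated; consider raising max_tokens.');
      break;
    case 'tool_calls':
      // dispatch choice.message.tool_calls as in the Tool Calls section
      break;
    case 'content_filter':
      console.warn('Output was filtered by the provider.');
      break;
    case 'error':
      console.error(choice.error?.message);
      break;
  }
  // The provider's raw value is still available if you need it:
  console.debug(choice.native_finish_reason);
}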

Querying Cost and Stats

The token counts that are returned in the completions API response are NOT counted with the model’s native tokenizer. Instead, they use a normalized, model-agnostic count.

For precise token accounting using the model’s native tokenizer, use the /api/v1/generation endpoint. Your credits are deducted based on the native token counts (not the ‘normalized’ token counts returned in the API response).

You can use the returned id to query for the generation stats (including token counts and cost) after the request is complete. This is how you can get the cost and tokens for all models and requests, streaming and non-streaming.

Query Generation Stats
const generation = await fetch(
  'https://openrouter.ai/api/v1/generation?id=$GENERATION_ID',
  { headers },
);

const stats = await generation.json();
// Example response: {
//   data: {
//     "id": "gen-nNPYi0ZB6GOK5TNCUMHJGgXo",
//     "model": "openai/gpt-4-32k",
//     "streamed": false,
//     "generation_time": 2,
//     "tokens_prompt": 24,
//     "tokens_completion": 29,
//     "total_cost": 0.00492,
//     // ... additional stats
//   }
// }

Note that token counts are also available in the usage field of the response body for non-streaming completions.

Streaming Support

For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to prevent connection timeouts. These comments look like:

: OPENROUTER PROCESSING

The comment payload can be safely ignored per the SSE spec. However, you can leverage it to improve UX as needed, e.g. by showing a dynamic loading indicator.

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when they JSON.parse the non-JSON payloads. We recommend the following clients:
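A minimal sketch of consuming the stream with fetch and skipping comment lines follows. It assumes each event arrives on its own "data:" line and an OpenAI-style "[DONE]" sentinel; production code should buffer partial lines across chunks.

TypeScript
// A minimal sketch of reading the SSE stream; buffering and error handling
// are omitted for brevity.
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <OPENROUTER_API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'What is the meaning of life?' }],
    stream: true,
  }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  for (const line of decoder.decode(value).split('\n')) {
    if (line.startsWith(':')) continue; // SSE comment, e.g. ": OPENROUTER PROCESSING"
    if (!line.startsWith('data: ')) continue;
    const data = line.slice('data: '.length);
    if (data === '[DONE]') continue; // end-of-stream sentinel
    const chunk = JSON.parse(data);
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}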
