Docs
Create sustainable, model-agnostic AI apps.
Sustainable. Users can pay for what they use, instead of developers.
Model-agnostic. Let users choose their own model, or let OpenRouter route for them. Pick between OpenAI, Anthropic, and more. User preferences are portable between models and apps.
Flexible auth. Use OAuth PKCE, API keys, or the Window AI extension.
Supported Models
Trying to configure your default model? Go to Settings.
Model usage can be paid by users, developers, or both, and availability may shift over time. You can also fetch models, prices, and limits via API. If you have your own model that you'd like to add to this list and monetize, click here.
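For example, here's a minimal sketch of fetching that list (assuming the response wraps the model list in a data key, in the Model shape documented under Token Limits below):

// Fetch live models, prices, and context limits (no auth required)
const res = await fetch("https://openrouter.ai/api/v1/models");
const { data } = await res.json();
for (const model of data) {
  console.log(model.id, model.pricing, model.context_length);
}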
Text models
Model Name & ID | Prompt cost (per 1k tokens) | Completion cost (per 1k tokens) | Context (tokens) | Moderation*
---|---|---|---|---
OpenAI: GPT-3.5 Turbo (openai/gpt-3.5-turbo) | $0.0015 | $0.002 | 4,095 | Filtered
OpenAI: GPT-3.5 Turbo 16k (openai/gpt-3.5-turbo-16k) | $0.003 | $0.004 | 16,383 | Filtered
OpenAI: GPT-4 (openai/gpt-4) | $0.03 | $0.06 | 8,191 | Filtered
OpenAI: GPT-4 32k (openai/gpt-4-32k) | $0.06 | $0.12 | 32,767 | Filtered
OpenAI: GPT-3.5 Turbo Instruct (openai/gpt-3.5-turbo-instruct) | $0.0015 | $0.002 | 4,095 | Filtered
Anthropic: Claude v2 (anthropic/claude-2) | $0.01102 | $0.03268 | 100,000 | Filtered
Anthropic: Claude Instant v1 (anthropic/claude-instant-v1) | $0.00163 | $0.00551 | 100,000 | Filtered
Google: PaLM 2 Bison (google/palm-2-chat-bison) | $0.0005 | $0.0005 | 8,000 | Unfiltered
Google: PaLM 2 Bison Code Chat (google/palm-2-codechat-bison) | $0.0005 | $0.0005 | 8,000 | Unfiltered
Meta: Llama v2 13B Chat (beta) (meta-llama/llama-2-13b-chat) | $0.0002 (50% off) | $0.0002 (50% off) | 4,096 † | Unfiltered
Meta: Llama v2 70B Chat (beta) (meta-llama/llama-2-70b-chat) | $0.0015 (50% off) | $0.0015 (50% off) | 4,096 † | Unfiltered
Meta: CodeLlama 34B Instruct (beta) (meta-llama/codellama-34b-instruct) | $0.0005 (50% off) | $0.0005 (50% off) | 8,096 † | Unfiltered
Phind: Phind CodeLlama 34B v2 (beta) (phind/phind-codellama-34b-v2) | $0.0005 (50% off) | $0.0005 (50% off) | 4,096 † | Unfiltered
Nous: Hermes Llama2 13B (beta) (nousresearch/nous-hermes-llama2-13b) | $0.0002 (50% off) | $0.0002 (50% off) | 4,096 † | Unfiltered
Mancer: Weaver 12k (alpha) (mancer/weaver) | $0.005625 | $0.005625 | 8,000 † | Unfiltered
MythoMax L2 13B (beta) (gryphe/mythomax-l2-13b) | $0.001875 | $0.001875 | 8,192 † | Unfiltered
Pygmalion: Mythalion 13B (beta) (pygmalionai/mythalion-13b) | $0.001875 | $0.001875 | 6,144 † | Unfiltered
ReMM SLERP L2 13B (beta) (undi95/remm-slerp-l2-13b) | $0.001875 | $0.001875 | 6,144 † | Unfiltered
Airoboros L2 70B (beta) (jondurbin/airoboros-l2-70b) | $0.013875 | $0.013875 | 4,096 † | Unfiltered
Synthia 70B (beta) (migtissera/synthia-70b) | $0.013875 | $0.013875 | 6,144 † | Unfiltered
Mistral 7B Instruct v0.1 (beta) (mistralai/mistral-7b-instruct) | $0 (100% off) | $0 (100% off) | 4,096 † | Unfiltered

*Moderation: whether content filtering is applied by OpenRouter, per the model provider's Terms of Service. Developers should adhere to the terms of the model regardless.
† This model allows prompts of "unlimited" length, using a middle-out transform (see Prompt Transforms below).
Media models
More coming soon. Learn about making 3D object requests in our Discord.
Model Name & ID | Completion cost (per 32 steps) | Moderation*
---|---|---
OpenAI: Shap-e (beta) (openai/shap-e) | $0.01 | Unfiltered
Note: Different models tokenize text in different ways. Some models break text into chunks of multiple characters (GPT, Claude, Llama, etc.), while others tokenize by character (PaLM). As a result, the number of tokens for the same text may vary depending on the model.
Fallback Models
OpenRouter allows you to automatically try other models if the primary model is down or refuses to reply to the prompt due to content moderation required by the provider:
const requestBody = {
  model,
  // Use a recommended, open-source fallback model if
  // the above 'model' refuses to answer the prompt:
  route: "fallback",
  ... // Other params
}
If the model you selected returns a moderation error, OpenRouter will try to use the fallback model instead. If the fallback model is down or returns an error, OpenRouter will return that error.
Requests are priced using the model that was used, which will be returned in the model attribute of the response body.
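For example, here's a minimal sketch (assuming an OPENROUTER_API_KEY variable) that makes a fallback-routed request and logs which model actually served it:

const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "anthropic/claude-2",
    route: "fallback",
    messages: [{ role: "user", content: "Hello!" }]
  })
});
const completion = await res.json();
// Pricing follows whichever model was actually used
console.log(completion.model);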
As a shorthand for customizing the fallback model, you can specify a list of models to try in the models parameter:
{
  models: ["anthropic/claude-2", "gryphe/mythomax-l2-13b"],
  route: "fallback",
  ... // Other params
}
If no fallback model is specified, OpenRouter will try the most appropriate open-source model available, with pricing less than the primary model (or very close to it).
OAuth PKCE
Users can connect to OpenRouter in one click using Proof Key for Code Exchange (PKCE). Here's an example, and here's a step-by-step guide:
1. Send your user to https://openrouter.ai/auth?callback_url=YOUR_SITE_URL. You can optionally include a code_challenge (random password up to 256 digits) for extra security. For maximum security, we recommend also setting code_challenge_method to S256, and then setting code_challenge to the base64 encoding of the sha256 hash of code_verifier, which you will submit in Step 2 (a sketch for generating these values follows these steps). More info in Auth0's docs.
2. Once logged in, they'll be redirected back to your site with a code in the URL; look for the code query parameter, e.g. ?code=.... Make an API call (can be frontend or backend) to exchange the code for a user-controlled API key. And that's it for PKCE!
fetch("https://openrouter.ai/api/v1/auth/keys", {
  method: "POST",
  body: JSON.stringify({
    code: $CODE_FROM_QUERY_PARAM,
    code_verifier: $CODE_VERIFIER // Only needed if you sent a code_challenge in Step 1
  })
});
3. A fresh API key will be in the result under "key". Store it securely and make OpenAI-style requests (streaming is supported as well):
fetch("https://openrouter.ai/api/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${OPENROUTER_API_KEY}`,
"HTTP-Referer": `${YOUR_SITE_URL}`, // To identify your app. Can be set to localhost for testing
"X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows on openrouter.ai
"Content-Type": "application/json"
},
body: JSON.stringify({
"model": "anthropic/claude-2", // Optional (user controls the default),
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
})
});
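Here's a minimal sketch of generating the code_verifier and S256 code_challenge from Step 1, assuming a browser-like environment with the Web Crypto API (the helper name is hypothetical):

// Hypothetical helper: create a code_verifier and its matching S256 code_challenge
async function createPkcePair() {
  const codeVerifier = crypto.randomUUID() + crypto.randomUUID(); // random secret kept on the client
  const hash = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(codeVerifier));
  // base64 encoding of the sha256 hash, per Step 1
  const codeChallenge = btoa(String.fromCharCode(...new Uint8Array(hash)));
  return { codeVerifier, codeChallenge };
}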
You can use JavaScript or any server-side framework, like Streamlit. The linked example shows multiple models and file Q&A.
API Keys
Users or developers can cover model costs with normal API keys. This allows you to use curl or the OpenAI SDK directly with OpenRouter. Just create an API key, set the api_base, and set a referrer header to make your app discoverable to others on OpenRouter.
Note: API keys on OpenRouter are more powerful than keys used directly for model APIs. They allow users to set credit limits for apps, and they can be used in OAuth flows.
Example code:
import openai
openai.api_base = "https://openrouter.ai/api/v1"
openai.api_key = $OPENROUTER_API_KEY
response = openai.ChatCompletion.create(
    model="openai/gpt-3.5-turbo",  # Optional (user controls the default)
    messages=[...],
    headers={
        "HTTP-Referer": $YOUR_SITE_URL,  # To identify your app. Can be set to localhost for testing
        "X-Title": $YOUR_APP_NAME,  # Optional. Shows on openrouter.ai
    },
)
reply = response.choices[0].message
To extend the Python code for streaming, see this example from OpenAI.
Requests & Responses
More docs coming. In the meantime, see the OpenAI Chat API, which is compatible with OpenRouter, with one exception:
Request Headers
OpenRouter requires an HTTP-Referer header to identify your app and make it discoverable to users on openrouter.ai. You can also include an optional X-Title header to set or modify the title of your app. Example:
fetch("https://openrouter.ai/api/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": `Bearer ${OPENROUTER_API_KEY}`,
"HTTP-Referer": `${YOUR_SITE_URL}`, // To identify your app. Can be set to localhost for testing
"X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows on openrouter.ai
"Content-Type": "application/json"
},
body: JSON.stringify({
"messages": [
{"role": "user", "content": "Who are you?"}
]
})
});
Request Body
More docs coming. In the meantime, see the OpenAI Chat API, which OpenRouter extends.
Model routing: If the model parameter is omitted, the user or payer's default is used. Otherwise, remember to select a value for model from the supported models or API, and include the organization prefix. OpenRouter will select the least expensive and best GPUs available to serve the request, and fall back to other providers or GPUs if it receives a 5xx response code or if you are rate-limited.
Streaming: Server-Sent Events (SSE) are supported as well, to enable streaming for all models. Simply send stream: true in your request body. The SSE stream will occasionally contain a "comment" payload, which you should ignore (noted below).
Non-standard parameters: If the chosen model doesn't support a request parameter (such as logit_bias in non-OpenAI models, or top_k for OpenAI), then the parameter is ignored. The rest are forwarded to the underlying model API.
Response Body
Responses are largely consistent with OpenAI. This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models. Note that finish_reason will vary depending on the model provider.
The model property tells you which model was used inside the underlying API. Example:
{
  "id": "gen-xxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop", // Different models provide different reasons here
      "message": { // Will be "delta" if streaming
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "model": "gpt-3.5-turbo-0613" // Could also be "claude-1.3-100k", "chat-bison@001", etc., depending on the "model" that ends up being used
}
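Because each choice carries message when not streaming and delta when streaming, one read path can serve both modes. A minimal sketch, where data stands for a parsed response or SSE chunk:

// "message" on complete responses, "delta" on stream chunks
const choice = data.choices[0];
const text = choice.message?.content ?? choice.delta?.content ?? "";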
You can use the returned id to query for the generation status after the request is complete:
const generation = await fetch("https://openrouter.ai/api/v1/generation?id=$GENERATION_ID", { headers })
await generation.json()
// OUTPUT:
{
"id": "gen-nNPYi0ZB6GOK5TNCUMHJGgXo",
"model": "openai/gpt-4-32k",
"streamed": false,
"generation_time": 2,
"created_at": "2023-09-02T20:29:18.574972+00:00",
"tokens_prompt": 24,
"tokens_completion": 29,
"native_tokens_prompt": null,
"native_tokens_completion": null,
"num_media_generations": null,
"origin": "https://localhost:47323/",
"usage": 0.00492
}
For SSE streams, we occasionally need to send an SSE comment to indicate that OpenRouter is processing your request. This prevents the connection from timing out. The comment will look like this:
: OPENROUTER PROCESSING
Comment payloads can be safely ignored per the SSE spec. However, you can leverage them to improve UX as needed, for example by showing a dynamic loading indicator.
Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.parse the non-JSON payloads. We recommend the following clients:
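Alternatively, if you parse the stream yourself, skip comment lines before calling JSON.parse. A minimal sketch (assuming a Node 18+ or browser environment with web streams, and an OpenAI-style "[DONE]" sentinel):

const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "openai/gpt-3.5-turbo",
    stream: true,
    messages: [{ role: "user", content: "Hello!" }]
  })
});
const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
let buffer = "";
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += value;
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (line.startsWith(":")) continue; // SSE comment, e.g. ": OPENROUTER PROCESSING"
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") continue; // assumed OpenAI-style end-of-stream marker
    const chunk = JSON.parse(payload);
    console.log(chunk.choices[0].delta?.content ?? "");
  }
}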
Prompt Transforms
OpenRouter has a simple rule for choosing between sending a prompt and sending a list of ChatML messages:
- Choose messages if you want OpenRouter to apply a recommended instruct template to your prompt, depending on which model serves your request. Available instruct modes include:
- Choose prompt if you want to send a custom prompt to the model. This is useful if you want to use a custom instruct template or maintain full control over the prompt submitted to the model. (See the sketch after this list.)
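As an illustration, here's the same request expressed both ways (a sketch; the instruct template in the prompt variant is an arbitrary Llama-style example, not necessarily what OpenRouter would apply):

// Option A: let OpenRouter apply a recommended instruct template
const withMessages = {
  model: "meta-llama/llama-2-13b-chat",
  messages: [{ role: "user", content: "Write a haiku about routers." }]
};

// Option B: keep full control over the raw prompt
const withPrompt = {
  model: "meta-llama/llama-2-13b-chat",
  prompt: "[INST] Write a haiku about routers. [/INST]" // custom template, example only
};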
To help with prompts that exceed the maximum context size of a model, OpenRouter supports a custom parameter called transforms:
{
  transforms: ["middle-out"], // Compress prompts > context size
  messages: [...], // "prompt" works as well
  model // Works with any model
}
The transforms param is an array of strings that tells OpenRouter to apply a series of transformations to the prompt before sending it to the model. Transformations are applied in order. Available transforms are:
- middle-out: compress prompts and message chains to the context size. This helps users extend conversations, in part because LLMs pay significantly less attention to the middle of sequences anyway. It works by compressing or removing messages in the middle of the prompt.
Note: some open-source models default to using middle-out, unless you exclude this transform by setting transforms: [] in the request body.
Error Handling
For errors, OpenRouter returns a JSON response with the following shape:
type ErrorResponse = {
  error: {
    code: number
    message: string
  }
}
The HTTP response will have the same status code as error.code, forming a request error if:
- Your original request is invalid
- Your API key/account is out of credits
- You did not set stream: true and the LLM returned an error within 15 seconds
Otherwise, the returned HTTP response status will be 200, and any error that occurred while the LLM was producing the output will be emitted in the response body or as an SSE data event.
Example code for printing errors in JavaScript:
const request = await fetch("https://openrouter.ai/...")
console.log(request.status) // Will be an error code unless the model started processing your request
const response = await request.json()
console.error(response.error?.code) // Will be an error code
console.error(response.error?.message)
Error Codes
- 400: Bad Request (invalid or missing params, CORS)
- 401: Invalid credentials (OAuth session expired, disabled/invalid API key)
- 402: Out of credits
- 403: Your chosen model requires moderation and your input was flagged
- 408: Your request timed out
- 429: You are being rate limited
- 502: Your chosen model is down or we received an invalid response from it
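Since 429 and 502 indicate transient conditions, a simple retry with backoff can smooth them over. A minimal sketch (the helper name and retry policy are arbitrary choices, not part of the API):

// Retry transient errors (429, 502) with exponential backoff
async function fetchWithRetry(url, options, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    const res = await fetch(url, options);
    if (![429, 502].includes(res.status) || attempt === maxAttempts) return res;
    await new Promise(r => setTimeout(r, 1000 * 2 ** attempt)); // 2s, 4s, ...
  }
}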
User Limits
Rate Limits and Credits Remaining
To check the rate limit or credits left on an API key, make a GET request to https://openrouter.ai/api/v1/auth/key.
fetch("https://openrouter.ai/api/v1/auth/key", {
method: 'GET',
headers: {
'Authorization': 'Bearer $OPENROUTER_API_KEY'
},
});
If you submit a valid API key, you should get a response of the form:
type Key = {
  data: {
    label: string,
    usage: number, // Number of credits used
    limit: number | null, // Credit limit for the key, or null if unlimited
    rate_limit: {
      requests: number, // Number of requests allowed...
      interval: string // in this interval, e.g. "10s"
    }
  }
}
Rate limits are a function of the number of credits remaining on the key or account. Roughly speaking, you can make as many requests per second as you have credits, plus one. To be exact:
requests_per_10_seconds = 10 * (1 + Math.floor(Math.max(credits, 0)))
Example 1: if you have 9.9 credits remaining, you can make 100 requests every 10 seconds.
Example 2: if you have -0.1 credits remaining, you can make 10 requests every 10 seconds (but you may see 402 errors).
Token Limits
Some users may have too few credits on their account to make expensive requests. OpenRouter provides a way to know this before making a request to any model.
To get the maximum tokens that a user can generate and the maximum tokens allowed in their prompt, include your authentication headers in a request to https://openrouter.ai/api/v1/models:
fetch("https://openrouter.ai/api/v1/models", {
method: 'GET',
headers: {
'Authorization': 'Bearer $OPENROUTER_API_KEY'
},
});
Each model will include a per_request_limits property:
type Model = {
  id: string,
  pricing: {
    prompt: number,
    completion: number
  },
  context_length: number,
  per_request_limits: {
    prompt_tokens: number,
    completion_tokens: number
  }
}
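For example, here's a minimal sketch (assuming OPENROUTER_API_KEY, and that the model list is wrapped in a data key) that checks a user's completion budget before sending an expensive request:

const res = await fetch("https://openrouter.ai/api/v1/models", {
  headers: { 'Authorization': `Bearer ${OPENROUTER_API_KEY}` }
});
const { data } = await res.json();
const gpt4 = data.find(m => m.id === "openai/gpt-4");
// Compare against your own token estimate before sending the request
if (gpt4 && gpt4.per_request_limits.completion_tokens < 1000) {
  console.warn("User may not have enough credits for a long completion");
}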
Other Frameworks
You can find examples of using OpenRouter with other frameworks in this GitHub repository. Here are a few:
- Using npm i openai: github
- Using Streamlit, a way to build and share Python apps: github
- Using LangChain for Python, a composable LLM framework: github
- Using LangChain.js: github
import { ChatOpenAI } from "langchain/chat_models/openai"; // assumed import path

const chat = new ChatOpenAI({
modelName: "anthropic/claude-instant-v1",
temperature: 0.8,
streaming: true,
openAIApiKey: $OPENROUTER_API_KEY,
}, {
basePath: $OPENROUTER_BASE_URL + "/api/v1",
baseOptions: {
headers: {
"HTTP-Referer": "https://localhost:3000/", // To identify your app. Can be set to localhost for testing
"X-Title": "Langchain.js Testing", // Optional. Shows on openrouter.ai
},
},
});
- Using the Vercel AI SDK:
import { Configuration, OpenAIApi } from "openai-edge"; // assumed import; openai-edge mirrors the OpenAI v3 SDK
const config = new Configuration({
  basePath: $OPENROUTER_BASE_URL + "/api/v1",
  apiKey: $OPENROUTER_API_KEY,
  baseOptions: {
    headers: {
      "HTTP-Referer": "https://localhost:3000/", // To identify your app. Can be set to localhost for testing
      "X-Title": "Vercel Testing", // Optional. Shows on openrouter.ai
    }
  }
})
const openrouter = new OpenAIApi(config)
3D Objects (beta)
OpenRouter supports text-to-3D object generation, currently in beta. See supported media models and try a demo. To generate 3D objects, send a POST request to https://openrouter.ai/api/v1/objects/generations:
curl https://openrouter.ai/api/v1/objects/generations \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "HTTP-Referer: $YOUR_SITE_URL" \
-H "X-Title: $YOUR_SITE_NAME" \
-d '{
    "prompt": "a chair shaped like an avocado",
    "num_inference_steps": 32,
    "num_outputs": 1,
    "extension": "ply",
    "model": "openai/shap-e"
  }'
Only prompt is required; num_inference_steps, num_outputs, extension, and model are optional.
// Each generation will contain a base64 string, a hosted URL, or both.
interface MediaOutput {
  uri?: string; // base64 string
  url?: string; // hosted URL
}

interface MediaResponse {
  generations: MediaOutput[];
}
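Here's the same request from JavaScript, with a minimal sketch of consuming the response (the field handling follows the MediaOutput interface above; the logging is illustrative):

const res = await fetch("https://openrouter.ai/api/v1/objects/generations", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ prompt: "a chair shaped like an avocado", model: "openai/shap-e" })
});
const { generations } = await res.json();
for (const output of generations) {
  if (output.url) console.log("Hosted URL:", output.url);
  if (output.uri) console.log("Base64 payload length:", output.uri.length);
}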