Docs

Create sustainable, model-agnostic AI apps.

Sustainable. Users can pay for what they use, instead of developers.

Model-agnostic. Let users choose their own model, or let OpenRouter route for them. Pick between OpenAI, Anthropic, and more. User preferences are portable between models and apps.

Flexible auth. Use OAuth PKCE, API keys, or the Window AI extension.


Supported Models

Trying to configure your default model? Go to Settings.

Model usage can be paid by users, developers, or both, and may shift in availability. You can also fetch models, prices, and limits via API.
Token counting is approximate. OpenRouter does not store prompts or completions.

If you have your own model that you'd like to add to this list and monetize, click here.

Text models

Moderation indicates whether content filtering is applied by OpenRouter, per the model provider's Terms of Service. Developers should adhere to the terms of the model regardless.

Model Name                            | ID                                  | Prompt cost (per 1k tokens) | Moderation
OpenAI: GPT-3.5 Turbo                 | openai/gpt-3.5-turbo                | $0.0015                     | Filtered
OpenAI: GPT-3.5 Turbo 16k             | openai/gpt-3.5-turbo-16k            | $0.003                      | Filtered
OpenAI: GPT-4                         | openai/gpt-4                        | $0.03                       | Filtered
OpenAI: GPT-4 32k                     | openai/gpt-4-32k                    | $0.06                       | Filtered
OpenAI: GPT-3.5 Turbo Instruct        | openai/gpt-3.5-turbo-instruct       | $0.0015                     | Filtered
Anthropic: Claude v2                  | anthropic/claude-2                  | $0.01102                    | Filtered
Anthropic: Claude Instant v1          | anthropic/claude-instant-v1         | $0.00163                    | Filtered
Google: PaLM 2 Bison                  | google/palm-2-chat-bison            | $0.0005                     | Unfiltered
Google: PaLM 2 Bison (Code Chat)      | google/palm-2-codechat-bison        | $0.0005                     | Unfiltered
Meta: Llama v2 13B Chat (beta)        | meta-llama/llama-2-13b-chat         | $0.0002 (50% off)           | Unfiltered
Meta: Llama v2 70B Chat (beta)        | meta-llama/llama-2-70b-chat         | $0.0015 (50% off)           | Unfiltered
Meta: CodeLlama 34B Instruct (beta)   | meta-llama/codellama-34b-instruct   | $0.0005 (50% off)           | Unfiltered
Phind: Phind CodeLlama 34B v2 (beta)  | phind/phind-codellama-34b-v2        | $0.0005 (50% off)           | Unfiltered
Nous: Hermes Llama2 13B (beta)        | nousresearch/nous-hermes-llama2-13b | $0.0002 (50% off)           | Unfiltered
Mancer: Weaver 12k (alpha)            | mancer/weaver                       | $0.005625                   | Unfiltered
MythoMax L2 13B (beta)                | gryphe/mythomax-l2-13b              | $0.001875                   | Unfiltered
Pygmalion: Mythalion 13B (beta)       | pygmalionai/mythalion-13b           | $0.001875                   | Unfiltered
ReMM SLERP L2 13B (beta)              | undi95/remm-slerp-l2-13b            | $0.001875                   | Unfiltered
Airoboros L2 70B (beta)               | jondurbin/airoboros-l2-70b          | $0.013875                   | Unfiltered
Synthia 70B (beta)                    | migtissera/synthia-70b              | $0.013875                   | Unfiltered
Mistral 7B Instruct v0.1 (beta)       | mistralai/mistral-7b-instruct       | $0 (100% off)               | Unfiltered


Media models
More coming soon. Learn about making 3D object requests in our Discord.

Moderation indicates whether content filtering is applied by OpenRouter, per the model provider's Terms of Service. Developers should adhere to the terms of the model regardless.

Model Name            | ID            | Moderation
OpenAI: Shap-e (beta) | openai/shap-e | Unfiltered

Note: Different models tokenize text in different ways. Some models break up text into chunks of multiple characters (GPT, Claude, Llama, etc.), while others tokenize by character (PaLM). This means that the number of tokens may vary depending on the model.


Fallback Models

OpenRouter allows you to automatically try other models if the primary model is down or refuses to reply to the prompt due to content moderation required by the provider:

const requestBody = {
  model,
  // Use a recommended, open-source fallback model if
  // the above 'model' refuses to answer the prompt:
  route: "fallback",
  ... // Other params
}

If the model you selected returns a moderation error, OpenRouter will try to use the fallback model instead. If the fallback model is down or returns an error, OpenRouter will return that error.

Requests are priced using the model that was used, which will be returned in the model attribute of the response body.

As a shorthand for customizing the fallback model, you can specify a list of models to try in the models parameter:

{
  models: ["anthropic/claude-2", "gryphe/mythomax-l2-13b"],
  route: "fallback",
  ... // Other params
}

If no fallback model is specified, OpenRouter will try the most appropriate open-source model available, with pricing less than the primary model (or very close to it).
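Putting this together, here's a minimal end-to-end sketch of a fallback request (it assumes OPENROUTER_API_KEY and YOUR_SITE_URL are defined, as in the examples below):

const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // To identify your app
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    models: ["anthropic/claude-2", "gryphe/mythomax-l2-13b"],
    route: "fallback",
    messages: [{ role: "user", content: "Hello!" }]
  })
});

const data = await response.json();
// The "model" attribute tells you which model actually served the
// request, and therefore how the request was priced.
console.log(data.model);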


OAuth PKCE

Users can connect to OpenRouter in one click using Proof Key for Code Exchange (PKCE). Here's an example, and here's a step-by-step:

  1. Send your user to https://openrouter.ai/auth?callback_url=YOUR_SITE_URL.
    You can optionally include a code_challenge (a random string of up to 256 characters) for extra security.

    For maximum security, we recommend also setting code_challenge_method to S256, and then setting code_challenge to the base64url encoding of the SHA-256 hash of code_verifier, which you will submit in Step 2 (see the sketch after these steps). More info in Auth0's docs.
  2. Once logged in, they'll be redirected back to your site with a code in the URL. Look for the code query parameter (e.g. ?code=...). Make an API call (frontend or backend) to exchange the code for a user-controlled API key. And that's it for PKCE!
    fetch("https://openrouter.ai/api/v1/auth/keys", {
      method: 'POST',
      body: JSON.stringify({
        code: $CODE_FROM_QUERY_PARAM,
        code_verifier: $CODE_VERIFIER // Only needed if you sent a code_challenge in Step 1
      })
    });
  3. A fresh API key will be in the result under "key". Store it securely and make OpenAI-style requests (supports streaming as well):
fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // To identify your app. Can be set to localhost for testing
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows on openrouter.ai
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "anthropic/claude-2", // Optional (user controls the default),
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  })
});
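
Here's the sketch referenced in Step 1: one way to generate a code_verifier and its S256 code_challenge in the browser. This is a minimal sketch assuming the Web Crypto API and a context that allows top-level await; YOUR_SITE_URL is your callback URL.

// base64url = base64 with '+' -> '-', '/' -> '_', and no '=' padding
function base64url(bytes) {
  return btoa(String.fromCharCode(...new Uint8Array(bytes)))
    .replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Random code_verifier; store it (e.g. in sessionStorage) for Step 2
const codeVerifier = base64url(crypto.getRandomValues(new Uint8Array(32)));
sessionStorage.setItem("code_verifier", codeVerifier);

// code_challenge = base64url(sha256(code_verifier))
const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(codeVerifier));
const codeChallenge = base64url(digest);

location.href = `https://openrouter.ai/auth?callback_url=${encodeURIComponent(YOUR_SITE_URL)}` +
  `&code_challenge=${codeChallenge}&code_challenge_method=S256`;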

You can use JavaScript or any server-side framework, like Streamlit. The linked example shows multiple models and file Q&A.


API Keys

Users or developers can cover model costs with normal API keys. This allows you to use curl or the OpenAI SDK directly with OpenRouter. Just create an API key, set the api_base, and set a referrer header to make your app discoverable to others on OpenRouter.

Note: API keys on OpenRouter are more powerful than keys used directly for model APIs. They allow users to set credit limits for apps, and they can be used in OAuth flows.

Example code:
import os
import openai

openai.api_base = "https://openrouter.ai/api/v1"
openai.api_key = os.getenv("OPENROUTER_API_KEY")

response = openai.ChatCompletion.create(
  model="openai/gpt-3.5-turbo", # Optional (user controls the default)
  messages=[...],
  headers={
    "HTTP-Referer": YOUR_SITE_URL, # To identify your app. Can be set to localhost for testing
    "X-Title": YOUR_APP_NAME, # Optional. Shows on openrouter.ai
  },
)

reply = response.choices[0].message

To extend the Python code for streaming, see this example from OpenAI.


Requests & Responses

More docs coming. In the meantime, see the OpenAI Chat API, which is compatible with OpenRouter, with one exception:

Request Headers

OpenRouter requires an HTTP-Referer header to identify your app and make it discoverable to users on openrouter.ai. You can also include an optional X-Title header to set or modify the title of your app. Example:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // To identify your app. Can be set to localhost for testing
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows on openrouter.ai
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "messages": [
      {"role": "user", "content": "Who are you?"}
    ]
  })
});

Request Body

More docs coming. In the meantime, see the OpenAI Chat API, which OpenRouter extends.

Model routing: If the model parameter is omitted, the user or payer's default is used. Otherwise, remember to select a value for model from the supported models or API, and include the organization prefix. OpenRouter will select the least expensive and best GPUs available to serve the request, and fall back to other providers or GPUs if it receives a 5xx response code or if you are rate-limited.

Streaming: Server-Sent Events (SSE) are supported as well, to enable streaming for all models. Simply send stream: true in your request body. The SSE stream will occasionally contain a "comment" payload, which you should ignore (noted below).
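For example, here's a minimal sketch of consuming the stream with fetch and accumulating deltas as they arrive (assumes OPENROUTER_API_KEY is defined; the chunk shape follows the OpenAI streaming format):

const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "openai/gpt-3.5-turbo",
    stream: true,
    messages: [{ role: "user", content: "Hello!" }]
  })
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let text = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep any partial line for the next read
  for (const line of lines) {
    if (line.startsWith(":")) continue; // SSE comment, e.g. ": OPENROUTER PROCESSING"
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") continue; // OpenAI-style stream terminator
    text += JSON.parse(payload).choices?.[0]?.delta?.content ?? "";
  }
}
console.log(text);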

Non-standard parameters: If the chosen model doesn't support a request parameter (such as logit_bias in non-OpenAI models, or top_k for OpenAI), then the parameter is ignored. The rest are forwarded to the underlying model API.

Response Body

Responses are largely consistent with OpenAI. This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models. Note that finish_reason will vary depending on the model provider.

The model property tells you which model was used inside the underlying API. Example:

{
  "id": "gen-xxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop", // Different models provide different reasons here
      "message": { // Will be "delta" if streaming
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "model": "gpt-3.5-turbo-0613" // Could also be "claude-1.3-100k", "chat-bison@001", etc., depending on the model that ends up being used
}

You can use the returned id to query for the generation status after the request is complete:

const generation = await fetch("https://openrouter.ai/api/v1/generation?id=$GENERATION_ID", { headers })

await generation.json()
// OUTPUT:
{
  "id": "gen-nNPYi0ZB6GOK5TNCUMHJGgXo",
  "model": "openai/gpt-4-32k",
  "streamed": false,
  "generation_time": 2,
  "created_at": "2023-09-02T20:29:18.574972+00:00",
  "tokens_prompt": 24,
  "tokens_completion": 29,
  "native_tokens_prompt": null,
  "native_tokens_completion": null,
  "num_media_generations": null,
  "origin": "https://localhost:47323/",
  "usage": 0.00492
}

For SSE streams, we occasionally need to send an SSE comment to indicate that OpenRouter is processing your request. This prevents the connection from timing out. The comment will look like this:

: OPENROUTER PROCESSING

The comment payload can be safely ignored per the SSE spec. However, you can leverage it to improve UX as needed, e.g. by showing a dynamic loading indicator.

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.parse the non-JSON comment payloads. We recommend using an SSE client that handles comments correctly.


Prompt Transforms

OpenRouter has a simple rule for choosing between sending a prompt and sending a list of ChatML messages:

  • Choose messages if you want OpenRouter to apply a recommended instruct template to your prompt, depending on which model serves your request.
  • Choose prompt if you want to send a custom prompt to the model. This is useful if you want to use a custom instruct template or maintain full control over the prompt submitted to the model.

To help with prompts that exceed the maximum context size of a model, OpenRouter supports a custom parameter called transforms:

{
  transforms: ["middle-out"], // Compress prompts > context size
  messages: [...], // "prompt" works as well
  model // Works with any model
}

The transforms param is an array of strings that tells OpenRouter to apply a series of transformations to the prompt before sending it to the model. Transformations are applied in order. Available transforms are:

  • middle-out: compress prompts and message chains to the context size. This helps users extend conversations in part because LLMs pay significantly less attention to the middle of sequences anyway. Works by compressing or removing messages in the middle of the prompt.

    Note: some open-source models default to using middle-out, unless you exclude this transform by e.g. setting transforms: [] in the request body.

Error Handling

For errors, OpenRouter returns a JSON response with the following shape:

type ErrorResponse = {
  error: {
    code: number
    message: string
  }
}

The HTTP response will have the same status code as error.code, forming a request error if:

  • Your original request is invalid
  • Your API key/account is out of credits
  • You did not set stream: true and the LLM returned an error within 15 seconds.

Otherwise, the returned HTTP response status will be 200, and any error that occurred while the LLM was producing the output will be emitted in the response body or as an SSE data event.

Example code for printing errors in JavaScript:

const request = await fetch("https://openrouter.ai/...")
console.log(request.status) // Will be an error code unless the model started processing your request
const response = await request.json()
console.error(response.error?.code) // Will be an error code
console.error(response.error?.message)

Error Codes

  • 400: Bad Request (invalid or missing params, CORS)
  • 401: Invalid credentials (OAuth session expired, disabled/invalid API key)
  • 402: Out of credits
  • 403: Your chosen model requires moderation and your input was flagged
  • 408: Your request timed out
  • 429: You are being rate limited
  • 502: Your chosen model is down or we received an invalid response from it

User Limits

Rate Limits and Credits Remaining

To check the rate limit or credits left on an API key, make a GET request to https://openrouter.ai/api/v1/auth/key.

fetch("https://openrouter.ai/api/v1/auth/key", {
  method: 'GET',
  headers: {
    'Authorization': 'Bearer $OPENROUTER_API_KEY'
  },
});

If you submit a valid API key, you should get a response of the form:

type Key = {
  data: {
    label: string,
    usage: number, // Number of credits used
    limit: number | null, // Credit limit for the key, or null if unlimited
    rate_limit: {
      requests: number, // Number of requests allowed...
      interval: string // in this interval, e.g. "10s"
    }
  }
}

Rate limits are a function of the number of credits remaining on the key or account. Basically, your rate limit is the number of credits you have per second. To be exact:

requests_per_10_seconds = 10 * (1 + Math.floor(Math.max(credits, 0)))

Example 1: if you have 9.9 credits remaining, you can make 100 requests every 10 seconds.

Example 2: if you have -0.1 credits remaining, you can make 10 requests every 10 seconds (but you may see 402 errors).
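
Expressed in code, the same formula (a sketch for sanity-checking, not an official client helper):

// Requests allowed per 10-second window, given remaining credits
const requestsPer10s = (credits) =>
  10 * (1 + Math.floor(Math.max(credits, 0)));

requestsPer10s(9.9);  // 100 (Example 1)
requestsPer10s(-0.1); // 10  (Example 2)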

Token Limits

Some users may have too few credits on their account to make expensive requests. OpenRouter provides a way to know that before making a request to any model.

To get the maximum tokens that a user can generate and the maximum tokens allowed in their prompt, add authentication headers in your request to https://openrouter.ai/api/v1/models:

fetch("https://openrouter.ai/api/v1/models", {
  method: 'GET',
  headers: {
    'Authorization': 'Bearer $OPENROUTER_API_KEY'
  },
});

Each model will include a per_request_limits property:

type Model = {
  id: string,
  pricing: {
    prompt: number,
    completion: number
  },
  context_length: number,
  per_request_limits: {
    prompt_tokens: number,
    completion_tokens: number
  }
}
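
For example, here's a sketch that checks a prompt against these limits before sending it. It assumes the endpoint wraps the list in a data array of Model objects as typed above; estimateTokens is a hypothetical client-side estimator, since token counting is approximate and varies by model.

const res = await fetch("https://openrouter.ai/api/v1/models", {
  headers: { "Authorization": `Bearer ${OPENROUTER_API_KEY}` }
});
const { data } = await res.json(); // Array of Model objects

const model = data.find((m) => m.id === "openai/gpt-4");
const limits = model.per_request_limits;

if (estimateTokens(promptText) > limits.prompt_tokens) {
  // Prompt won't fit: shorten it, pick a cheaper model, or ask the
  // user to add credits before making the request.
}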

Other Frameworks

You can find a few examples of using OpenRouter with other frameworks in this GitHub repository. Here are some examples:

  • Using npm i openai: github
  • Using Streamlit, a way to build and share Python apps: github
  • Using LangChain for Python, a composable LLM framework: github
  • Using LangChain.js: github
  • const chat = new ChatOpenAI({
      modelName: "anthropic/claude-instant-v1",
      temperature: 0.8,
      streaming: true,
      openAIApiKey: $OPENROUTER_API_KEY,
    }, {
      basePath: $OPENROUTER_BASE_URL + "/api/v1",
      baseOptions: {
        headers: {
          "HTTP-Referer": "https://localhost:3000/", // To identify your app. Can be set to localhost for testing
          "X-Title": "Langchain.js Testing", // Optional. Shows on openrouter.ai
        },
      },
    });
  • Using the Vercel AI SDK:
  • const config = new Configuration({
      basePath: $OPENROUTER_BASE_URL + "/api/v1",
      apiKey: $OPENROUTER_API_KEY,
      baseOptions: {
        headers: {
          "HTTP-Referer": "https://localhost:3000/", // To identify your app. Can be set to localhost for testing
          "X-Title": "Vercel Testing", // Optional. Shows on openrouter.ai
        }
      }
    })
    
    const openrouter = new OpenAIApi(config)

3D Objects (beta)

OpenRouter supports text-to-3D object generation, currently in beta. See supported media models and try a demo. To generate 3D objects, send a POST request to https://openrouter.ai/api/v1/objects/generations:

curl https://openrouter.ai/api/v1/objects/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "HTTP-Referer: $YOUR_SITE_URL" \
  -H "X-Title: $YOUR_SITE_NAME" \
  -d '{
    "prompt": "a chair shaped like an avocado",
    "num_inference_steps": 32,
    "num_outputs": 1,
    "extension": "ply",
    "model": "openai/shap-e"
  }'

Only prompt is required; num_inference_steps, num_outputs, extension, and model are optional.

You should receive a response of type MediaResponse:
// Each generation will contain a base64 string, a hosted URL, or both.
interface MediaOutput {
  uri?: string; // base64 string
  url?: string; // hosted URL
}
}

interface MediaResponse {
  generations: MediaOutput[];
}
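
And a sketch of calling the endpoint from JavaScript and handling either output shape (assumes OPENROUTER_API_KEY is defined):

const res = await fetch("https://openrouter.ai/api/v1/objects/generations", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({ prompt: "a chair shaped like an avocado", extension: "ply" })
});

const { generations } = await res.json();
for (const gen of generations) {
  // Prefer the hosted URL; fall back to the inline base64 payload.
  if (gen.url) console.log("Hosted object:", gen.url);
  else if (gen.uri) console.log("Base64 payload of length:", gen.uri.length);
}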