Docs

The future will bring us hundreds of language models and dozens of providers for each. How will you choose the best?

Prioritize price or performance. OpenRouter scouts for the lowest prices and best latencies/throughputs across dozens of providers, and lets you choose how to prioritize them.

Standardized API. No need to change your code when switching between models or providers. You can even let your users choose and pay for their own models.

The best models will be used the most. Evals are flawed. Instead, compare models by how often they're used, and soon, for which purposes. Chat with multiple at once in the Playground.


Quick Start

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "openai/gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"},
    ],
  })
});

You can also use OpenRouter with OpenAI's client API:

import OpenAI from "openai"

const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: $OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": $YOUR_SITE_URL, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": $YOUR_SITE_NAME, // Optional. Shows in rankings on openrouter.ai.
  },
  // dangerouslyAllowBrowser: true,
})
async function main() {
  const completion = await openai.chat.completions.create({
    model: "openai/gpt-3.5-turbo",
    messages: [
      { role: "user", content: "Say this is a test" }
    ],
  })

  console.log(completion.choices[0].message)
}
main()

Supported Models

Model usage can be paid by users, developers, or both, and may shift in availability. You can also fetch models, prices, and limits via API.
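
For example, here's a minimal sketch of fetching the model list (assuming the list is returned under a data key, as shown in the Limits section below):

fetch("https://openrouter.ai/api/v1/models")
  .then((res) => res.json())
  .then(({ data }) => {
    // Each entry includes the model ID, pricing, and context length
    for (const model of data) {
      console.log(model.id, model.pricing);
    }
  });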

If you'd like to add an open-source model directly to OpenRouter, visit our GitHub.

Text models

Model Name | Model ID | Prompt cost ($ per 1k tokens) | Moderation
Auto (best for prompt) | openrouter/auto | -- | --
Nous: Capybara 7B (free) | nousresearch/nous-capybara-7b:free | $0 (100% off) | None
Mistral 7B Instruct (free) | mistralai/mistral-7b-instruct:free | $0 (100% off) | None
OpenChat 3.5 (free) | openchat/openchat-7b:free | $0 (100% off) | None
MythoMist 7B (free) | gryphe/mythomist-7b:free | $0 (100% off) | None
Toppy M 7B (free) | undi95/toppy-m-7b:free | $0 (100% off) | None
Cinematika 7B (alpha) (free) | openrouter/cinematika-7b:free | $0 (100% off) | None
Google: Gemma 7B (free) | google/gemma-7b-it:free | $0 (100% off) | None
Psyfighter v2 13B | koboldai/psyfighter-13b-2 | $0.001 (90% off) | None
Neural Chat 7B v3.1 | intel/neural-chat-7b | $0.005 (50% off) | None
MythoMax 13B | gryphe/mythomax-l2-13b | $0.00018 (40% off) | None
Pygmalion: Mythalion 13B | pygmalionai/mythalion-13b | $0.001125 (25% off) | None
Xwin 70B | xwin-lm/xwin-lm-70b | $0.00375 (25% off) | None
Goliath 120B | alpindale/goliath-120b | $0.009375 (25% off) | None
Noromaid 20B | neversleep/noromaid-20b | $0.00225 (25% off) | None
MythoMist 7B | gryphe/mythomist-7b | $0.000375 (25% off) | None
Midnight Rose 70B | sophosympatheia/midnight-rose-70b | $0.009 (25% off) | None
Fimbulvetr 11B v2 | sao10k/fimbulvetr-11b-v2 | $0.0005499 (25% off) | None
ReMM SLERP 13B (extended) | undi95/remm-slerp-l2-13b:extended | $0.001125 (25% off) | None
MythoMax 13B (extended) | gryphe/mythomax-l2-13b:extended | $0.001125 (25% off) | None
Meta: Llama 3 8B Instruct (extended) | meta-llama/llama-3-8b-instruct:extended | $0.0002751 (25% off) | None
Mancer: Weaver (alpha) | mancer/weaver | $0.003375 (25% off) | None
Nous: Capybara 7B | nousresearch/nous-capybara-7b | $0.00018 (10% off) | None
Meta: CodeLlama 34B Instruct | meta-llama/codellama-34b-instruct | $0.00072 (10% off) | None
Meta: CodeLlama 70B Instruct | codellama/codellama-70b-instruct | $0.00081 (10% off) | None
Phind: CodeLlama 34B v2 | phind/phind-codellama-34b | $0.00072 (10% off) | None
OpenHermes 2 Mistral 7B | teknium/openhermes-2-mistral-7b | $0.00018 (10% off) | None
ReMM SLERP 13B | undi95/remm-slerp-l2-13b | $0.00027 (10% off) | None
Cinematika 7B (alpha) | openrouter/cinematika-7b | $0.00018 (10% off) | None
Yi 34B Chat | 01-ai/yi-34b-chat | $0.00072 (10% off) | None
Yi 34B (base) | 01-ai/yi-34b | $0.00072 (10% off) | None
Yi 6B (base) | 01-ai/yi-6b | $0.000126 (10% off) | None
StripedHyena Nous 7B | togethercomputer/stripedhyena-nous-7b | $0.00018 (10% off) | None
StripedHyena Hessian 7B (base) | togethercomputer/stripedhyena-hessian-7b | $0.00018 (10% off) | None
Mixtral 8x7B (base) | mistralai/mixtral-8x7b | $0.00054 (10% off) | None
Nous: Hermes 2 Yi 34B | nousresearch/nous-hermes-yi-34b | $0.00072 (10% off) | None
Nous: Hermes 2 Mixtral 8x7B SFT | nousresearch/nous-hermes-2-mixtral-8x7b-sft | $0.00054 (10% off) | None
Nous: Hermes 2 Mistral 7B DPO | nousresearch/nous-hermes-2-mistral-7b-dpo | $0.00018 (10% off) | None
Meta: Llama 3 70B Instruct | meta-llama/llama-3-70b-instruct | $0.00081 (10% off) | None
Mixtral 8x7B Instruct (nitro) | mistralai/mixtral-8x7b-instruct:nitro | $0.00054 (10% off) | None
Mistral OpenOrca 7B | open-orca/mistral-7b-openorca | $0.0001425 (5% off) | None
Hugging Face: Zephyr 7B | huggingfaceh4/zephyr-7b-beta | $0.0001425 (5% off) | None
OpenAI: GPT-3.5 Turbo | openai/gpt-3.5-turbo | $0.0005 | Moderated
OpenAI: GPT-3.5 Turbo 16k | openai/gpt-3.5-turbo-0125 | $0.0005 | Moderated
OpenAI: GPT-3.5 Turbo 16k | openai/gpt-3.5-turbo-16k | $0.003 | Moderated
OpenAI: GPT-4 Turbo | openai/gpt-4-turbo | $0.01 | Moderated
OpenAI: GPT-4 Turbo Preview | openai/gpt-4-turbo-preview | $0.01 | Moderated
OpenAI: GPT-4 | openai/gpt-4 | $0.03 | Moderated
OpenAI: GPT-4 32k | openai/gpt-4-32k | $0.06 | Moderated
OpenAI: GPT-4 Vision | openai/gpt-4-vision-preview | $0.01 | Moderated
OpenAI: GPT-3.5 Turbo Instruct | openai/gpt-3.5-turbo-instruct | $0.0015 | Moderated
Google: PaLM 2 Chat | google/palm-2-chat-bison | $0.00025 | None
Google: PaLM 2 Code Chat | google/palm-2-codechat-bison | $0.00025 | None
Google: PaLM 2 Chat 32k | google/palm-2-chat-bison-32k | $0.00025 | None
Google: PaLM 2 Code Chat 32k | google/palm-2-codechat-bison-32k | $0.00025 | None
Google: Gemini Pro 1.0 | google/gemini-pro | $0.000125 | None
Google: Gemini Pro Vision 1.0 | google/gemini-pro-vision | $0.000125 | None
Google: Gemini Pro 1.5 (preview) | google/gemini-pro-1.5 | $0.0025 | None
Perplexity: PPLX 70B Online | perplexity/pplx-70b-online | $0.001 | None
Perplexity: PPLX 7B Online | perplexity/pplx-7b-online | $0.0002 | None
Perplexity: PPLX 7B Chat | perplexity/pplx-7b-chat | $0.0002 | None
Perplexity: PPLX 70B Chat | perplexity/pplx-70b-chat | $0.001 | None
Perplexity: Sonar 7B | perplexity/sonar-small-chat | $0.0002 | None
Perplexity: Sonar 8x7B | perplexity/sonar-medium-chat | $0.0006 | None
Perplexity: Sonar 7B Online | perplexity/sonar-small-online | $0.0002 | None
Perplexity: Sonar 8x7B Online | perplexity/sonar-medium-online | $0.0006 | None
FireLLaVA 13B | fireworks/firellava-13b | $0.0002 | None
Anthropic: Claude 3 Opus | anthropic/claude-3-opus | $0.015 | Moderated
Anthropic: Claude 3 Sonnet | anthropic/claude-3-sonnet | $0.003 | Moderated
Anthropic: Claude 3 Haiku | anthropic/claude-3-haiku | $0.00025 | Moderated
Anthropic: Claude v2 | anthropic/claude-2 | $0.008 | Moderated
Anthropic: Claude v2.1 | anthropic/claude-2.1 | $0.008 | Moderated
Anthropic: Claude v2.0 | anthropic/claude-2.0 | $0.008 | Moderated
Anthropic: Claude Instant v1 | anthropic/claude-instant-1 | $0.0008 | Moderated
Anthropic: Claude Instant v1.2 | anthropic/claude-instant-1.2 | $0.0008 | Moderated
Anthropic: Claude 3 Opus (self-moderated) | anthropic/claude-3-opus:beta | $0.015 | None
Anthropic: Claude 3 Sonnet (self-moderated) | anthropic/claude-3-sonnet:beta | $0.003 | None
Anthropic: Claude 3 Haiku (self-moderated) | anthropic/claude-3-haiku:beta | $0.00025 | None
Anthropic: Claude v2 (self-moderated) | anthropic/claude-2:beta | $0.008 | None
Anthropic: Claude v2.1 (self-moderated) | anthropic/claude-2.1:beta | $0.008 | None
Anthropic: Claude v2.0 (self-moderated) | anthropic/claude-2.0:beta | $0.008 | None
Anthropic: Claude Instant v1 (self-moderated) | anthropic/claude-instant-1:beta | $0.0008 | None
Meta: Llama v2 13B Chat | meta-llama/llama-2-13b-chat | $0.00013 | None
Meta: Llama v2 70B Chat | meta-llama/llama-2-70b-chat | $0.0006 | None
Nous: Hermes 13B | nousresearch/nous-hermes-llama2-13b | $0.00026 | None
Nous: Capybara 34B | nousresearch/nous-capybara-34b | $0.0009 | None
Airoboros 70B | jondurbin/airoboros-l2-70b | $0.0007 | None
Chronos Hermes 13B v2 | austism/chronos-hermes-13b | $0.00013 | None
Mistral 7B Instruct | mistralai/mistral-7b-instruct | $0.0001 | None
OpenHermes 2.5 Mistral 7B | teknium/openhermes-2.5-mistral-7b | $0.00017 | None
OpenChat 3.5 | openchat/openchat-7b | $0.0001 | None
Toppy M 7B | undi95/toppy-m-7b | $0.00015 | None
lzlv 70B | lizpreciatior/lzlv-70b-fp16-hf | $0.0007 | None
Mixtral 8x7B Instruct | mistralai/mixtral-8x7b-instruct | $0.00024 | None
Dolphin 2.6 Mixtral 8x7B 🐬 | cognitivecomputations/dolphin-mixtral-8x7b | $0.0005 | None
Noromaid Mixtral 8x7B Instruct | neversleep/noromaid-mixtral-8x7b-instruct | $0.008 | None
Nous: Hermes 2 Mixtral 8x7B DPO | nousresearch/nous-hermes-2-mixtral-8x7b-dpo | $0.00027 | None
RWKV v5 World 3B | rwkv/rwkv-5-world-3b | $0 | None
RWKV v5 3B AI Town | recursal/rwkv-5-3b-ai-town | $0 | None
RWKV v5: Eagle 7B | recursal/eagle-7b | $0 | None
Google: Gemma 7B | google/gemma-7b-it | $0.0001 | None
Databricks: DBRX 132B Instruct | databricks/dbrx-instruct | $0.0006 | None
Zephyr 141B-A35B | huggingfaceh4/zephyr-orpo-141b-a35b | $0.00065 | None
Meta: Llama 3 8B Instruct | meta-llama/llama-3-8b-instruct | $0.0001 | None
WizardLM-2 8x22B | microsoft/wizardlm-2-8x22b | $0.00065 | None
WizardLM-2 7B | microsoft/wizardlm-2-7b | $0.00007 | None
Mistral: Mixtral 8x22B (base) | mistralai/mixtral-8x22b | $0.0009 | None
Mistral: Mixtral 8x22B Instruct | mistralai/mixtral-8x22b-instruct | $0.00065 | None
Lynn: Llama 3 Soliloquy 8B | lynn/soliloquy-l3 | $0.0001 | None
Hugging Face: Zephyr 7B (free) | huggingfaceh4/zephyr-7b-beta:free | $0 | None
Meta: Llama v2 70B Chat (nitro) | meta-llama/llama-2-70b-chat:nitro | $0.0009 | None
MythoMax 13B (nitro) | gryphe/mythomax-l2-13b:nitro | $0.0002 | None
Mistral 7B Instruct (nitro) | mistralai/mistral-7b-instruct:nitro | $0.0002 | None
Google: Gemma 7B (nitro) | google/gemma-7b-it:nitro | $0.0002 | None
Toppy M 7B (nitro) | undi95/toppy-m-7b:nitro | $0.00015 | None
WizardLM-2 8x22B (nitro) | microsoft/wizardlm-2-8x22b:nitro | $0.001 | None
Meta: Llama 3 8B Instruct (nitro) | meta-llama/llama-3-8b-instruct:nitro | $0.0002 | None
Meta: Llama 3 70B Instruct (nitro) | meta-llama/llama-3-70b-instruct:nitro | $0.0009 | None
Llava 13B | haotian-liu/llava-13b | $0.01 | None
Nous: Hermes 2 Vision 7B (alpha) | nousresearch/nous-hermes-2-vision-7b | $0.01 | None
Mistral Tiny | mistralai/mistral-tiny | $0.00025 | None
Mistral Small | mistralai/mistral-small | $0.002 | None
Mistral Medium | mistralai/mistral-medium | $0.0027 | None
Mistral Large | mistralai/mistral-large | $0.008 | None
Cohere: Command | cohere/command | $0.001 | None
Cohere: Command R | cohere/command-r | $0.0005 | None
Cohere: Command R+ | cohere/command-r-plus | $0.003 | None


Media models

Model Name | Model ID | Cost
OpenAI: Shap-e | openai/shap-e | $0.01 / 32 steps

Note: Different models tokenize text in different ways. Some models break up text into chunks of multiple characters (GPT, Claude, Llama, etc.), while others tokenize by character (PaLM). This means the number of tokens may vary depending on the model.


Provider Routing

OpenRouter routes each request to the best available model provider for your preferences. You can customize this routing using the provider object in the request body. Here are the available options:

Custom Provider Selection

You can set the providers that OpenRouter will use for your request using the order field. The router will filter this list to only include providers that are available for the model you're using, and then try one at a time, failing if none are available. If you don't set this field, the router will use the default ordering shown on the model page.

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello"},
    ],
    "provider": {
      "order": [
        "Azure",
        "Together"
      ]
    },
  })
});

Required Parameters (beta)

By default, providers that don't support a given LLM parameter will ignore it. But you can change this and filter for only providers that support all the parameters in your request.

For example, to only use providers that support JSON formatting:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello"},
    ],
    "provider": {
      "require_parameters": true
    },
    "response_format": {
      "type": "json_object"
    },
  })
});

Data Privacy

Some model providers may log prompts, so we display them with a Data Policy tag on model pages. This is not a definitive source of third-party data policies, but it represents our best knowledge.

OpenRouter's data policy is managed on your Account page. You can disable logging there to also disable third party model providers that store logs for purposes like training. Alternatively, you can skip them on a per-request basis:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello"},
    ],
    "provider": {
      "data_collection": "deny"
    },
  })
});

Disabling a provider causes the router to skip over it and proceed to the next best one.

Disabling Fallbacks

To guarantee that your request is only served by the top (lowest-cost) provider, you can disable fallbacks:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [
      {"role": "user", "content": "Hello"},
    ],
    "provider": {
      "allow_fallbacks": false
    },
  })
});

For a complete list of options, see this JSON schema:

{
  "$ref": "#/definitions/ProviderPreferences",
  "definitions": {
    "ProviderPreferences": {
      "type": "object",
      "properties": {
        "allow_fallbacks": {
          "type": "boolean",
          "description": "Whether to allow backup providers to serve requests
- true: (default) when the primary provider is unavailable, use the next best provider.
- false: use only the primary provider, and return the upstream error if it's unavailable.
",
          "default": true
        },
        "require_parameters": {
          "type": "boolean",
          "description": "Whether to filter providers to only those that support the parameters you've provided. If this setting is omitted or set to false, then providers will receive only the parameters they support, and ignore the rest.",
          "default": false
        },
        "data_collection": {
          "type": "string",
          "enum": [
            "deny",
            "allow"
          ],
          "description": "Data collection setting. If no available model provider meets the requirement, your request will return an error.
- allow: (default) allow providers which store user data non-transiently and may train on it
- deny: use only providers which do not collect user data.
",
          "default": "allow"
        },
        "order": {
          "type": "array",
          "items": {
            "type": "string",
            "enum": [
              "OpenAI",
              "Anthropic",
              "HuggingFace",
              "Google",
              "Mancer",
              "Mancer 2",
              "Together",
              "DeepInfra",
              "Azure",
              "Modal",
              "AnyScale",
              "Replicate",
              "Perplexity",
              "Recursal",
              "Fireworks",
              "Mistral",
              "Groq",
              "Cohere",
              "Lepton",
              "OctoAI",
              "Novita",
              "Lynn",
              "Lynn 2"
            ]
          },
          "description": "An ordered list of provider names. The router will attempt to use the first provider in the subset of this list that supports your requested model, and fall back to the next if it is unavailable. If no providers are available, the request will fail with an error message."
        }
      },
      "additionalProperties": false
    }
  },
  "$schema": "http://json-schema.org/draft-07/schema#"
}
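
For example, a provider object combining several of these options might look like this (the values are illustrative):

"provider": {
  "order": ["OpenAI", "Azure"],
  "allow_fallbacks": false,
  "require_parameters": true,
  "data_collection": "deny"
}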

Model Routing

Multi-model routing is under development 👀

In the meantime, OpenRouter provides two options:

  1. The Auto router, a special model ID that you can use to choose between selected high-quality models based on heuristics applied to your prompt.

  2. The models array, which lets you automatically try other models if the primary model's providers are down, rate-limited, or refuse to reply due to content moderation required by all providers:

{
  "models": ["anthropic/claude-2.1", "gryphe/mythomax-l2-13b"],
  "route": "fallback",
  ... // Other params
}

If the model you selected returns an error, OpenRouter will try to use the fallback model instead. If the fallback model is down or returns an error, OpenRouter will return that error.

By default, any error can trigger the use of a fallback model, including context length validation errors, moderation flags for filtered models, rate-limiting, and downtime.

Requests are priced using the model that was used, which will be returned in the model attribute of the response body.

If no fallback model is specified but route: "fallback" is still included, OpenRouter will try the most appropriate open-source model available, with pricing less than the primary model (or very close to it).
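
Putting it together, here's a sketch of a complete fallback request (the optional ranking headers from earlier examples are omitted for brevity):

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "models": ["anthropic/claude-2.1", "gryphe/mythomax-l2-13b"],
    "route": "fallback",
    "messages": [
      {"role": "user", "content": "Hello"},
    ],
  })
});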


OAuth PKCE

Users can connect to OpenRouter in one click using Proof Key for Code Exchange (PKCE). Here's the flow, step by step:

  1. Send your user to https://openrouter.ai/auth?callback_url=YOUR_SITE_URL

    • You can optionally include a code_challenge (a random string of up to 256 characters) for extra security.
    • For maximum security, we recommend also setting code_challenge_method to S256, and then setting code_challenge to the base64 encoding of the sha256 hash of code_verifier, which you will submit in Step 2. More info in Auth0's docs.
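    • For illustration, here's a minimal sketch of generating a code_verifier and deriving its S256 code_challenge with the browser's Web Crypto API (the function and variable names are illustrative, not part of the OpenRouter API):

async function createPkcePair() {
  // Any sufficiently random string works as the code_verifier
  const codeVerifier = crypto.randomUUID() + crypto.randomUUID();
  // code_challenge = base64url(sha256(code_verifier))
  const hash = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(codeVerifier)
  );
  return {
    codeVerifier,
    codeChallenge: btoa(String.fromCharCode(...new Uint8Array(hash)))
      .replace(/\+/g, "-")
      .replace(/\//g, "_")
      .replace(/=+$/, ""),
  };
}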
  2. Once logged in, they'll be redirected back to your site with a code in the URL. Make an API call (can be frontend or backend) to exchange the code for a user-controlled API key. And that's it for PKCE!

    • Look for the code query parameter, e.g. ?code=....
fetch("https://openrouter.ai/api/v1/auth/keys", {
  method: 'POST',
  body: JSON.stringify({
    code: $CODE_FROM_QUERY_PARAM,
    code_verifier: $CODE_VERIFIER // Only needed if you sent a code_challenge in Step 1
  })
});
  3. A fresh API key will be in the result under "key". Store it securely and make OpenAI-style requests (streaming is supported as well):
fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "anthropic/claude-2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"},
    ],
  })
});

You can use JavaScript or any server-side framework, like Streamlit.


API Keys

Users or developers can cover model costs with normal API keys. This allows you to use curl or the OpenAI SDK directly with OpenRouter. Just create an API key, set the api_base, and optionally set a referrer header to make your app discoverable to others on OpenRouter.

Note: API keys on OpenRouter are more powerful than keys used directly for model APIs. They allow users to set credit limits for apps, and they can be used in OAuth flows.

Example code:

import openai

openai.api_base = "https://openrouter.ai/api/v1"
openai.api_key = $OPENROUTER_API_KEY

response = openai.ChatCompletion.create(
  model="openai/gpt-3.5-turbo",
  messages=[...],
  headers={
    "HTTP-Referer": $YOUR_SITE_URL, # Optional, for including your app on openrouter.ai rankings.
    "X-Title": $YOUR_APP_NAME, # Optional. Shows in rankings on openrouter.ai.
  },
)

reply = response.choices[0].message

To stream with Python, see this example from OpenAI.


Requests

OpenRouter's request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, OpenRouter normalizes the schema across models and providers so you only need to learn one.

Request Body

Here's the request schema as a TypeScript type. This will be the body of your POST request to the /api/v1/chat/completions endpoint (see the quick start above for an example).

// Definitions of subtypes are below
type Request = {
  // Either "messages" or "prompt" is required
  messages?: Message[];
  prompt?: string;

  // If "model" is unspecified, uses the user's default
  model?: string; // See "Supported Models" section

  // Forces the model to produce a specific output format.
  // Only supported by OpenAI models, Nitro models, and some others - check the
  // providers on the model page on openrouter.ai/models to see if it's supported,
  // and set `require_parameters` to true in your Provider Preferences. See
  // openrouter.ai/docs#provider-routing
  response_format?: { type: 'json_object' };

  stop?: string | string[];
  stream?: boolean; // Enable streaming

  // See LLM Parameters (openrouter.ai/docs#parameters)
  max_tokens?: number; // Range: [1, context_length)
  temperature?: number; // Range: [0, 2]
  top_p?: number; // Range: (0, 1]
  top_k?: number; // Range: [1, Infinity) Not available for OpenAI models
  frequency_penalty?: number; // Range: [-2, 2]
  presence_penalty?: number; // Range: [-2, 2]
  repetition_penalty?: number; // Range: (0, 2]
  seed?: number; // OpenAI only

  // Function-calling
  // Only natively supported by OpenAI models. For others, we submit
  // a YAML-formatted string with these tools at the end of the prompt.
  tools?: Tool[];
  tool_choice?: ToolChoice;

  // Additional optional parameters
  logit_bias?: { [key: number]: number };

  // OpenRouter-only parameters
  // See "Prompt Transforms" section: openrouter.ai/docs#transforms
  transforms?: string[];
  // See "Model Routing" section: openrouter.ai/docs#model-routing
  models?: string[];
  route?: 'fallback';
  // See "Provider Routing" section: openrouter.ai/docs#provider-routing
  provider?: ProviderPreferences;
};

// Subtypes:

type TextContent = {
  type: 'text';
  text: string;
};

type ImageContentPart = {
  type: 'image_url';
  image_url: {
    url: string; // URL or base64 encoded image data
    detail?: string; // Optional, defaults to 'auto'
  };
};

type ContentPart = TextContent | ImageContentPart;

type Message = {
  role: 'user' | 'assistant' | 'system' | 'tool';
  // ContentParts are only for the 'user' role:
  content: string | ContentPart[];
  // If "name" is included, it will be prepended like this
  // for non-OpenAI models: `{name}: {content}`
  name?: string;
};

type FunctionDescription = {
  description?: string;
  name: string;
  parameters: object; // JSON Schema object
};

type Tool = {
  type: 'function';
  function: FunctionDescription;
};

type ToolChoice =
  | 'none'
  | 'auto'
  | {
      type: 'function';
      function: {
        name: string;
      };
    };
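
For example, a messages array that sends an image alongside text using ContentParts might look like this (a sketch; check the model page to confirm image support):

{
  "model": "openai/gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/photo.jpg" }
        }
      ]
    }
  ]
}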

Request Headers

OpenRouter allows you to specify an optional HTTP-Referer header to identify your app and make it discoverable to users on openrouter.ai. You can also include an optional X-Title header to set or modify the title of your app. Example:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [
      {"role": "user", "content": "Who are you?"},
    ],
  })
});

Model routing: If the model parameter is omitted, the user or payer's default is used. Otherwise, remember to select a value for model from the supported models or API, and include the organization prefix. OpenRouter will select the least expensive and best GPUs available to serve the request, and fall back to other providers or GPUs if it receives a 5xx response code or if you are rate-limited.

Streaming: Server-Sent Events (SSE) are supported as well, to enable streaming for all models. Simply send stream: true in your request body. The SSE stream will occasionally contain a "comment" payload, which you should ignore (noted below).
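
For illustration, here's a minimal sketch of consuming the stream with the Fetch API, assuming OpenAI-style data: lines and a [DONE] sentinel (parsing is simplified; see SSE Streaming Comments below):

const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  })
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let content = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (line.startsWith(":")) continue; // SSE comment, safe to ignore
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") continue; // end-of-stream sentinel
    content += JSON.parse(payload).choices[0].delta?.content ?? "";
  }
}
console.log(content);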

Non-standard parameters: If the chosen model doesn't support a request parameter (such as logit_bias in non-OpenAI models, or top_k for OpenAI), then the parameter is ignored. The rest are forwarded to the underlying model API.

Assistant Prefill: OpenRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.

To use this feature, simply include a message with role: "assistant" at the end of your messages array. Example:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [
      {"role": "user", "content": "Who are you?"},
      {"role": "assistant", "content": "I'm not sure, but my best guess is"},
    ],
  })
});

Stream Cancellation

For some providers, streaming requests can be canceled by aborting the connection or simply disconnecting.

When aborting the connection to a provider that supports stream cancellation, the model will stop processing the request, and billing will stop as soon as the upstream provider detects the disconnection.

If you're using the Fetch API, you can use the AbortController to cancel the stream. Here's an example:

const controller = new AbortController();

fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: ...,
  body: ...,
  signal: controller.signal
})

...

// Later, to cancel the stream:
controller.abort();

NOTE: Aborting/disconnecting from a non-stream request or a stream request to a provider that does not support stream cancellation will not halt the model's processing in the background. You will still be billed for the rest of the completion.


Responses

Responses are largely consistent with the OpenAI Chat API. This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models.

At a high level, OpenRouter normalizes the schema across models and providers so you only need to learn one.

Response Body

Note that finish_reason will vary depending on the model provider. The model property tells you which model was used inside the underlying API.

Here's the response schema as a TypeScript type:

// Definitions of subtypes are below

type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice | Error)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';
  // For non-streaming responses only. For streaming responses,
  // see "Querying Cost and Stats" below.
  usage?: {
    completion_tokens: number; // Equivalent to "native_tokens_completion" in the /generation API
    prompt_tokens: number; // Equivalent to "native_tokens_prompt"
    total_tokens: number; // Sum of the above two fields
    total_cost: number; // Number of credits used by this generation
  };
};

// Subtypes:

type NonChatChoice = {
  finish_reason: string | null;
  text: string;
};

type NonStreamingChoice = {
  finish_reason: string | null; // Depends on the model. Ex: 'stop' | 'length' | 'content_filter' | 'tool_calls' | 'function_call'
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
    // Deprecated, replaced by tool_calls
    function_call?: FunctionCall;
  };
};

type StreamingChoice = {
  finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
    // Deprecated, replaced by tool_calls
    function_call?: FunctionCall;
  };
};

type Error = {
  code: number; // See "Error Handling" section
  message: string;
};

type FunctionCall = {
  name: string;
  arguments: string; // JSON format arguments
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};

Here's an example:

{
  "id": "gen-xxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop", // Different models provide different reasons here
      "message": {
        // will be "delta" if streaming
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "model": "openai/gpt-3.5-turbo" // Could also be "anthropic/claude-2.1", etc, depending on the "model" that ends up being used
}

Querying Cost and Stats

You can use the returned id to query for the generation stats (including token counts and cost) after the request is complete. This is how you can get the cost and tokens for all models and requests, streaming and non-streaming.

const generation = await fetch(
  "https://openrouter.ai/api/v1/generation?id=$GENERATION_ID",
  { headers }
)

await generation.json()
// OUTPUT:
{
  data: {
    "id": "gen-nNPYi0ZB6GOK5TNCUMHJGgXo",
    "model": "openai/gpt-4-32k",
    "streamed": false,
    "generation_time": 2,
    "created_at": "2023-09-02T20:29:18.574972+00:00",
    "tokens_prompt": 24,
    "tokens_completion": 29,
    "native_tokens_prompt": 24,
    "native_tokens_completion": 29,
    "num_media_prompt": null,
    "num_media_completion": null,
    "origin": "https://localhost:47323/",
    "total_cost": 0.00492
  }
};

Note that token counts and total_cost are also available in the usage field of the response body for non-streaming completions.

SSE Streaming Comments

For SSE streams, we occasionally need to send an SSE comment to indicate that OpenRouter is processing your request. This helps prevent connections from timing out. The comment will look like this:

: OPENROUTER PROCESSING

The comment payload can be safely ignored per the SSE spec. However, you can leverage it to improve UX as needed, e.g. by showing a dynamic loading indicator.

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.parse the non-JSON payloads. We recommend using an SSE client that handles comment payloads correctly.


LLM Parameters

temperature
  • Optional, float, 0.0 to 2.0

  • Default: 1.0

This setting influences the variety in the model's responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.

top_p
  • Optional, float, 0.0 to 1.0

  • Default: 1.0

This setting limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model's responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.

top_k
  • Optional, integer, 0 or above

  • Default: 0

This limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, so the model considers all choices.

frequency_penalty
  • Optional, float, -2.0 to 2.0

  • Default: 0.0

This setting controls the repetition of tokens based on how often they appear in the input. Tokens that appear more frequently in the input are used less often, with the penalty scaling proportionally to the number of occurrences. Negative values will encourage token reuse.

presence_penalty
  • Optional, float, -2.0 to 2.0

  • Default: 0.0

Adjusts how often the model repeats specific tokens already used in the input. Higher values make such repetition less likely, while negative values do the opposite. Token penalty does not scale with the number of occurrences. Negative values will encourage token reuse.

repetition_penalty
  • Optional, float, 0.0 to 2.0

  • Default: 1.0

Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). Token penalty scales based on original token's probability.

min_p
  • Optional, float, 0.0 to 1.0

  • Default: 0.0

Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. (The value changes depending on the confidence level of the most probable token.) If your Min-P is set to 0.1, that means it will only allow for tokens that are at least 1/10th as probable as the best possible option.

top_a
  • Optional, float, 0.0 to 1.0

  • Default: 0.0

Consider only the top tokens with "sufficiently high" probabilities based on the probability of the most likely token. Think of it like a dynamic Top-P. A lower Top-A value focuses the choices based on the highest probability token but with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering process based on the maximum probability.

seed
  • Optional, integer

If specified, the inferencing will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed for some models.

max_tokens
  • Optional, integer, 1 or above

This sets the upper limit for the number of tokens the model can generate in response. It won't produce more than this limit. The maximum value is the context length minus the prompt length.

logit_bias
  • Optional, map

Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
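
For example, to effectively ban a (hypothetical) token ID 12345 and nudge token 678 upward:

"logit_bias": {
  "12345": -100, // never select this token
  "678": 5 // make this token somewhat more likely
}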

logprobs
  • Optional, boolean

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.

top_logprobs
  • Optional, integer

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

response_format
  • Optional, map

Forces the model to produce specific output format. Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Note: when using JSON mode, you should also instruct the model to produce JSON yourself via a system or user message.

stop
  • Optional, array

Stops generation immediately if the model encounters any token specified in the stop array.
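
For illustration, here's a request body combining several of these parameters (the values are arbitrary, not recommendations):

{
  "model": "mistralai/mixtral-8x7b-instruct",
  "messages": [{"role": "user", "content": "Write a haiku about routers."}],
  "temperature": 0.7,
  "top_p": 0.9,
  "top_k": 40,
  "frequency_penalty": 0.5,
  "repetition_penalty": 1.1,
  "max_tokens": 128,
  "stop": ["\n\n"]
}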


Parameters API


Prompt Transforms

OpenRouter has a simple rule for choosing between sending a prompt and sending a list of ChatML messages:

  • Choose messages if you want to have OpenRouter apply a recommended instruct template to your prompt, depending on which model serves your request.
  • Choose prompt if you want to send a custom prompt to the model. This is useful if you want to use a custom instruct template or maintain full control over the prompt submitted to the model.

To help with prompts that exceed the maximum context size of a model, OpenRouter supports a custom parameter called transforms:

{
  transforms: ["middle-out"], // Compress prompts > context size. This is the default for all models.
  messages: [...], // "prompt" works as well
  model // Works with any model
}

The transforms param is an array of strings that tell OpenRouter to apply a series of transformations to the prompt before sending it to the model. Transformations are applied in-order. Available transforms are:

  • middle-out: compress prompts and message chains to the context size. This helps users extend conversations in part because LLMs pay significantly less attention to the middle of sequences anyway. Works by compressing or removing messages in the middle of the prompt. Additionally, it reduces the number of messages to adhere to the model's limit. For instance, Anthropic's Claude models enforce a maximum of 1000 messages.

Note: All OpenRouter models default to using middle-out, unless you exclude this transform by e.g. setting transforms: [] in the request body.


Error Handling

For errors, OpenRouter returns a JSON response with the following shape:

type ErrorResponse = {
  error: {
    code: number;
    message: string;
    metadata?: Record<string, unknown>;
  };
};

The HTTP Response will have the same status code as error.code, forming a request error if:

  • Your original request is invalid
  • Your API key/account is out of credits

Otherwise, the returned HTTP response status will be 200 and any error that occurred while the LLM was producing the output will be emitted in the response body or as an SSE data event.

Example code for printing errors in JavaScript:

const request = await fetch('https://openrouter.ai/...');
console.log(request.status); // Will be an error code unless the model started processing your request
const response = await request.json();
console.error(response.error?.code); // Will be an error code
console.error(response.error?.message);

Error Codes

  • 400: Bad Request (invalid or missing params, CORS)
  • 401: Invalid credentials (OAuth session expired, disabled/invalid API key)
  • 402: Your account or API key has insufficient credits. Add more credits and retry the request.
  • 403: Your chosen model requires moderation and your input was flagged
  • 408: Your request timed out
  • 429: You are being rate limited
  • 502: Your chosen model is down or we received an invalid response from it
  • 503: There is no available model provider that meets your routing requirements

Moderation Errors

If your input was flagged, the error metadata will contain information about the issue. The shape of the metadata is as follows:

type ModerationErrorMetadata = {
  reasons: string[]; // Why your input was flagged
  flagged_input: string; // The text segment that was flagged, limited to 100 characters. If the flagged input is longer than 100 characters, it will be truncated in the middle and replaced with ...
};

Limits

Rate Limits and Credits Remaining

To check the rate limit or credits left on an API key, make a GET request to https://openrouter.ai/api/v1/auth/key.

fetch('https://openrouter.ai/api/v1/auth/key', {
  method: 'GET',
  headers: {
    Authorization: 'Bearer $OPENROUTER_API_KEY'
  }
});

If you submit a valid API key, you should get a response of the form:

type Key = {
  data: {
    label: string;
    usage: number; // Number of credits used
    limit: number | null; // Credit limit for the key, or null if unlimited
    is_free_tier: boolean; // True if the user has never paid for credits
    rate_limit: {
      requests: number; // Number of requests allowed...
      interval: string; // in this interval, e.g. "10s"
    };
  };
};

There are two global rate limits which apply to all requests, regardless of account status or model availability:

  1. Surge limit: By default, all users are subject to a maximum rate limit of 200 requests per second to defend against denial-of-service attacks. Contact us in Discord or using our support@ email address if you need a higher limit.

  2. Free limit: If you are using a free model variant (with an ID ending in :free), then you will be limited to 10 requests per minute and 100 requests per day.

For all other requests, rate limits are a function of the number of credits remaining on the key or account. For the credits available on your API key, you can make 1 request per credit per second up to the surge limit.

For example:

  • 0 credits → 1 req/s (minimum)
  • 5 credits → 5 req/s
  • 10 credits → 10 req/s
  • 1000 credits → 200 req/s (maximum)

If your account has a negative credit balance, you may see 402 errors, including for free models. Adding credits to put your balance above zero allows you to use those models again.

Token Limits

Some users may have too few credits on their account to make expensive requests. OpenRouter provides a way to know that before making a request to any model.

To get the maximum tokens that a user can generate and the maximum tokens allowed in their prompt, add authentication headers in your request to https://openrouter.ai/api/v1/models:

fetch('https://openrouter.ai/api/v1/models', {
  method: 'GET',
  headers: {
    Authorization: 'Bearer $OPENROUTER_API_KEY'
  }
});

Each model will include a per_request_limits property:

type Model = {
  id: string;
  pricing: {
    prompt: number;
    completion: number;
  };
  context_length: number;
  per_request_limits: {
    prompt_tokens: number;
    completion_tokens: number;
  };
};
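
For example, here's a sketch that checks a model's prompt limit before sending (promptTokenCount is a hypothetical count you compute yourself):

const res = await fetch("https://openrouter.ai/api/v1/models", {
  method: "GET",
  headers: { Authorization: "Bearer $OPENROUTER_API_KEY" }
});
const { data } = await res.json(); // assumes the model list is under "data"
const model = data.find((m) => m.id === "openai/gpt-4");
if (model && promptTokenCount > model.per_request_limits.prompt_tokens) {
  console.warn("Prompt exceeds what this key can spend on a single request");
}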

Other Frameworks

You can find a few examples of using OpenRouter with other frameworks in this GitHub repository. Here are some examples:

const chat = new ChatOpenAI({
  modelName: "anthropic/claude-instant-v1",
  temperature: 0.8,
  streaming: true,
  openAIApiKey: $OPENROUTER_API_KEY,
}, {
  basePath: $OPENROUTER_BASE_URL + "/api/v1",
  baseOptions: {
    headers: {
      "HTTP-Referer": "https://yourapp.com/", // Optional, for including your app on openrouter.ai rankings.
      "X-Title": "Langchain.js Testing", // Optional. Shows in rankings on openrouter.ai.
    },
  },
});

const config = new Configuration({
  basePath: $OPENROUTER_BASE_URL + "/api/v1",
  apiKey: $OPENROUTER_API_KEY,
  baseOptions: {
    headers: {
      "HTTP-Referer": "https://yourapp.com/", // Optional, for including your app on openrouter.ai rankings.
      "X-Title": "Vercel Testing", // Optional. Shows in rankings on openrouter.ai.
    }
  }
})

const openrouter = new OpenAIApi(config)

3D Objects (beta)

OpenRouter supports text-to-3D object generation, currently in beta. See supported media models and try a demo. To generate 3D objects, send a POST request to https://openrouter.ai/api/v1/objects/generations:

curl https://openrouter.ai/api/v1/objects/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "HTTP-Referer: $YOUR_SITE_URL" \
  -H "X-Title: $YOUR_SITE_NAME" \
  -d '{
    "prompt": "a chair shaped like an avocado",
    "num_inference_steps": 32,
    "num_outputs": 1,
    "extension": "ply",
    "model": "openai/shap-e"
  }'

You should receive a response of type MediaResponse:

// Each generation will contain either a base64 string or a hosted URL, or both.
interface MediaOutput {
  uri?: string; // base64 string
  url?: string; // hosted URL
}

interface MediaResponse {
  generations: MediaOutput[];
}