Docs

The future will bring us hundreds of language models and dozens of providers for each. How will you choose the best?

Benefit from the race to the bottom. OpenRouter finds the lowest price for each model across dozens of providers. You can also let users pay for their own models via OAuth PKCE.

Standardized API. No need to change your code when switching between models or providers.

The best models will be used the most. Evals are flawed. Instead, compare models by how often they're used, and soon, for which purposes. Chat with multiple at once in the Playground.


Quick Start

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "openai/gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"},
    ],
  })
});

You can also use OpenRouter with OpenAI's client API:

import OpenAI from "openai"

const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: $OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": $YOUR_SITE_URL, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": $YOUR_SITE_NAME, // Optional. Shows in rankings on openrouter.ai.
  },
  // dangerouslyAllowBrowser: true,
})
async function main() {
  const completion = await openai.chat.completions.create({
    model: "openai/gpt-3.5-turbo",
    messages: [
      { role: "user", content: "Say this is a test" }
    ],
  })

  console.log(completion.choices[0].message)
}
main()

Supported Models

Model usage can be paid by users, developers, or both, and may shift in availability. You can also fetch models, prices, and limits via API.
Token counting is approximate. OpenRouter does not store prompts or completions.

If you'd like to add an open source model directly to OpenRouter, visit our GitHub.

Text models

Moderation: whether content filtering is applied by OpenRouter, per the model provider's Terms of Service. Developers should adhere to the terms of the model regardless.

Model Name | ID | Prompt cost ($ per 1k tokens) | Moderation
Auto (best for prompt) | openrouter/auto | See note below | N/A
Nous: Capybara 7B (free) | nousresearch/nous-capybara-7b:free | $0 (100% off) | None
Mistral 7B Instruct (free) | mistralai/mistral-7b-instruct:free | $0 (100% off) | None
MythoMist 7B (free) | gryphe/mythomist-7b:free | $0 (100% off) | None
Toppy M 7B (free) | undi95/toppy-m-7b:free | $0 (100% off) | None
Cinematika 7B (alpha) (free) | openrouter/cinematika-7b:free | $0 (100% off) | None
Google: Gemma 7B (free) | google/gemma-7b-it:free | $0 (100% off) | None
Bagel 34B v0.2 | jondurbin/bagel-34b | $0.003 (90% off) | None
Psyfighter 13B | jebcarter/psyfighter-13b | $0.001 (90% off) | None
Psyfighter v2 13B | koboldai/psyfighter-13b-2 | $0.001 (90% off) | None
Noromaid Mixtral 8x7B Instruct | neversleep/noromaid-mixtral-8x7b-instruct | $0.003 (90% off) | None
Nous: Hermes 13B | nousresearch/nous-hermes-llama2-13b | $0.00015 (50% off) | None
Meta: CodeLlama 34B Instruct | meta-llama/codellama-34b-instruct | $0.0004 (50% off) | None
Phind: CodeLlama 34B v2 | phind/phind-codellama-34b | $0.0004 (50% off) | None
Neural Chat 7B v3.1 | intel/neural-chat-7b | $0.005 (50% off) | None
Mistral: Mixtral 8x7B Instruct | mistralai/mixtral-8x7b-instruct | $0.0003 (50% off) | None
Nous: Hermes 2 Mixtral 8x7B DPO | nousresearch/nous-hermes-2-mixtral-8x7b-dpo | $0.0003 (50% off) | None
Nous: Hermes 2 Mixtral 8x7B SFT | nousresearch/nous-hermes-2-mixtral-8x7b-sft | $0.0003 (50% off) | None
Llava 13B | haotian-liu/llava-13b | $0.005 (50% off) | None
Nous: Hermes 2 Vision 7B (alpha) | nousresearch/nous-hermes-2-vision-7b | $0.005 (50% off) | None
Meta: Llama v2 13B Chat | meta-llama/llama-2-13b-chat | $0.0001474 (33% off) | None
Synthia 70B | migtissera/synthia-70b | $0.00375 (25% off) | None
Pygmalion: Mythalion 13B | pygmalionai/mythalion-13b | $0.001125 (25% off) | None
ReMM SLERP 13B 6k | undi95/remm-slerp-l2-13b-6k | $0.001125 (25% off) | None
MythoMax 13B | gryphe/mythomax-l2-13b | $0.000225 (25% off) | None
Xwin 70B | xwin-lm/xwin-lm-70b | $0.00375 (25% off) | None
MythoMax 13B 8k | gryphe/mythomax-l2-13b-8k | $0.001125 (25% off) | None
Goliath 120B | alpindale/goliath-120b | $0.009375 (25% off) | None
Noromaid 20B | neversleep/noromaid-20b | $0.00225 (25% off) | None
MythoMist 7B | gryphe/mythomist-7b | $0.000375 (25% off) | None
Mancer: Weaver (alpha) | mancer/weaver | $0.003375 (25% off) | None
Nous: Capybara 7B | nousresearch/nous-capybara-7b | $0.00018 (10% off) | None
Meta: CodeLlama 70B Instruct | codellama/codellama-70b-instruct | $0.00081 (10% off) | None
OpenHermes 2 Mistral 7B | teknium/openhermes-2-mistral-7b | $0.00018 (10% off) | None
OpenHermes 2.5 Mistral 7B | teknium/openhermes-2.5-mistral-7b | $0.00018 (10% off) | None
ReMM SLERP 13B | undi95/remm-slerp-l2-13b | $0.00027 (10% off) | None
Toppy M 7B | undi95/toppy-m-7b | $0.00018 (10% off) | None
Cinematika 7B (alpha) | openrouter/cinematika-7b | $0.00018 (10% off) | None
Yi 34B Chat | 01-ai/yi-34b-chat | $0.00072 (10% off) | None
Yi 34B (base) | 01-ai/yi-34b | $0.00072 (10% off) | None
Yi 6B (base) | 01-ai/yi-6b | $0.000126 (10% off) | None
StripedHyena Nous 7B | togethercomputer/stripedhyena-nous-7b | $0.00018 (10% off) | None
StripedHyena Hessian 7B (base) | togethercomputer/stripedhyena-hessian-7b | $0.00018 (10% off) | None
Mistral: Mixtral 8x7B (base) | mistralai/mixtral-8x7b | $0.00054 (10% off) | None
Nous: Hermes 2 Yi 34B | nousresearch/nous-hermes-yi-34b | $0.00072 (10% off) | None
Nous: Hermes 2 Mistral 7B DPO | nousresearch/nous-hermes-2-mistral-7b-dpo | $0.00018 (10% off) | None
Mistral OpenOrca 7B | open-orca/mistral-7b-openorca | $0.0001425 (5% off) | None
Hugging Face: Zephyr 7B | huggingfaceh4/zephyr-7b-beta | $0.0001425 (5% off) | None
OpenAI: GPT-3.5 Turbo | openai/gpt-3.5-turbo | $0.001 | Moderated
OpenAI: GPT-3.5 Turbo 16k | openai/gpt-3.5-turbo-0125 | $0.0005 | Moderated
OpenAI: GPT-3.5 Turbo 16k | openai/gpt-3.5-turbo-16k | $0.003 | Moderated
OpenAI: GPT-4 Turbo (preview) | openai/gpt-4-turbo-preview | $0.01 | Moderated
OpenAI: GPT-4 | openai/gpt-4 | $0.03 | Moderated
OpenAI: GPT-4 32k | openai/gpt-4-32k | $0.06 | Moderated
OpenAI: GPT-4 Vision (preview) | openai/gpt-4-vision-preview | $0.01 | Moderated
OpenAI: GPT-3.5 Turbo Instruct | openai/gpt-3.5-turbo-instruct | $0.0015 | Moderated
Google: PaLM 2 Chat | google/palm-2-chat-bison | $0.00025 | None
Google: PaLM 2 Code Chat | google/palm-2-codechat-bison | $0.00025 | None
Google: PaLM 2 Chat 32k | google/palm-2-chat-bison-32k | $0.00025 | None
Google: PaLM 2 Code Chat 32k | google/palm-2-codechat-bison-32k | $0.00025 | None
Google: Gemini Pro (preview) | google/gemini-pro | $0.000125 | None
Google: Gemini Pro Vision (preview) | google/gemini-pro-vision | $0.000125 | None
Perplexity: PPLX 70B Online | perplexity/pplx-70b-online | $0 | None
Perplexity: PPLX 7B Online | perplexity/pplx-7b-online | $0 | None
Perplexity: PPLX 7B Chat | perplexity/pplx-7b-chat | $0.00007 | None
Perplexity: PPLX 70B Chat | perplexity/pplx-70b-chat | $0.0007 | None
Perplexity: Sonar 7B | perplexity/sonar-small-chat | $0.00007 | None
Perplexity: Sonar 8x7B | perplexity/sonar-medium-chat | $0.0006 | None
Perplexity: Sonar 7B Online | perplexity/sonar-small-online | $0 | None
Perplexity: Sonar 8x7B Online | perplexity/sonar-medium-online | $0 | None
Meta: Llama v2 70B Chat | meta-llama/llama-2-70b-chat | $0.0007 | None
Nous: Capybara 34B | nousresearch/nous-capybara-34b | $0.0007 | None
Airoboros 70B | jondurbin/airoboros-l2-70b | $0.0007 | None
Chronos Hermes 13B v2 | austism/chronos-hermes-13b | $0.00022 | None
Mistral 7B Instruct | mistralai/mistral-7b-instruct | $0.00013 | None
OpenChat 3.5 | openchat/openchat-7b | $0.00013 | None
lzlv 70B | lizpreciatior/lzlv-70b-fp16-hf | $0.0007 | None
Dolphin 2.6 Mixtral 8x7B 🐬 | cognitivecomputations/dolphin-mixtral-8x7b | $0.00027 | None
RWKV v5 World 3B | rwkv/rwkv-5-world-3b | $0 | None
RWKV v5 3B AI Town | recursal/rwkv-5-3b-ai-town | $0 | None
RWKV v5: Eagle 7B | recursal/eagle-7b | $0 | None
Google: Gemma 7B | google/gemma-7b-it | $0.00013 | None
Anthropic: Claude v2 | anthropic/claude-2 | $0.008 | Moderated
Anthropic: Claude v2.1 | anthropic/claude-2.1 | $0.008 | Moderated
Anthropic: Claude v2.0 | anthropic/claude-2.0 | $0.008 | Moderated
Anthropic: Claude Instant v1 | anthropic/claude-instant-1 | $0.0008 | Moderated
Anthropic: Claude Instant v1.2 | anthropic/claude-instant-1.2 | $0.0008 | Moderated
Anthropic: Claude v2 (experimental) | anthropic/claude-2:beta | $0.008 | None
Anthropic: Claude v2.1 (experimental) | anthropic/claude-2.1:beta | $0.008 | None
Anthropic: Claude v2.0 (experimental) | anthropic/claude-2.0:beta | $0.008 | None
Anthropic: Claude Instant v1 (experimental) | anthropic/claude-instant-1:beta | $0.0008 | None
Hugging Face: Zephyr 7B (free) | huggingfaceh4/zephyr-7b-beta:free | $0 | None
OpenChat 3.5 (free) | openchat/openchat-7b:free | $0 | None
Mistral: Mistral Tiny | mistralai/mistral-tiny | $0.00025 | None
Mistral: Mistral Small | mistralai/mistral-small | $0.002 | None
Mistral: Mistral Medium | mistralai/mistral-medium | $0.0027 | None
Mistral: Mistral Large | mistralai/mistral-large | $0.008 | None

Note on Auto (best for prompt): Depending on their size, subject, and complexity, your prompts will be sent to [Mistral: Mistral Medium](/models/mistralai/mistral-medium) or [OpenAI: GPT-4 Turbo (preview)](/models/openai/gpt-4-turbo-preview). To see which model was used, visit [Activity](/activity). Pricing depends on the final model chosen.


Media models
More coming soon. Learn about making 3D object requests in our Discord.

OpenAI: Shap-e | openai/shap-e | $0.01 / 32 steps

Note: Different models tokenize text in different ways. Some models break up text into chunks of multiple characters (GPT, Claude, Llama, etc) while others tokenize by character (PaLM). This means that the number of tokens may vary depending on the model.


Fallback Models

OpenRouter allows you to automatically try other models if the primary model is down, rate-limited, or refuses to reply due to content moderation required by the provider:

{
  "models": ["anthropic/claude-2.1", "gryphe/mythomax-l2-13b"],
  "route": "fallback",
  ... // Other params
}

If the model you selected returns an error, OpenRouter will try to use the fallback model instead. If the fallback model is down or returns an error, OpenRouter will return that error.

By default, any error can trigger the use of a fallback model, including context length validation errors, moderation flags for filtered models, rate-limiting, and downtime.

Requests are priced using the model that was used, which will be returned in the model attribute of the response body.

If no fallback model is specified but route: "fallback" is still included, OpenRouter will try the most appropriate open-source model available, with pricing less than the primary model (or very close to it).
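
As a rough Python illustration of the request shape above, the body can be assembled and the serving model read back from the response as follows. Only the models, route, and model fields come from this section; the helper names build_fallback_body and model_used are made up for the sketch, and treating the first list entry as the primary model is an assumption based on the example above.

```python
import json

def build_fallback_body(models, messages):
    """Build a chat request body listing a primary model and its fallbacks, in order."""
    if not models:
        raise ValueError("at least one model ID is required")
    return {
        "models": list(models),  # Assumed: first entry is the primary model
        "route": "fallback",
        "messages": list(messages),
    }

def model_used(response_body):
    """Return the ID of the model that served the request (the one it is priced at)."""
    return response_body["model"]

body = build_fallback_body(
    ["anthropic/claude-2.1", "gryphe/mythomax-l2-13b"],
    [{"role": "user", "content": "Hello!"}],
)
payload = json.dumps(body)  # This string would be sent as the POST body
```

Comparing model_used(response) with the first entry of models is one simple way to notice that a fallback was triggered.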


OAuth PKCE

Users can connect to OpenRouter in one click using Proof Key for Code Exchange (PKCE). Here's an example, and here's a step-by-step:

  1. Send your user to https://openrouter.ai/auth?callback_url=YOUR_SITE_URL

    • You can optionally include a code_challenge (random password up to 256 digits) for extra security.
    • For maximum security, we recommend also setting code_challenge_method to S256, and then setting code_challenge to the base64 encoding of the sha256 hash of code_verifier, which you will submit in Step 2. More info in Auth0's docs.
  2. Once logged in, they'll be redirected back to your site with a code in the URL. Make an API call (can be frontend or backend) to exchange the code for a user-controlled API key. And that's it for PKCE!

    • Look for the code query parameter, e.g. ?code=....
fetch("https://openrouter.ai/api/v1/auth/keys", {
  method: 'POST',
  body: JSON.stringify({
    code: $CODE_FROM_QUERY_PARAM,
    code_verifier: $CODE_VERIFIER // Only needed if you sent a code_challenge in Step 1
  })
});
  3. A fresh API key will be in the result under "key". Store it securely and make OpenAI-style requests (supports streaming as well):
fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "anthropic/claude-2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"},
    ],
  })
});

You can use JavaScript or any server-side framework, like Streamlit. The linked example shows multiple models and file Q&A.


API Keys

Users or developers can cover model costs with normal API keys. This allows you to use curl or the OpenAI SDK directly with OpenRouter. Just create an API key, set the api_base, and optionally set a referrer header to make your app discoverable to others on OpenRouter.

Note: API keys on OpenRouter are more powerful than keys used directly for model APIs. They allow users to set credit limits for apps, and they can be used in OAuth flows.

Example code:

import os

import openai  # Uses the legacy interface of the openai package (versions before 1.0)

openai.api_base = "https://openrouter.ai/api/v1"
openai.api_key = os.environ["OPENROUTER_API_KEY"]

response = openai.ChatCompletion.create(
  model="openai/gpt-3.5-turbo",
  messages=[...],  # Your list of chat messages
  headers={
    "HTTP-Referer": os.environ["YOUR_SITE_URL"], # Optional, for including your app on openrouter.ai rankings.
    "X-Title": os.environ["YOUR_APP_NAME"], # Optional. Shows in rankings on openrouter.ai.
  },
)

reply = response.choices[0].message

To stream with Python, see this example from OpenAI.


Requests

OpenRouter's request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, OpenRouter normalizes the schema across models and providers so you only need to learn one.

Request Body

Here's the request schema as a TypeScript type. This will be the body of your POST request to the /api/v1/chat/completions endpoint (see the quick start above for an example).

// Definitions of subtypes are below
type Request = {
  // Either "messages" or "prompt" is required
  messages?: Message[];
  prompt?: string;

  // If "model" is unspecified, uses the user's default
  model?: string; // See "Supported Models" section

  response_format?: { type: 'text' | 'json_object' }; // OpenAI only

  seed?: number; // OpenAI only
  stop?: string | string[];
  stream?: boolean; // Enable streaming

  // See LLM Parameters (openrouter.ai/docs#llm-parameters)
  max_tokens?: number; // Range: [1, context_length)
  temperature?: number; // Range: [0, 2]
  top_p?: number; // Range: (0, 1]
  top_k?: number; // Range: [1, Infinity) Not available for OpenAI models
  frequency_penalty?: number; // Range: [-2, 2]
  presence_penalty?: number; // Range: [-2, 2]
  repetition_penalty?: number; // Range: (0, 2]

  // Function-calling
  // Only natively supported by OpenAI models. For others, we submit
  // a YAML-formatted string with these tools at the end of the prompt.
  tools?: Tool[];
  tool_choice?: ToolChoice;

  // Additional optional parameters
  logit_bias?: { [key: number]: number }; // OpenAI only

  // OpenRouter-only parameters
  transforms?: string[]; // See "Prompt Transforms" section
  models?: string[]; // See "Fallback Models" section
  route?: 'fallback'; // See "Fallback Models" section
};

// Subtypes:

type TextContent = {
  type: 'text';
  text: string;
};

type ImageContentPart = {
  type: 'image_url';
  image_url: {
    url: string; // URL or base64 encoded image data
    detail?: string; // Optional, defaults to 'auto'
  };
};

type ContentPart = TextContent | ImageContentPart;

type Message = {
  role: 'user' | 'assistant' | 'system' | 'tool';
  // ContentParts are only for the 'user' role:
  content: string | ContentPart[];
  // If "name" is included, it will be prepended like this
  // for non-OpenAI models: `{name}: {content}`
  name?: string;
};

type FunctionDescription = {
  description?: string;
  name: string;
  parameters: object; // JSON Schema object
};

type Tool = {
  type: 'function';
  function: FunctionDescription;
};

type ToolChoice =
  | 'none'
  | 'auto'
  | {
      type: 'function';
      function: {
        name: string;
      };
    };
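
Before sending a request, the ranges noted in the schema comments can be checked client-side. The Python sketch below is one possible way to do that; check_request and PARAM_RANGES are hypothetical names, and OpenRouter's own validation may differ.

```python
# Ranges copied from the comments in the Request type:
# (low, high, low_inclusive, high_inclusive)
PARAM_RANGES = {
    "temperature": (0.0, 2.0, True, True),
    "top_p": (0.0, 1.0, False, True),
    "frequency_penalty": (-2.0, 2.0, True, True),
    "presence_penalty": (-2.0, 2.0, True, True),
    "repetition_penalty": (0.0, 2.0, False, True),
}

def check_request(body):
    """Return a list of human-readable problems found in a request body."""
    problems = []
    if "messages" not in body and "prompt" not in body:
        problems.append('either "messages" or "prompt" is required')
    for name, (low, high, low_incl, high_incl) in PARAM_RANGES.items():
        if name not in body:
            continue
        value = body[name]
        above = value >= low if low_incl else value > low
        below = value <= high if high_incl else value < high
        if not (above and below):
            problems.append(f"{name}={value} is outside the documented range")
    return problems
```

An empty list means the body passed these particular checks, nothing more.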

Request Headers

OpenRouter allows you to specify an optional HTTP-Referer header to identify your app and make it discoverable to users on openrouter.ai. You can also include an optional X-Title header to set or modify the title of your app. Example:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "messages": [
      {"role": "user", "content": "Who are you?"},
    ],
  })
});

Model routing: If the model parameter is omitted, the user or payer's default is used. Otherwise, remember to select a value for model from the supported models or API, and include the organization prefix. OpenRouter will select the least expensive and best GPUs available to serve the request, and fall back to other providers or GPUs if it receives a 5xx response code or if you are rate-limited.

Streaming: Server-Sent Events (SSE) are supported as well, to enable streaming for all models. Simply send stream: true in your request body. The SSE stream will occasionally contain a "comment" payload, which you should ignore (noted below).

Non-standard parameters: If the chosen model doesn't support a request parameter (such as logit_bias in non-OpenAI models, or top_k for OpenAI), then the parameter is ignored. The rest are forwarded to the underlying model API.

Assistant Prefill: OpenRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.

To use this feature, simply include a message with role: "assistant" at the end of your messages array. Example:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "messages": [
      {"role": "user", "content": "Who are you?"},
      {"role": "assistant", "content": "I'm not sure, but my best guess is"},
    ],
  })
});

Responses

Responses are largely consistent with the OpenAI Chat API. This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models.

Response Body

Note that finish_reason will vary depending on the model provider. The model property tells you which model was used inside the underlying API.

Here's the response schema as a TypeScript type:

// Definitions of subtypes are below

type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice | Error)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion';
  // For non-streaming responses only. For streaming responses,
  // see "Querying Cost and Stats" below.
  usage?: {
    completion_tokens: number;
    prompt_tokens: number;
    total_tokens: number;
  };
};

// Subtypes:

type NonChatChoice = {
  finish_reason: string | null;
  text: string;
};

type NonStreamingChoice = {
  finish_reason: string | null; // Depends on the model. Ex: 'stop' | 'length' | 'content_filter' | 'tool_calls' | 'function_call'
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
    // Deprecated, replaced by tool_calls
    function_call?: FunctionCall;
  };
};

type StreamingChoice = {
  finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
    // Deprecated, replaced by tool_calls
    function_call?: FunctionCall;
  };
};

type Error = {
  code: number; // See "Error Handling" section
  message: string;
};

type FunctionCall = {
  name: string;
  arguments: string; // JSON format arguments
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};

Here's an example:

{
  "id": "gen-xxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop", // Different models provide different reasons here
      "message": {
        // will be "delta" if streaming
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "model": "openai/gpt-3.5-turbo" // Could also be "anthropic/claude-2.1", etc, depending on the "model" that ends up being used
}
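
Because each entry of choices can take one of several shapes, client code often funnels them through a single accessor. A minimal Python sketch, assuming the response has already been parsed into a dict (choice_text is a hypothetical helper name):

```python
def choice_text(choice):
    """Extract the text from one entry of `choices`, whatever its shape."""
    if "code" in choice:
        # Error entry: {"code": ..., "message": "..."}
        raise RuntimeError(f"error {choice['code']}: {choice['message']}")
    if "delta" in choice:
        # StreamingChoice: content may be null on some chunks
        return choice["delta"].get("content") or ""
    if "message" in choice:
        # NonStreamingChoice
        return choice["message"].get("content") or ""
    if "text" in choice:
        # NonChatChoice (the request used "prompt")
        return choice["text"]
    raise ValueError(f"unrecognized choice shape: {choice!r}")
```

The error check comes first because an Error entry also has a message field, but there it is a plain string rather than an object.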

Querying Cost and Stats

You can use the returned id to query for the generation stats (including token counts and cost) after the request is complete. This is how you can get the cost and tokens for all models and requests, streaming and non-streaming.

const generation = await fetch(
  "https://openrouter.ai/api/v1/generation?id=$GENERATION_ID",
  { headers }
)

await generation.json()
// OUTPUT:
{
  data: {
    "id": "gen-nNPYi0ZB6GOK5TNCUMHJGgXo",
    "model": "openai/gpt-4-32k",
    "streamed": false,
    "generation_time": 2,
    "created_at": "2023-09-02T20:29:18.574972+00:00",
    "tokens_prompt": 24,
    "tokens_completion": 29,
    "native_tokens_prompt": 24,
    "native_tokens_completion": 29,
    "num_media_prompt": null,
    "num_media_completion": null,
    "origin": "https://localhost:47323/",
    "usage": 0.00492
  }
};

For SSE streams, we occasionally need to send an SSE comment to indicate that OpenRouter is processing your request. This helps prevent connections from timing out. The comment will look like this:

: OPENROUTER PROCESSING

The comment payload can be safely ignored per the SSE specs. However, you can leverage it to improve UX as needed, e.g. by showing a dynamic loading indicator.
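
A small Python sketch of a reader that tolerates these comments is shown below. It is simplified: it assumes each event fits on a single data: line, and it treats data: [DONE] as the end of the stream, which is the OpenAI convention and an assumption here rather than something stated in this section.

```python
import json

def iter_sse_events(lines):
    """Yield parsed JSON payloads from SSE lines, skipping comments and blanks."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # Blank separator or SSE comment such as ": OPENROUTER PROCESSING"
        if not line.startswith("data:"):
            continue  # Other SSE fields (event:, id:, retry:) are not needed here
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # Assumed OpenAI-style end-of-stream marker
            return
        yield json.loads(data)

stream = [
    ": OPENROUTER PROCESSING",
    "",
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    "data: [DONE]",
]
chunks = list(iter_sse_events(stream))
```

Only lines that start with data: ever reach json.loads, so the comment line cannot cause a parse error.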

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.parse the non-JSON payloads. We recommend the following clients:


LLM Parameters

temperature

  • Type: float

  • Range: 0.0 to 2.0

  • Default: 1.0

  • Explainer Video: Watch

This setting influences the variety in the model's responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.
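
To make the effect concrete, here is the textbook temperature-scaled softmax in Python. It is an illustration of the general technique, not a description of how any particular provider implements sampling; treating a temperature of 0 as greedy selection is an assumption consistent with the description above.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the top logit
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cool = softmax_with_temperature(logits, 0.5)  # More weight on the top token
warm = softmax_with_temperature(logits, 2.0)  # Flatter distribution
```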

top_p

  • Type: float

  • Range: 0.0 to 1.0

  • Default: 1.0

  • Explainer Video: Watch

This setting limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. A lower value makes the model's responses more predictable, while the default setting allows for a full range of token choices. Think of it like a dynamic Top-K.
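
A minimal Python sketch of this kind of nucleus filtering, assuming the probabilities are already normalized (top_p_filter is a hypothetical name, and real implementations may break ties or handle the boundary differently):

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens whose cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# With top_p = 0.75, the two most likely tokens (0.5 + 0.3) are enough
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.75)
```

The number of surviving tokens depends on the shape of the distribution, which is why the text compares it to a dynamic Top-K.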

top_k

  • Type: integer

  • Range: 0 or above

  • Default: 0

  • Explainer Video: Watch

This limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model will always pick the most likely next token, leading to predictable results. By default this setting is disabled, making the model consider all choices.

frequency_penalty

  • Type: float

  • Range: -2.0 to 2.0

  • Default: 0.0

  • Explainer Video: Watch

This setting aims to control the repetition of tokens based on how often they appear in the input. Tokens that appear more often in the input are used less frequently, with a penalty that scales with the number of occurrences. Negative values will encourage token reuse.

presence_penalty

  • Type: float

  • Range: -2.0 to 2.0

  • Default: 0.0

  • Explainer Video: Watch

Adjusts how often the model repeats specific tokens already used in the input. Higher values make such repetition less likely, while negative values encourage token reuse. Unlike frequency_penalty, the token penalty does not scale with the number of occurrences.
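
The difference between frequency_penalty and presence_penalty can be sketched in Python as follows. The subtraction formula is the one commonly described for OpenAI-style penalties and is used here as an assumption; the token strings and the helper name apply_penalties are illustrative only.

```python
from collections import Counter

def apply_penalties(logits, previous_tokens, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logit of each token already seen.

    frequency_penalty scales with the number of occurrences;
    presence_penalty is applied once if the token occurred at all.
    """
    counts = Counter(previous_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted

logits = {"the": 3.0, "cat": 2.0, "sat": 1.0}
seen = ["the", "the", "cat"]
by_frequency = apply_penalties(logits, seen, frequency_penalty=0.5)
by_presence = apply_penalties(logits, seen, presence_penalty=0.5)
```

Here "the" appears twice, so the frequency penalty lowers it by 1.0 while the presence penalty lowers it by only 0.5; "cat" is lowered by 0.5 in both cases.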

repetition_penalty

  • Type: float

  • Range: 0.0 to 2.0

  • Default: 1.0

  • Explainer Video: Watch

Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). The token penalty scales based on the original token's probability.

min_p

  • Type: float

  • Range: 0.0 to 1.0

  • Default: 0.0

Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. (The value changes depending on the confidence level of the most probable token.) If your Min-P is set to 0.1, that means it will only allow for tokens that are at least 1/10th as probable as the best possible option.
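
The 1/10th rule above can be written as a short Python sketch, assuming normalized probabilities (min_p_filter is a hypothetical name):

```python
def min_p_filter(probs, min_p):
    """Keep tokens at least min_p times as probable as the top token, then renormalize."""
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}

# Top token has probability 0.6, so the cutoff is about 0.06 and the 0.05 tokens are dropped
filtered = min_p_filter([0.6, 0.3, 0.05, 0.05], 0.1)
```

When the top token is very confident the cutoff rises with it, which is what makes the filter adapt to the model's confidence.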

top_a

  • Type: float

  • Range: 0.0 to 1.0

  • Default: 0.0

Consider only the top tokens with "sufficiently high" probabilities based on the probability of the most likely token. Think of it like a dynamic Top-P. A lower Top-A value focuses the choices based on the highest probability token but with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering process based on the maximum probability.

seed

  • Type: integer

If specified, the inferencing will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed for some models.

max_tokens

  • Type: integer

  • Range: 1 or above

This sets the upper limit for the number of tokens the model can generate in response. It won't produce more than this limit. The maximum value is the context length minus the prompt length.

logit_bias

  • Type: map

Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.

response_format

  • Type: map

Forces the model to produce a specific output format. Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON. Note: For open-source models, an optional JSON schema can be provided as response_format = {"type": "json_object", "schema": <json_schema>}. For OpenAI models, when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message.

stop

  • Type: array

Stop generation immediately if the model encounters any token specified in the stop array.


Prompt Transforms

OpenRouter has a simple rule for choosing between sending a prompt and sending a list of ChatML messages:

  • Choose messages if you want to have OpenRouter apply a recommended instruct template to your prompt, depending on which model serves your request. Available instruct modes include:
  • Choose prompt if you want to send a custom prompt to the model. This is useful if you want to use a custom instruct template or maintain full control over the prompt submitted to the model.

To help with prompts that exceed the maximum context size of a model, OpenRouter supports a custom parameter called transforms:

{
  transforms: ["middle-out"], // Compress prompts > context size. This is the default for all models.
  messages: [...], // "prompt" works as well
  model // Works with any model
}

The transforms param is an array of strings that tell OpenRouter to apply a series of transformations to the prompt before sending it to the model. Transformations are applied in-order. Available transforms are:

  • middle-out: compress prompts and message chains to the context size. This helps users extend conversations in part because LLMs pay significantly less attention to the middle of sequences anyway. Works by compressing or removing messages in the middle of the prompt.

Note: All OpenRouter models default to using middle-out, unless you exclude this transform by e.g. setting transforms: [] in the request body.
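
To give a feel for the idea, here is a deliberately simplified Python stand-in. It is not OpenRouter's algorithm: it measures size in characters rather than tokens, it only removes whole messages (it never compresses one), and the name middle_out and the max_chars budget are assumptions made for the sketch.

```python
def middle_out(messages, max_chars):
    """Drop messages from the middle until the total content length fits max_chars."""
    kept = list(messages)

    def size(msgs):
        return sum(len(m["content"]) for m in msgs)

    # Always keep at least the first and last message
    while size(kept) > max_chars and len(kept) > 2:
        del kept[len(kept) // 2]  # Remove the message closest to the middle
    return kept

history = [{"role": "user", "content": str(i) * 10} for i in range(5)]
trimmed = middle_out(history, 30)
```

The earliest and latest messages survive, which matches the motivation given above: the middle of a long sequence is what models attend to least.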


Error Handling

For errors, OpenRouter returns a JSON response with the following shape:

type ErrorResponse = {
  error: {
    code: number;
    message: string;
    metadata?: Record<string, unknown>;
  };
};

The HTTP Response will have the same status code as error.code, forming a request error if:

  • Your original request is invalid
  • Your API key/account is out of credits

Otherwise, the returned HTTP response status will be 200 and any error that occurred while the LLM is producing the output will be emitted in the response body or as an SSE data event.

Example code for printing errors in JavaScript:

const request = await fetch('https://openrouter.ai/...');
console.log(request.status); // Will be an error code unless the model started processing your request
const response = await request.json();
console.error(response.error?.code); // Will be an error code
console.error(response.error?.message);

Error Codes

  • 400: Bad Request (invalid or missing params, CORS)
  • 401: Invalid credentials (OAuth session expired, disabled/invalid API key)
  • 402: Your account or API key has insufficient credits. Add more credits and retry the request.
  • 403: Your chosen model requires moderation and your input was flagged
  • 408: Your request timed out
  • 429: You are being rate limited
  • 502: Your chosen model is down or we received an invalid response from it
  • 503: There is no available model provider that meets your routing requirements
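
Since an error can arrive either as an HTTP error status or inside a 200 response body, a client may want one place that checks both. A Python sketch, assuming the body has been parsed into a dict; extract_error and the shortened descriptions in ERROR_MEANINGS are illustrative, not part of the API.

```python
ERROR_MEANINGS = {
    400: "Bad request",
    401: "Invalid credentials",
    402: "Insufficient credits",
    403: "Input flagged by moderation",
    408: "Request timed out",
    429: "Rate limited",
    502: "Model down or invalid response",
    503: "No available provider",
}

def extract_error(http_status, body):
    """Return (code, message) if the response carries an error, else None.

    The body is checked even when the HTTP status is 200, since errors raised
    while the model is generating are reported in the body.
    """
    error = body.get("error") if isinstance(body, dict) else None
    if error is None:
        if http_status >= 400:
            return http_status, ERROR_MEANINGS.get(http_status, "Unknown error")
        return None
    code = error.get("code", http_status)
    message = error.get("message") or ERROR_MEANINGS.get(code, "Unknown error")
    return code, message
```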

Moderation Errors

If your input was flagged, the error metadata will contain information about the issue. The shape of the metadata is as follows:

type ModerationErrorMetadata = {
  reasons: string[]; // Why your input was flagged
  flagged_input: string; // The text segment that was flagged, limited to 100 characters. If the flagged input is longer than 100 characters, it will be truncated in the middle and replaced with ...
};

Limits

Rate Limits and Credits Remaining

To check the rate limit or credits left on an API key, make a GET request to https://openrouter.ai/api/v1/auth/key.

fetch('https://openrouter.ai/api/v1/auth/key', {
  method: 'GET',
  headers: {
    Authorization: `Bearer ${OPENROUTER_API_KEY}`
  }
});

If you submit a valid API key, you should get a response of the form:

type Key = {
  data: {
    label: string;
    usage: number; // Number of credits used
    limit: number | null; // Credit limit for the key, or null if unlimited
    is_free_tier: boolean; // Whether the user has paid for credits before
    rate_limit: {
      requests: number; // Number of requests allowed...
      interval: string; // in this interval, e.g. "10s"
    };
  };
};
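
Given a parsed response of this shape, the credits left on a key could be derived as in the Python sketch below. It assumes that remaining credits are simply limit minus usage, which this section does not state explicitly; credits_remaining is a hypothetical helper.

```python
def credits_remaining(key_response):
    """Return the credits left on the key, or None if the key has no limit."""
    data = key_response["data"]
    if data["limit"] is None:
        return None  # Unlimited key
    return data["limit"] - data["usage"]

example = {
    "data": {
        "label": "my-key",
        "usage": 2.5,
        "limit": 10,
        "is_free_tier": False,
        "rate_limit": {"requests": 10, "interval": "10s"},
    }
}
left = credits_remaining(example)
```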

There are two global rate limits which apply to all requests, regardless of account status or model availability:

  1. Surge limit: By default, all users are subject to a maximum rate limit of 200 requests per second to defend against denial-of-service attacks. Contact us in Discord or using our support@ email address if you need a higher limit.

  2. Free limit: If you are using a free model (100% discounted, or with an ID ending in :free), then you will be limited to 10 requests per minute.

For all other requests, rate limits are a function of the number of credits remaining on the key or account. For the credits available on your API key, you can make 1 request per credit per second up to the surge limit.

For example:

  • 0 credits → 1 req/s (minimum)
  • 5 credits → 5 req/s
  • 10 credits → 10 req/s
  • 1000 credits → 200 req/s (maximum)
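
The rule behind these examples can be written as a one-line clamp. A Python sketch, where rounding fractional credits down is an assumption and requests_per_second is a hypothetical name:

```python
SURGE_LIMIT = 200  # Requests per second, the default surge limit

def requests_per_second(credits):
    """One request per credit per second, clamped to the range [1, SURGE_LIMIT]."""
    return max(1, min(SURGE_LIMIT, int(credits)))
```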

If your account has a negative credit balance, you may see 402 errors, including for free models. Adding credits to put your balance above zero allows you to use those models again.

Token Limits

Some users may have too few credits on their account to make expensive requests. OpenRouter provides a way to know that before making a request to any model.

To get the maximum tokens that a user can generate and the maximum tokens allowed in their prompt, add authentication headers in your request to https://openrouter.ai/api/v1/models:

fetch('https://openrouter.ai/api/v1/models', {
  method: 'GET',
  headers: {
    Authorization: `Bearer ${OPENROUTER_API_KEY}`
  }
});

Each model will include a per_request_limits property:

type Model = {
  id: string;
  pricing: {
    prompt: number;
    completion: number;
  };
  context_length: number;
  per_request_limits: {
    prompt_tokens: number;
    completion_tokens: number;
  };
};
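
With a model entry of this shape in hand, a pre-flight check might look like the Python sketch below. It assumes the limits arrive as numbers, as the type above says, and that you already have an approximate prompt token count; fits_limits is a hypothetical helper.

```python
def fits_limits(model, prompt_tokens, max_tokens):
    """Check a planned request against the user's per-request limits for a model."""
    limits = model["per_request_limits"]
    return (
        prompt_tokens <= limits["prompt_tokens"]
        and max_tokens <= limits["completion_tokens"]
    )

model = {
    "id": "openai/gpt-4",
    "pricing": {"prompt": 0.03, "completion": 0.06},
    "context_length": 8191,
    "per_request_limits": {"prompt_tokens": 4000, "completion_tokens": 1000},
}
ok = fits_limits(model, prompt_tokens=1200, max_tokens=500)
too_long = fits_limits(model, prompt_tokens=1200, max_tokens=2000)
```

The values in the example model entry are placeholders, not real prices or limits.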

Other Frameworks

You can find a few examples of using OpenRouter with other frameworks in this GitHub repository. Here are some examples:

const chat = new ChatOpenAI({
  modelName: "anthropic/claude-instant-v1",
  temperature: 0.8,
  streaming: true,
  openAIApiKey: $OPENROUTER_API_KEY,
}, {
  basePath: $OPENROUTER_BASE_URL + "/api/v1",
  baseOptions: {
    headers: {
      "HTTP-Referer": "https://yourapp.com/", // Optional, for including your app on openrouter.ai rankings.
      "X-Title": "Langchain.js Testing", // Optional. Shows in rankings on openrouter.ai.
    },
  },
});

const config = new Configuration({
  basePath: $OPENROUTER_BASE_URL + "/api/v1",
  apiKey: $OPENROUTER_API_KEY,
  baseOptions: {
    headers: {
      "HTTP-Referer": "https://yourapp.com/", // Optional, for including your app on openrouter.ai rankings.
      "X-Title": "Vercel Testing", // Optional. Shows in rankings on openrouter.ai.
    }
  }
})

const openrouter = new OpenAIApi(config)

3D Objects (beta)

OpenRouter supports text-to-3D Object generation, currently in beta. See supported media models and try a demo. To generate 3D Objects, send a POST request to https://openrouter.ai/api/v1/objects/generations

curl https://openrouter.ai/api/v1/objects/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "HTTP-Referer: $YOUR_SITE_URL" \
  -H "X-Title: $YOUR_SITE_NAME" \
  -d '{
    "prompt": "a chair shaped like an avocado",
    "num_inference_steps": 32,
    "num_outputs": 1,
    "extension": "ply",
    "model": "openai/shap-e"
  }'

You should receive a response of type MediaResponse:

//Each generation will contain either a base64 string or a hosted url, or both.
interface MediaOutput {
  uri?: string; //base64 string
  url?: string; //hosted url
};

interface MediaResponse {
  generations: MediaOutput[];
};