Auto Router

Automatically select the best model for your prompt

The Auto Router (openrouter/auto) is powered by NotDiamond and automatically selects the best model for your prompt.

Overview

Instead of manually choosing a model, let the Auto Router analyze your prompt and select the optimal model from a curated set of high-quality options. The router considers factors like prompt complexity, task type, and model capabilities.

Usage

Set your model to openrouter/auto:

import { OpenRouter } from '@openrouter/sdk';

const openRouter = new OpenRouter({
  apiKey: '<OPENROUTER_API_KEY>',
});

const completion = await openRouter.chat.send({
  model: 'openrouter/auto',
  messages: [
    {
      role: 'user',
      content: 'Explain quantum entanglement in simple terms',
    },
  ],
});

console.log(completion.choices[0].message.content);
// Check which model was selected
console.log('Model used:', completion.model);
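For callers not using the SDK, the same request can be sent over plain HTTP. The sketch below only builds the request and does not send it. It assumes the OpenAI-compatible chat completions endpoint at https://openrouter.ai/api/v1/chat/completions and Bearer-token authentication, neither of which is stated on this page; buildAutoRouterRequest and PreparedRequest are hypothetical names.

```typescript
// Minimal sketch: build (but do not send) a raw HTTP request for the Auto Router.
// Assumption: the endpoint URL and Bearer auth header are not given on this page.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildAutoRouterRequest(apiKey: string, messages: ChatMessage[]): PreparedRequest {
  return {
    url: 'https://openrouter.ai/api/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        Authorization: 'Bearer ' + apiKey,
        'Content-Type': 'application/json',
      },
      // The only Auto Router-specific part is the model slug.
      body: JSON.stringify({ model: 'openrouter/auto', messages }),
    },
  };
}

const prepared = buildAutoRouterRequest('<OPENROUTER_API_KEY>', [
  { role: 'user', content: 'Explain quantum entanglement in simple terms' },
]);
// To send it: const res = await fetch(prepared.url, prepared.init);
console.log(prepared.url);
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.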

Response

The response includes the model field showing which model was actually used:

{
  "id": "gen-...",
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165
  }
}
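Because the model field can differ from request to request, it can be useful to record it. The following is a minimal sketch that counts how often each model was selected. RoutedResponse and tallySelectedModels are hypothetical names, and the type covers only fields shown in the example response.

```typescript
// Only fields that appear in the example response are modeled here.
interface RoutedResponse {
  id: string;
  model: string; // slug of the model the router actually selected
}

// Count how many responses each selected model produced.
function tallySelectedModels(responses: RoutedResponse[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of responses) {
    counts[r.model] = (counts[r.model] ?? 0) + 1;
  }
  return counts;
}

// Sample data for illustration; the ids are made up.
const tally = tallySelectedModels([
  { id: 'gen-1', model: 'anthropic/claude-sonnet-4.5' },
  { id: 'gen-2', model: 'openai/gpt-5.1' },
  { id: 'gen-3', model: 'anthropic/claude-sonnet-4.5' },
]);
console.log(tally);
```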

How It Works

  1. Prompt Analysis: Your prompt is analyzed by NotDiamond’s routing system
  2. Model Selection: The optimal model is selected based on the task requirements
  3. Request Forwarding: Your request is forwarded to the selected model
  4. Response Tracking: The response includes metadata showing which model was used

Session Stickiness

The Auto Router pins both the selected model and provider so that subsequent requests in the same conversation route to the same place. This ensures consistent behavior within a conversation and maximizes prompt cache hits.

Stickiness applies at two levels:

  • Implicit (automatic): OpenRouter derives a conversation fingerprint from your messages (hashing the first system message and first user message). Once the provider reports prompt cache usage, the model and provider are pinned for that conversation. No configuration needed.
  • Explicit (session_id): When you include a session_id, stickiness kicks in on the first successful response — even before cache usage is observed. This is recommended for multi-turn conversations and agent workflows where you want consistent routing from the start.

In both cases, the cache expires after 5 minutes of inactivity. Each successful request resets the timer. If the cached provider returns an error, the cache is not updated, allowing the next request to be re-routed.

For full details on how sticky routing works, cache key granularity, and the x-session-id header, see Provider Sticky Routing.
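The rules above can be modeled in a few lines. This is an illustrative sketch, not OpenRouter's implementation: the hash function, the key format, and the names stickyKey and StickyCache are all stand-ins, and the provider value is a made-up label. It only mirrors the documented behavior: implicit keys pin after cache usage is reported, explicit keys pin on the first success, errors never update the cache, and pins expire after 5 minutes of inactivity.

```typescript
// Illustrative model of the stickiness rules; NOT OpenRouter's implementation.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };
type Pin = { model: string; provider: string; expiresAt: number };

const TTL_MS = 5 * 60 * 1000; // 5 minutes of inactivity

// Simple 32-bit string hash (djb2-style), a stand-in for the real fingerprint.
function simpleHash(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h << 5) + h + text.charCodeAt(i)) | 0; // h * 33 + char, kept in 32 bits
  }
  return (h >>> 0).toString(16);
}

function firstContent(messages: Message[], role: Message['role']): string {
  for (const m of messages) {
    if (m.role === role) return m.content;
  }
  return '';
}

// An explicit session_id wins; otherwise fingerprint the first system and first user message.
function stickyKey(messages: Message[], sessionId?: string): string {
  if (sessionId !== undefined) return 'session:' + sessionId;
  const seed = firstContent(messages, 'system') + '\u0000' + firstContent(messages, 'user');
  return 'implicit:' + simpleHash(seed);
}

class StickyCache {
  private pins: Record<string, Pin> = {};

  // Return the live pin for a key, or undefined if absent or expired.
  get(key: string, now: number): Pin | undefined {
    const pin = this.pins[key];
    if (pin === undefined || pin.expiresAt <= now) return undefined;
    return pin;
  }

  // Call after each response. Errors never write to the cache. Implicit keys
  // are pinned only once the provider has reported prompt cache usage.
  record(key: string, model: string, provider: string, now: number,
         outcome: { ok: boolean; cacheUsed: boolean }): void {
    if (!outcome.ok) return;
    const explicit = key.indexOf('session:') === 0;
    const alreadyPinned = this.get(key, now) !== undefined;
    if (explicit || outcome.cacheUsed || alreadyPinned) {
      // Writing again on success is what resets the inactivity timer.
      this.pins[key] = { model, provider, expiresAt: now + TTL_MS };
    }
  }
}

const cache = new StickyCache();
const convo: Message[] = [{ role: 'user', content: 'Explain quantum entanglement' }];

const implicitKey = stickyKey(convo);
cache.record(implicitKey, 'anthropic/claude-sonnet-4.5', 'anthropic', 0, { ok: true, cacheUsed: false });
console.log(cache.get(implicitKey, 1000)); // no pin yet: cache usage was not reported

const explicitKey = stickyKey(convo, 'my-conversation-123');
cache.record(explicitKey, 'anthropic/claude-sonnet-4.5', 'anthropic', 0, { ok: true, cacheUsed: false });
console.log(cache.get(explicitKey, 1000)?.model); // pinned on the first successful response
```

Timestamps are passed in rather than read from the clock so the expiry behavior can be exercised deterministically.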

Example with session_id

// Assumes the openRouter client created in the Usage example.
const completion = await openRouter.chat.send({
  model: 'openrouter/auto',
  session_id: 'my-conversation-123',
  messages: [
    {
      role: 'user',
      content: 'Explain quantum entanglement',
    },
  ],
});

// Subsequent requests with the same session_id will use the same model and provider
const followUp = await openRouter.chat.send({
  model: 'openrouter/auto',
  session_id: 'my-conversation-123',
  messages: [
    { role: 'user', content: 'Explain quantum entanglement' },
    { role: 'assistant', content: completion.choices[0].message.content ?? '' },
    { role: 'user', content: 'Now explain it to a 5-year-old' },
  ],
});

Why It Matters for the Auto Router

With a fixed model, every turn of a conversation goes to the same model. The Auto Router instead chooses a model per request based on the prompt, so the choice can change from one turn to the next. Session stickiness is especially important here because it pins the model selection, not just the provider. Without it, you could get different models on each turn of a conversation, leading to inconsistent behavior and wasted prompt cache.

Supported Models

Model slugs change as new versions are released. The examples below are current as of December 4, 2025. Check the models page for the latest available models.

The Auto Router selects from a curated set of high-quality models, including:

  • Claude Sonnet 4.5 (anthropic/claude-sonnet-4.5)
  • Claude Opus 4.5 (anthropic/claude-opus-4.5)
  • GPT-5.1 (openai/gpt-5.1)
  • Gemini 3.1 Pro (google/gemini-3.1-pro-preview)
  • DeepSeek 3.2 (deepseek/deepseek-v3.2)
  • And other top-performing models

The exact model pool may be updated as new models become available.

Configuring Allowed Models

You can restrict which models the Auto Router can select from using the plugins parameter. This is useful when you want to limit routing to specific providers or model families.

Via API Request

Use wildcard patterns to filter models. For example, anthropic/* matches all Anthropic models:

// Assumes the openRouter client created in the Usage example.
const completion = await openRouter.chat.send({
  model: 'openrouter/auto',
  messages: [
    {
      role: 'user',
      content: 'Explain quantum entanglement',
    },
  ],
  plugins: [
    {
      id: 'auto-router',
      // Restrict routing to all Anthropic models plus one exact OpenAI slug
      allowed_models: ['anthropic/*', 'openai/gpt-5.1'],
    },
  ],
});

Via Settings UI

You can also configure default allowed models in your Plugin Settings:

  1. Navigate to Settings > Plugins
  2. Find Auto Router and click the configure button
  3. Enter model patterns (one per line)
  4. Save your settings

These defaults apply to all your API requests unless overridden per-request.

Pattern Syntax

Pattern           Matches
anthropic/*       All Anthropic models
openai/gpt-5*     All GPT-5 variants
google/*          All Google models
openai/gpt-5.1    Exact match only
*/claude-*        Models from any provider whose name begins with claude-

When no patterns are configured, the Auto Router uses all supported models.
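To preview which slugs a pattern list would admit, the patterns can be approximated locally. This is a sketch under an assumption: that * matches any run of characters and every other character is literal. The server-side matcher may differ in edge cases, and patternToRegExp and isAllowed are hypothetical helper names.

```typescript
// Sketch of wildcard matching for allowed_models patterns.
// Assumption: '*' matches any run of characters; everything else is literal.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    // Escape regex metacharacters in the literal parts (for example the '.' in gpt-5.1).
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + escaped + '$');
}

function isAllowed(slug: string, allowedPatterns: string[]): boolean {
  // No patterns configured means no restriction.
  if (allowedPatterns.length === 0) return true;
  return allowedPatterns.some((p) => patternToRegExp(p).test(slug));
}

const allowed = ['anthropic/*', 'openai/gpt-5.1'];
console.log(isAllowed('anthropic/claude-opus-4.5', allowed));     // true
console.log(isAllowed('openai/gpt-5.1', allowed));                // true
console.log(isAllowed('google/gemini-3.1-pro-preview', allowed)); // false
```

Escaping the literal parts matters: without it, the dot in openai/gpt-5.1 would match any character rather than only a dot.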

Cost / Quality Tradeoff

Control how aggressively the Auto Router optimizes for cost vs. quality using the cost_quality_tradeoff parameter (integer, 0–10):

  • 0 = pure quality — always picks the most capable model regardless of cost
  • 10 = maximize for cost — cheapest model wins
  • Intermediate values blend quality and cost signals continuously

The default is 7, which balances cost savings with strong output quality.
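One way to picture a continuous blend is a linear interpolation between a quality score and a cost score. The sketch below is purely illustrative. The router's actual scoring is not documented on this page, and the score fields, the formula, the model slugs, and the names blendedScore and pickModel are all assumptions.

```typescript
// Illustrative only: the router's real scoring is not documented on this page.
// Assumptions: each candidate has quality and cheapness scores in [0, 1],
// and the blend is a simple linear interpolation.
interface Candidate {
  slug: string;      // hypothetical model slug
  quality: number;   // 0..1, higher = more capable
  cheapness: number; // 0..1, higher = cheaper
}

function blendedScore(c: Candidate, tradeoff: number): number {
  if (Math.floor(tradeoff) !== tradeoff || tradeoff < 0 || tradeoff > 10) {
    throw new RangeError('cost_quality_tradeoff must be an integer from 0 to 10');
  }
  const costWeight = tradeoff / 10; // 0 = pure quality, 10 = pure cost
  return (1 - costWeight) * c.quality + costWeight * c.cheapness;
}

function pickModel(candidates: Candidate[], tradeoff: number): string {
  if (candidates.length === 0) throw new Error('no candidates');
  let best = candidates[0];
  for (const c of candidates) {
    if (blendedScore(c, tradeoff) > blendedScore(best, tradeoff)) best = c;
  }
  return best.slug;
}

const pool: Candidate[] = [
  { slug: 'example/large-model', quality: 0.95, cheapness: 0.2 },
  { slug: 'example/small-model', quality: 0.7, cheapness: 0.9 },
];
console.log(pickModel(pool, 0));  // example/large-model
console.log(pickModel(pool, 10)); // example/small-model
```

With these made-up scores, low settings favor the larger model and high settings favor the cheaper one, which matches the direction of the scale described above.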

Via API Request

// Assumes the openRouter client created in the Usage example.
const completion = await openRouter.chat.send({
  model: 'openrouter/auto',
  messages: [
    {
      role: 'user',
      content: 'Summarize this paragraph',
    },
  ],
  plugins: [
    {
      id: 'auto-router',
      cost_quality_tradeoff: 3, // Favor quality over cost
    },
  ],
});

Via Settings UI

You can also set a default tradeoff in your Plugin Settings under Auto Router. The per-request value overrides this default.
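The precedence described here can be sketched as a small resolver: the per-request value wins, then the default saved in Plugin Settings, then the documented default of 7. resolveTradeoff is a hypothetical helper for illustration, not part of the SDK.

```typescript
// Sketch of the precedence: per-request value, then the Plugin Settings
// default, then the documented default of 7.
const DOCUMENTED_DEFAULT = 7;

function resolveTradeoff(perRequest?: number, settingsDefault?: number): number {
  // '??' is used instead of '||' so that an explicit 0 (pure quality) is kept.
  return perRequest ?? settingsDefault ?? DOCUMENTED_DEFAULT;
}

console.log(resolveTradeoff(3, 8));         // 3
console.log(resolveTradeoff(undefined, 8)); // 8
console.log(resolveTradeoff());             // 7
console.log(resolveTradeoff(0, 8));         // 0
```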

Pricing

You pay the standard rate for whichever model is selected. There is no additional fee for using the Auto Router.

Use Cases

  • General-purpose applications: When you don’t know what types of prompts users will send
  • Cost optimization: Let the router choose efficient models for simpler tasks
  • Quality optimization: Ensure complex prompts get routed to capable models
  • Experimentation: Discover which models work best for your use case

Limitations

  • Requests must use the messages format; the prompt parameter is not supported
  • Streaming is supported
  • All standard OpenRouter features (tool calling, etc.) work with the selected model
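Code that currently holds a bare prompt string can wrap it as a messages array before calling the Auto Router. A minimal sketch follows; promptToMessages is a hypothetical helper name.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Wrap a bare prompt string as a single-turn messages array,
// optionally preceded by a system message.
function promptToMessages(prompt: string, system?: string): ChatMessage[] {
  const messages: ChatMessage[] = [];
  if (system !== undefined) messages.push({ role: 'system', content: system });
  messages.push({ role: 'user', content: prompt });
  return messages;
}

const wrapped = promptToMessages('Summarize this paragraph');
console.log(wrapped);
```

The resulting array can be passed as the messages field of a request with model set to openrouter/auto.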

Related

  • Body Builder - Generate multiple parallel API requests
  • Latest Model Resolution - Always target the newest version of a model family
  • Model Fallbacks - Configure fallback models
  • Provider Selection - Control which providers are used