
    Distillable AI Models

    Distillable models explicitly allow their outputs to be used for training and distillation. Use them as teacher models to build training datasets, create smaller specialized models, or run compliant distillation pipelines. OpenRouter tracks distillation permissions, so you can confidently use these outputs in your distillation workflows.
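    As a concrete starting point, the sketch below collects outputs from a distillable teacher model through OpenRouter's chat completions endpoint and writes prompt/completion pairs for later student fine-tuning. It is a minimal sketch only: it assumes an OPENROUTER_API_KEY environment variable, uses an illustrative model slug (check each model's page for the exact identifier), and reads/writes hypothetical local files prompts.jsonl and teacher_outputs.jsonl.

```python
# Minimal sketch: build a distillation dataset from a distillable teacher model.
# Assumptions: OPENROUTER_API_KEY is set, the model slug is illustrative, and
# prompts.jsonl / teacher_outputs.jsonl are hypothetical local files.
import json
import os

import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
TEACHER_MODEL = "deepseek/deepseek-v3.2"  # illustrative slug; verify on the model page


def teacher_completion(prompt: str) -> str:
    """Return the teacher model's reply for a single prompt."""
    resp = requests.post(
        API_URL,
        headers=HEADERS,
        json={
            "model": TEACHER_MODEL,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


with open("prompts.jsonl") as fin, open("teacher_outputs.jsonl", "w") as fout:
    for line in fin:
        prompt = json.loads(line)["prompt"]
        # Each record pairs the prompt with the teacher's output, ready for
        # supervised fine-tuning of a smaller student model.
        record = {"prompt": prompt, "completion": teacher_completion(prompt)}
        fout.write(json.dumps(record) + "\n")
```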

    Distillable Models on OpenRouter


    DeepSeek: DeepSeek V3.2

    356B tokens

    DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

    Users can control the reasoning behaviour with the reasoning enabled boolean (see the request sketch after this listing). Learn more in our docs.

    by deepseek · 164K context · $0.25/M input tokens · $0.38/M output tokens
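    The reasoning toggle mentioned above corresponds to a field in the request body. A minimal sketch, assuming the same OPENROUTER_API_KEY setup and an illustrative model slug; treat the exact shape of the reasoning object as something to confirm against the OpenRouter docs rather than a definitive contract:

```python
# Minimal sketch: enabling or disabling reasoning for a hybrid-reasoning model.
# Assumptions: OPENROUTER_API_KEY is set and the model slug is illustrative.
import os

import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-v3.2",  # illustrative slug
        "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
        "reasoning": {"enabled": True},  # set False to skip the thinking phase
    },
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("reasoning"))  # reasoning trace, when the provider returns one
print(message["content"])
```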

    Mistral: Devstral 2 2512 (free)

    120B tokens

    Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window.

    Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license.

    by mistralai · 262K context · $0/M input tokens · $0/M output tokens

    DeepSeek: DeepSeek V3 0324

    109B tokens

    DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.

    It succeeds the original DeepSeek V3 model and performs well on a variety of tasks.

    by deepseek · 164K context · $0.20/M input tokens · $0.88/M output tokens

    Qwen: Qwen3 235B A22B Instruct 2507

    91.5B tokens

    Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks).

    Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

    by qwen · 262K context · $0.071/M input tokens · $0.463/M output tokens

    Mistral: Mistral Nemo

    90B tokens

    A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.

    The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

    It supports function calling and is released under the Apache 2.0 license.

    by mistralai · 131K context · $0.02/M input tokens · $0.04/M output tokens

    DeepSeek: DeepSeek V3.1

    76.2B tokens

    DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.

    The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

    It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.

    by deepseek · 33K context · $0.15/M input tokens · $0.75/M output tokens

    Meta: Llama 3.1 8B Instruct

    35.5B tokens

    Meta's Llama 3.1 class of models launched in a variety of sizes and flavors. This 8B instruct-tuned version is fast and efficient.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    More details are available in Meta's release announcement. Usage of this model is subject to Meta's Acceptable Use Policy.

    by meta-llama · 131K context · $0.02/M input tokens · $0.03/M output tokens

    Qwen: Qwen3 VL 235B A22B Instruct

    27.6B tokens

    Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning.

    Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.

    by qwen · 262K context · $0.20/M input tokens · $1.20/M output tokens

    MoonshotAI: Kimi K2 0905

    27.2B tokens

    Kimi K2 0905 is the September update of Kimi K2 0711. It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k.

    This update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.

    by moonshotai · 262K context · $0.39/M input tokens · $1.90/M output tokens

    Meta: Llama 4 Maverick

    22.6B tokens

    Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction.

    Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

    by meta-llama · 1.05M context · $0.15/M input tokens · $0.60/M output tokens

    DeepSeek: DeepSeek V3.2 Exp

    21.6B tokens

    DeepSeek-V3.2-Exp is an experimental large language model released by DeepSeek as an intermediate step between V3.1 and future architectures. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism designed to improve training and inference efficiency in long-context scenarios while maintaining output quality. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.

    The model was trained under conditions aligned with V3.1-Terminus to enable direct comparison. Benchmarking shows performance roughly on par with V3.1 across reasoning, coding, and agentic tool-use tasks, with minor tradeoffs and gains depending on the domain. This release focuses on validating architectural optimizations for extended context lengths rather than advancing raw task accuracy, making it primarily a research-oriented model for exploring efficient transformer designs.

    by deepseek · 164K context · $0.21/M input tokens · $0.32/M output tokens

    Qwen: Qwen3 Embedding 8B

    18.5B tokens

    The Qwen3 Embedding series is the latest Qwen model family purpose-built for text embedding and ranking tasks. It inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundation model, and delivers significant advances across text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

    by qwen · 33K context · $0.01/M input tokens · $0/M output tokens

    Meta: Llama 3.3 70B Instruct

    17.7B tokens

    The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks.

    Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.


    by meta-llama · 131K context · $0.10/M input tokens · $0.32/M output tokens

    DeepSeek: DeepSeek V3.1 Terminus

    17.1B tokens

    DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, and further optimizes the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.

    The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

    by deepseek · 164K context · $0.21/M input tokens · $0.79/M output tokens

    Mistral: Mistral Small 3.2 24B

    17B tokens

    Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on WildBench and Arena Hard, reduces infinite generations, and delivers gains in tool use and structured output tasks.

    It supports image and text inputs with structured outputs, function/tool calling, and strong performance across coding (HumanEval+, MBPP), STEM (MMLU, MATH, GPQA), and vision benchmarks (ChartQA, DocVQA).

    by mistralai · 131K context · $0.06/M input tokens · $0.18/M output tokens

    DeepSeek: R1 0528

    13.7B tokens

    The May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and ships with fully open reasoning tokens. It has 671B parameters, with 37B active per inference pass.

    Fully open-source model.

    by deepseek · 164K context · $0.40/M input tokens · $1.75/M output tokens

    Qwen: Qwen3 14B

    13.5B tokens

    Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

    by qwen · 41K context · $0.05/M input tokens · $0.22/M output tokens

    Qwen: Qwen3 Next 80B A3B Instruct

    12.9B tokens

    Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without “thinking” traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual use, while remaining robust on alignment and formatting. Compared with prior Qwen3 instruct variants, it focuses on higher throughput and stability on ultra-long inputs and multi-turn dialogues, making it well-suited for RAG, tool use, and agentic workflows that require consistent final answers rather than visible chain-of-thought.

    The model employs scaling-efficient training and decoding to improve parameter efficiency and inference speed, and has been validated on a broad set of public benchmarks where it reaches or approaches larger Qwen3 systems in several categories while outperforming earlier mid-sized baselines. It is best used as a general assistant, code helper, and long-context task solver in production settings where deterministic, instruction-following outputs are preferred.

    by qwen · 262K context · $0.06/M input tokens · $0.60/M output tokens

    Meta: Llama 3.1 70B Instruct

    12.1B tokens

    Meta's Llama 3.1 class of models launched in a variety of sizes and flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    More details are available in Meta's release announcement. Usage of this model is subject to Meta's Acceptable Use Policy.

    by meta-llama · 131K context · $0.40/M input tokens · $0.40/M output tokens

    MoonshotAI: Kimi K2 Thinking

    11.5B tokens

    Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports 256K-token context windows. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift.

    It sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. Built on a large-scale MoE architecture with MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks.

    by moonshotai · 262K context · $0.40/M input tokens · $1.75/M output tokens

    Qwen: Qwen3 Coder 480B A35B

    11.5B tokens

    Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).

    Pricing for the Alibaba endpoints varies by context length: once a request exceeds 128K input tokens, the higher rate applies (see the cost sketch after this listing).

    by qwen · 262K context · $0.22/M input tokens · $0.95/M output tokens
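    Because the listed rate only applies up to the 128K-input threshold, a quick cost estimate has to branch on prompt size. A minimal sketch with the listed base rate and hypothetical placeholder rates for the long-context tier; substitute the actual tiered pricing from the model page:

```python
# Minimal sketch: estimating request cost for an endpoint whose pricing changes
# once the prompt exceeds 128K input tokens. The LONG_TIER rates are hypothetical
# placeholders, not published prices.
BASE_TIER = {"input": 0.22, "output": 0.95}  # $/M tokens, listed rate (<=128K input)
LONG_TIER = {"input": 0.44, "output": 1.90}  # hypothetical >128K-input rate


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in dollars."""
    tier = LONG_TIER if input_tokens > 128_000 else BASE_TIER
    return (input_tokens * tier["input"] + output_tokens * tier["output"]) / 1_000_000


print(f"${estimate_cost(90_000, 4_000):.4f}")   # below the threshold
print(f"${estimate_cost(200_000, 4_000):.4f}")  # above the threshold
```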

    Mistral: Mistral Small 3

    10.8B tokens

    Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.

    The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. Mistral's blog post about the model has more details.

    by mistralai · 33K context · $0.03/M input tokens · $0.11/M output tokens

    NVIDIA: Nemotron 3 Nano 30B A3B (free)

    10.4B tokens

    NVIDIA Nemotron 3 Nano 30B A3B is a small mixture-of-experts (MoE) language model that targets high compute efficiency and accuracy, letting developers build specialized agentic AI systems.

    The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy it on their own infrastructure for maximum privacy and security.

    Note: For the free endpoint, all prompts and outputs are logged to improve the provider's model, products, and services. Please do not upload any personal, confidential, or otherwise sensitive information. This endpoint is for trial use only; do not use it for production or business-critical systems.

    by nvidia · 256K context · $0/M input tokens · $0/M output tokens

    Qwen: Qwen3 32B

    9.84B tokens

    Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

    by qwen · 41K context · $0.08/M input tokens · $0.24/M output tokens

    DeepSeek: DeepSeek V3.1 Terminus (exacto)

    6.3B tokens

    DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, and further optimizes the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.

    The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

    by deepseek · 164K context · $0.21/M input tokens · $0.79/M output tokens