    Free AI Models on OpenRouter

    At OpenRouter, we believe that free models play a crucial role in democratizing access to AI. These models allow hundreds of thousands of users worldwide to experiment, learn, and innovate. Below you will find the top free AI models currently available on OpenRouter.

    We are continuing to actively expand our free model capacity by onboarding new providers and directly covering costs to help promote freely accessible models. While we can't guarantee what the future holds, we will continue to support free inference options on our platform.
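
    If you want to discover these models programmatically, the public models API exposes per-model pricing. Below is a minimal sketch in Python, assuming the GET /api/v1/models response keeps its documented shape (string-valued "prompt" and "completion" prices), that filters the catalog down to models priced at $0.

```python
# Minimal sketch: list OpenRouter models whose prompt and completion prices are $0.
# Assumes the public GET /api/v1/models endpoint and its "pricing" fields keep
# their documented shape; this listing endpoint typically requires no API key.
import requests

models = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]
free = [
    m["id"]
    for m in models
    if float(m["pricing"]["prompt"]) == 0 and float(m["pricing"]["completion"]) == 0
]

print(f"{len(free)} free models found, for example:")
for model_id in free[:10]:
    print(" -", model_id)
```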

    Top Free Models on OpenRouter


    Xiaomi: MiMo-V2-Flash (free)

    403B tokens

    MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture. MiMo-V2-Flash supports a hybrid-thinking toggle and a 256K context window, and excels at reasoning, coding, and agent scenarios. On SWE-bench Verified and SWE-bench Multilingual, MiMo-V2-Flash ranks as the #1 open-source model globally, delivering performance comparable to Claude Sonnet 4.5 while costing only about 3.5% as much.

    Note: when integrating with agentic tools such as Claude Code, Cline, or Roo Code, turn off reasoning mode for the best and fastest performance—this model is deeply optimized for this scenario.

    Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.
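
    As a concrete sketch of that toggle, the request below disables reasoning for an agentic-coding call. The model slug and the exact shape of the reasoning parameter are assumptions to verify against the model page and the OpenRouter reasoning docs.

```python
# Minimal sketch, not an official snippet: call MiMo-V2-Flash through OpenRouter's
# chat completions API with reasoning disabled, as the note above recommends for
# agentic tools. The model slug is an assumed placeholder.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "xiaomi/mimo-v2-flash:free",  # assumed slug
        "messages": [
            {"role": "user", "content": "Rewrite this function to use pathlib instead of os.path."}
        ],
        "reasoning": {"enabled": False},  # turn off the hybrid-thinking mode
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```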

    by xiaomi · 262K context · $0/M input tokens · $0/M output tokens

    Mistral: Devstral 2 2512 (free)

    121B tokens

    Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window.

    Devstral 2 supports exploring codebases and orchestrating changes across multiple files while maintaining architecture-level context. It tracks framework dependencies, detects failures, and retries with corrections—solving challenges like bug fixing and modernizing legacy systems. The model can be fine-tuned to prioritize specific languages or optimize for large enterprise codebases. It is available under a modified MIT license.

    by mistralai · 262K context · $0/M input tokens · $0/M output tokens

    Kwaipilot: KAT-Coder-Pro V1 (free)

    112B tokens

    KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving a 73.4% solve rate on the SWE-Bench Verified benchmark.

    The model has been optimized for tool use, multi-turn interaction, instruction following, and generalization through a multi-stage training process that includes mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.

    by kwaipilot · 256K context · $0/M input tokens · $0/M output tokens

    TNG: DeepSeek R1T2 Chimera (free)

    89.9B tokens

    DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI's R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue, and other open-ended generation tasks.

    by tngtech · 164K context · $0/M input tokens · $0/M output tokens

    Nex AGI: DeepSeek V3.1 Nex N1 (free)

    63B tokens

    DeepSeek V3.1 Nex-N1 is the flagship release of the Nex-N1 series — a post-trained model designed to highlight agent autonomy, tool use, and real-world productivity.

    Nex-N1 demonstrates competitive performance across all evaluation scenarios, showing particularly strong results in practical coding and HTML generation tasks.

    by nex-agi · 131K context · $0/M input tokens · $0/M output tokens

    TNG: DeepSeek R1T Chimera (free)

    20.8B tokens

    DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks.

    The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use.

    by tngtech · 164K context · $0/M input tokens · $0/M output tokens

    NVIDIA: Nemotron 3 Nano 30B A3B (free)

    10.6B tokens

    NVIDIA Nemotron 3 Nano 30B A3B is a small Mixture-of-Experts language model that offers leading compute efficiency and accuracy for developers building specialized agentic AI systems.

    The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy it on their own infrastructure for maximum privacy and security.

    Note: For the free endpoint, all prompts and outputs are logged to improve the provider's model and its products and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is for trial use only. Do not use it for production or business-critical systems.

    by nvidia · 256K context · $0/M input tokens · $0/M output tokens

    Z.AI: GLM 4.5 Air (free)

    7.84B tokens

    GLM-4.5-Air is the lightweight variant of Z.AI's latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs.
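
    The same boolean can be flipped on when you want the "thinking mode". The sketch below uses the OpenAI Python SDK pointed at OpenRouter; the model slug and the extra_body reasoning field are assumptions to confirm against the docs.

```python
# Minimal sketch: enable GLM-4.5-Air's thinking mode through OpenRouter using the
# OpenAI-compatible SDK. The slug and the "reasoning" field are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="z-ai/glm-4.5-air:free",  # assumed slug
    messages=[{"role": "user", "content": "Outline a plan to migrate a cron job to Airflow."}],
    extra_body={"reasoning": {"enabled": True}},  # thinking mode on; use False for real-time chat
)
print(completion.choices[0].message.content)
```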

    by z-ai · 131K context · $0/M input tokens · $0/M output tokens

    NVIDIA: Nemotron Nano 12B 2 VL (free)

    6.07B tokens

    NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency.

    The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension.

    Nemotron Nano 2 VL achieves leading results on OCRBench v2 and an average score of ≈74 across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME, surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost.

    Open weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.
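
    For the multi-image document inputs described above, a request can carry several image parts alongside a text part. The sketch below assumes the standard OpenAI-style content-parts schema that OpenRouter exposes; the model slug and image URLs are placeholders.

```python
# Minimal sketch: send two document images plus a text instruction to the model
# via OpenRouter. The slug and URLs are hypothetical placeholders.
import os
import requests

payload = {
    "model": "nvidia/nemotron-nano-12b-v2-vl:free",  # assumed slug
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the invoice totals from these two pages."},
            {"type": "image_url", "image_url": {"url": "https://example.com/invoice-page-1.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/invoice-page-2.png"}},
        ],
    }],
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```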

    by nvidia · 128K context · $0/M input tokens · $0/M output tokens

    TNG: R1T Chimera (free)

    5.24B tokens

    TNG-R1T-Chimera is an experimental LLM with a penchant for creative storytelling and character interaction. It is a derivative of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter.

    Characteristics and improvements include:

    • We think it has a creative and pleasant personality.
    • It has a preliminary EQ-Bench3 value of about 1305.
    • It is quite a bit more intelligent than the original, albeit slightly slower.
    • It is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated.
    • Tool calling is much improved.

    TNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1).

    by tngtech · 164K context · $0/M input tokens · $0/M output tokens

    Qwen: Qwen3 Coder 480B A35B (free)

    4.65B tokens

    Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).

    Pricing for the Alibaba endpoints varies by context length. Once a request exceeds 128k input tokens, the higher price tier applies.
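
    As a toy illustration of that rule (with made-up rates; the free OpenRouter endpoint itself is $0 either way), tier selection is just a threshold check on prompt length:

```python
# Toy sketch of context-length-based pricing: the rates here are hypothetical
# placeholders, not Alibaba's actual prices.
def input_rate_per_million(prompt_tokens: int,
                           base_rate: float = 1.0,         # hypothetical $/M up to 128k tokens
                           long_context_rate: float = 3.0  # hypothetical $/M above 128k tokens
                           ) -> float:
    """Return the input-token rate that applies to a single request."""
    return long_context_rate if prompt_tokens > 128_000 else base_rate

print(input_rate_per_million(90_000))   # -> 1.0 (base tier)
print(input_rate_per_million(150_000))  # -> 3.0 (long-context tier)
```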

    by qwen · 262K context · $0/M input tokens · $0/M output tokens

    DeepSeek: R1 0528 (free)

    3.4B tokens

    The May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and exposes fully open reasoning tokens. It is 671B parameters in size, with 37B active in an inference pass.

    Fully open-source model.

    by deepseek · 164K context · $0/M input tokens · $0/M output tokens

    AllenAI: Olmo 3.1 32B Think (free)

    2.75B tokens

    Olmo 3.1 32B Think is a large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following. Building on the Olmo 3 series, version 3.1 delivers refined reasoning behavior and stronger performance across demanding evaluations and nuanced conversational tasks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Think continues the Olmo initiative’s commitment to openness, providing full transparency across model weights, code, and training methodology.

    by allenai · 66K context · $0/M input tokens · $0/M output tokens

    Google: Gemma 3 27B (free)

    2.72B tokens

    Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to Gemma 2.
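
    Since structured outputs are called out above, here is a hedged sketch of requesting schema-constrained JSON through OpenRouter, assuming the serving provider supports the OpenAI-style response_format passthrough; the slug and schema are illustrative only.

```python
# Minimal sketch: ask Gemma 3 27B for schema-constrained JSON via OpenRouter.
# The response_format shape, model slug, and schema are assumptions/examples.
import json
import os
import requests

schema = {
    "name": "capital_info",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "country": {"type": "string"},
            "capital": {"type": "string"},
        },
        "required": ["country", "capital"],
        "additionalProperties": False,
    },
}

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "google/gemma-3-27b-it:free",  # assumed slug
        "messages": [{"role": "user", "content": "What is the capital of Kenya?"}],
        "response_format": {"type": "json_schema", "json_schema": schema},
    },
    timeout=120,
)
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```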

    by google · 131K context · $0/M input tokens · $0/M output tokens

    Meta: Llama 3.3 70B Instruct (free)

    2.27B tokens

    The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

    Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

    Model Card

    by meta-llama · 131K context · $0/M input tokens · $0/M output tokens

    OpenAI: gpt-oss-120b (free)

    1.91B tokens

    gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
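
    To illustrate the native tool use mentioned above, the sketch below sends a single, made-up tool definition and prints any tool call the model decides to make. The slug and tool are assumptions; the tools schema follows the standard OpenAI-compatible format that OpenRouter forwards.

```python
# Minimal sketch: function calling with gpt-oss-120b through OpenRouter.
# The get_weather tool is a hypothetical example; the model slug is assumed.
import json
import os
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-oss-120b:free",  # assumed slug
        "messages": [{"role": "user", "content": "What's the weather in Lisbon right now?"}],
        "tools": tools,
    },
    timeout=120,
)
message = resp.json()["choices"][0]["message"]
# If the model chose to call the tool, its arguments arrive as a JSON string.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```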

    by openai · 131K context · $0/M input tokens · $0/M output tokens