Free AI Models on OpenRouter

Model rankings updated February 2026 based on real usage data.

At OpenRouter, we believe that free models play a crucial role in democratizing access to AI. These models allow hundreds of thousands of users worldwide to experiment, learn, and innovate. Below you will find the top free AI models currently available on OpenRouter.

We are continuing to actively expand our free model capacity by onboarding new providers and directly covering costs to help promote freely accessible models. While we can't guarantee what the future holds, we will continue to support free inference options on our platform.

For the simplest way to get started, try openrouter/free, a router that automatically selects from available free models based on your request's requirements.
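As a minimal sketch of what a request to the free router can look like, the example below assumes the Python requests library and an OpenRouter API key stored in the OPENROUTER_API_KEY environment variable; it uses OpenRouter's standard OpenAI-compatible chat completions endpoint.

    import os

    import requests

    # Ask the openrouter/free router to pick an available free model for this request.
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "openrouter/free",
            "messages": [
                {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}
            ],
        },
        timeout=120,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])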

Top Free Models on OpenRouter

Arcee AI: Trinity Large Preview (free)

393B tokens

Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing.

It excels in creative writing, storytelling, role-play, chat scenarios, and real-time voice assistance, performing better in these areas than a typical reasoning model. It also introduces newer agentic capabilities: it was trained to navigate well in agent harnesses like OpenCode, Cline, and Kilo Code, and to handle complex toolchains and long, constraint-filled prompts.

The architecture natively supports very long context windows up to 512k tokens, with the Preview API currently served at 128k context using 8-bit quantization for practical deployment. Trinity-Large-Preview reflects Arcee’s efficiency-first design philosophy, offering a production-oriented frontier model with open weights and permissive licensing suitable for real-world applications and experimentation.

by arcee-ai · 131K context · $0/M input tokens · $0/M output tokens

Pony Alpha

153B tokens

Pony is a cutting-edge foundation model with strong performance in coding, agentic workflows, reasoning, and roleplay, making it well suited for hands-on coding and real-world use.

Note: All prompts and completions for this model are logged by the provider and may be used to improve the model.

by openrouter · 200K context · $0/M input tokens · $0/M output tokens

StepFun: Step 3.5 Flash (free)

110B tokens

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that stays remarkably fast and efficient even at long context lengths.

by stepfun · 256K context · $0/M input tokens · $0/M output tokens

TNG: DeepSeek R1T2 Chimera (free)

91.5B tokens

DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI's R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20% faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60k tokens in standard use (tested to ~130k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue, and other open-ended generation tasks.

by tngtech · 164K context · $0/M input tokens · $0/M output tokens

Z.AI: GLM 4.5 Air (free)

50.9B tokens

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the reasoning enabled boolean, as sketched below. Learn more in our docs.
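As a rough illustration of the thinking/non-thinking toggle, the sketch below uses the OpenAI Python SDK pointed at OpenRouter and passes the reasoning control via extra_body; the :free model slug is an assumption, so check the model page for the exact identifier.

    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    completion = client.chat.completions.create(
        model="z-ai/glm-4.5-air:free",  # assumed slug; verify on the model page
        messages=[{"role": "user", "content": "Give a one-line summary of MoE models."}],
        # OpenRouter's reasoning control: False selects the non-thinking mode,
        # True enables the advanced-reasoning thinking mode.
        extra_body={"reasoning": {"enabled": False}},
    )
    print(completion.choices[0].message.content)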

by z-ai · 131K context · $0/M input tokens · $0/M output tokens

TNG: DeepSeek R1T Chimera (free)

23.2B tokens

DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks.

The model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use.

by tngtech · 164K context · $0/M input tokens · $0/M output tokens

NVIDIA: Nemotron 3 Nano 30B A3B (free)

15B tokens

NVIDIA Nemotron 3 Nano 30B A3B is a small Mixture-of-Experts (MoE) language model that pairs high compute efficiency with strong accuracy, enabling developers to build specialized agentic AI systems.

The model is fully open, with open weights, datasets, and recipes, so developers can easily customize, optimize, and deploy it on their own infrastructure for maximum privacy and security.

Note: For the free endpoint, all prompts and output are logged to improve the provider's model and its product and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is a trial use only. Do not use for production or business-critical systems.

by nvidia · 256K context · $0/M input tokens · $0/M output tokens

DeepSeek: R1 0528 (free)

13B tokens

This is the May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but the model is open-sourced and ships with fully open reasoning tokens. It is 671B parameters in size, with 37B active in an inference pass.

Fully open-source model.

by deepseek · 164K context · $0/M input tokens · $0/M output tokens

TNG: R1T Chimera (free)

9.06B tokens

TNG-R1T-Chimera is an experimental LLM with a penchant for creative storytelling and character interaction. It is a derivative of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter.

Characteristics and improvements include:

  • We think it has a creative and pleasant personality.
  • It has a preliminary EQ-Bench3 value of about 1305.
  • It is quite a bit more intelligent than the original, albeit slightly slower.
  • It is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated.
  • Tool calling is much improved.

TNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft has created for their "MAI-DS-R1" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1).

by tngtech · 164K context · $0/M input tokens · $0/M output tokens

OpenAI: gpt-oss-120b (free)

4.12B tokens

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
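Since the model advertises native function calling, here is a hedged sketch of a tool-use request in OpenRouter's OpenAI-compatible format; the get_weather tool and the :free slug are illustrative assumptions rather than anything defined by the model itself.

    import json
    import os

    import requests

    # Offer the model one illustrative tool and print any tool call it returns.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for demonstration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "openai/gpt-oss-120b:free",  # assumed slug; verify on the model page
            "messages": [{"role": "user", "content": "What's the weather in Lisbon right now?"}],
            "tools": tools,
        },
        timeout=120,
    )
    message = response.json()["choices"][0]["message"]
    print(json.dumps(message.get("tool_calls") or message, indent=2))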

by openai · 131K context · $0/M input tokens · $0/M output tokens

Aurora Alpha

3.58B tokens

This is a cloaked model provided to the community to gather feedback. It is a reasoning model designed for speed, built for coding assistants, real-time conversational applications, and agentic workflows.

The default reasoning effort is set to medium for fast responses. For agentic coding use cases, we recommend raising the effort to high, as sketched below.

Note: All prompts and completions for this model are logged by the provider and may be used to improve the model.
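A sketch of a request body for that recommendation, assuming OpenRouter's unified reasoning parameter and an illustrative model slug; pair it with the same chat completions call shown earlier on this page.

    # Illustrative payload only; the slug and prompt are placeholders.
    payload = {
        "model": "openrouter/aurora-alpha",  # assumed slug; verify on the model page
        "reasoning": {"effort": "high"},     # default is "medium" per the note above
        "messages": [
            {"role": "user", "content": "Refactor this recursive parser into an iterative one: ..."}
        ],
    }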

by openrouter · 128K context · $0/M input tokens · $0/M output tokens

Qwen: Qwen3 Coder 480B A35B (free)

2.44B tokens

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over repositories. The model features 480 billion total parameters, with 35 billion active per forward pass (8 out of 160 experts).

Pricing for the Alibaba endpoints varies by context length: once a request exceeds 128k input tokens, the higher pricing tier applies.

by qwen · 262K context · $0/M input tokens · $0/M output tokens

Upstage: Solar Pro 3 (free)

2.18B tokens

Solar Pro 3 is Upstage's powerful Mixture-of-Experts (MoE) language model. With 102B total parameters and 12B active parameters per forward pass, it delivers exceptional performance while maintaining computational efficiency. Optimized for Korean with English and Japanese support.

by upstage · 128K context · $0/M input tokens · $0/M output tokens

Meta: Llama 3.3 70B Instruct (free)

2.14B tokens

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many of the available open-source and closed chat models on common industry benchmarks.

Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Model Card

by meta-llama · 128K context · $0/M input tokens · $0/M output tokens

Arcee AI: Trinity Mini (free)

2.04B tokens

Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model featuring 128 experts with 8 active per token. It is engineered for efficient reasoning over long contexts (131k tokens), with robust function calling and multi-step agent workflows.

by arcee-ai · 131K context · $0/M input tokens · $0/M output tokens

NVIDIA: Nemotron Nano 12B 2 VL (free)

1.48B tokens

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s memory-efficient sequence modeling for significantly higher throughput and lower latency.

The model supports inputs of text and multi-image documents, producing natural-language outputs. It is trained on high-quality NVIDIA-curated synthetic datasets optimized for optical-character recognition, chart reasoning, and multimodal comprehension.

Nemotron Nano 2 VL achieves leading results on OCRBench v2 and an average score of roughly 74 across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME, surpassing prior open VL baselines. With Efficient Video Sampling (EVS), it handles long-form videos while reducing inference cost.

Open-weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, with deployment supported across NeMo, NIM, and major inference runtimes.

by nvidia · 128K context · $0/M input tokens · $0/M output tokens