Search/
Skip to content
/
OpenRouterOpenRouter
© 2026 OpenRouter, Inc

Product

  • Chat
  • Rankings
  • Models
  • Providers
  • Pricing
  • Enterprise

Company

  • About
  • Announcements
  • CareersHiring
  • Privacy
  • Terms of Service
  • Support
  • State of AI
  • Works With OR

Developer

  • Documentation
  • API Reference
  • SDK
  • Status

Connect

  • Discord
  • GitHub
  • LinkedIn
  • X
  • YouTube
Collections/Roleplay

Best AI Models for Roleplay (RP) and Creative Writing

Model rankings updated February 2026 based on real usage data.

Discover the top AI models for roleplay (RP), character chat and creative writing, ranked by real usage data on OpenRouter. These LLMs excel at maintaining consistent personas, rich dialogue and immersive storytelling across long-context sessions.

Whether you're using Janitor AI, SillyTavern or another frontend, or building your own character chatbot or interactive fiction engine, OpenRouter gives you access to the best roleplay models through a single API.

LLM Leaderboard for Roleplay Models

1.
Deepseek V3.2
by deepseek
412B
30.2%
2.
Deepseek R1t2 Chimera (free)
by tngtech
74.8B
5.5%
3.
Gemini 2.5 Flash
by google
71.7B
5.2%
4.
Grok 4.1 Fast
by x-ai
69B
5.0%
5.
Gemini 3 Flash Preview
by google
53.6B
3.9%
6.
Deepseek Chat V3 0324
by deepseek
44B
3.2%
7.
gpt-oss-120b
by openai
43.6B
3.2%
8.
Gemini 2.5 Flash Lite
by google
41B
3.0%
9.
Deepseek Chat V3.1
by deepseek
37.6B
2.8%
10.
Others
by unknown
519B
38.0%

Top Roleplay Models on OpenRouter

Based on top weekly usage data from millions of users accessing AI models for roleplay through OpenRouter.

Favicon for google

Google: Gemini 3 Flash Preview

802B tokens

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability.

The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full scale frontier models.

by google1.05M context$0.50/M input tokens$3/M output tokens$1/M audio tokens
Favicon for deepseek

DeepSeek: DeepSeek V3.2

631B tokens

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

by deepseek164K context$0.25/M input tokens$0.38/M output tokens
Favicon for google

Google: Gemini 2.5 Flash

417B tokens

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling.

Additionally, Gemini 2.5 Flash is configurable through the "max tokens for reasoning" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).

by google1.05M context$0.30/M input tokens$2.50/M output tokens$1/M audio tokens
Favicon for google

Google: Gemini 2.5 Flash Lite

323B tokens

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.

by google1.05M context$0.10/M input tokens$0.40/M output tokens$0.30/M audio tokens
Favicon for x-ai

xAI: Grok 4.1 Fast

318B tokens

Grok 4.1 Fast is xAI's best agentic tool calling model that shines in real-world use cases like customer support and deep research. 2M context window.

Reasoning can be enabled/disabled using the reasoning enabled parameter in the API. Learn more in our docs

by x-ai2M context$0.20/M input tokens$0.50/M output tokens
Favicon for openai

OpenAI: gpt-oss-120b

263B tokens

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

by openai131K context$0.039/M input tokens$0.19/M output tokens
Favicon for tngtech

TNG: DeepSeek R1T2 Chimera (free)

105B tokens

DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671 B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20 % faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60 k tokens in standard use (tested to ~130 k) and maintains consistent <think> token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks.

by tngtech164K context$0/M input tokens$0/M output tokens
Favicon for google

Google: Gemini 2.5 Pro

89.7B tokens

Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy and nuanced context handling. Gemini 2.5 Pro achieves top-tier performance on multiple benchmarks, including first-place positioning on the LMArena leaderboard, reflecting superior human-preference alignment and complex problem-solving abilities.

by google1.05M context$1.25/M input tokens$10/M output tokens$1.25/M audio tokens
Favicon for deepseek

DeepSeek: DeepSeek V3 0324

86B tokens

DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.

It succeeds the DeepSeek V3 model and performs really well on a variety of tasks.

by deepseek164K context$0.19/M input tokens$0.87/M output tokens
Favicon for deepseek

DeepSeek: DeepSeek V3.1

82.9B tokens

DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.

by deepseek33K context$0.15/M input tokens$0.75/M output tokens