
Best AI Models for Roleplay (RP) and Creative Writing

Model rankings updated April 2026 based on real usage data.

Discover the top AI models for roleplay (RP), character chat and creative writing, ranked by real usage data on OpenRouter. These LLMs excel at maintaining consistent personas, rich dialogue and immersive storytelling across long-context sessions.

Whether you're using Janitor AI, SillyTavern or another frontend, or building your own character chatbot or interactive fiction engine, OpenRouter gives you access to the best roleplay models through a single API.
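As a sketch of what "a single API" means in practice: OpenRouter exposes an OpenAI-compatible chat completions endpoint, so switching roleplay models is just a matter of changing the model slug. The slug and persona prompt below are illustrative, not prescriptive.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_payload(model: str, system_prompt: str, user_message: str) -> dict:
    """Build an OpenAI-compatible chat completion payload for OpenRouter."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

def send(payload: dict, api_key: str) -> dict:
    """POST the payload to OpenRouter and return the parsed JSON response."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example payload for a persona-driven roleplay turn (model slug is illustrative):
payload = build_payload(
    "deepseek/deepseek-v3.2",
    "You are Captain Mira, a dry-witted airship pilot. Stay in character.",
    "Mira, the storm is closing in. What do we do?",
)
# response = send(payload, api_key="sk-or-...")  # requires a real API key
```

Because the request shape is identical across providers, a frontend like SillyTavern can swap any model in the leaderboard below into the same call.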

LLM Leaderboard for Roleplay Models

1. DeepSeek V3.2 (deepseek): 1.12T tokens, 39.4% share
2. Grok 4.1 Fast (x-ai): 177B tokens, 6.2% share
3. gpt-oss-120b (openai): 166B tokens, 5.8% share
4. GLM 4.5 Air (z-ai): 159B tokens, 5.6% share
5. Gemini 2.5 Flash Lite (google): 136B tokens, 4.8% share
6. Gemini 3 Flash Preview (google): 105B tokens, 3.7% share
7. GLM 5 (z-ai): 56.7B tokens, 2.0% share
8. Qwen3 235B A22B (qwen): 56.5B tokens, 2.0% share
9. Gemini 2.5 Pro (google): 53.4B tokens, 1.9% share
10. Others: 813B tokens, 28.6% share

Top Roleplay Models on OpenRouter

Based on weekly usage data from millions of users accessing AI models for roleplay through OpenRouter.


DeepSeek: DeepSeek V3.2

1.31T tokens

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

Users can toggle reasoning on or off with the reasoning enabled boolean; learn more in our docs.
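A minimal sketch of how that toggle rides along in the request body, assuming the reasoning object shape described in OpenRouter's reasoning docs (the model slug is illustrative):

```python
import json

# Request body toggling DeepSeek V3.2's reasoning mode on. The "reasoning"
# object shape follows OpenRouter's docs; the model slug is illustrative.
request_body = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Plan a three-act mystery plot."}],
    "reasoning": {"enabled": True},  # set False for faster, non-reasoning replies
}

print(json.dumps(request_body, indent=2))
```

Disabling reasoning trades some planning depth for latency, which often suits fast-paced chat-style roleplay.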

by deepseek · 164K context · $0.26/M input tokens · $0.38/M output tokens

Anthropic: Claude Opus 4.6

1.21T tokens

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time. The model shows deeper contextual understanding, stronger problem decomposition, and greater reliability on hard engineering tasks than prior generations.

Beyond coding, Opus 4.6 excels at sustained knowledge work. It produces near-production-ready documents, plans, and analyses in a single pass, and maintains coherence across very long outputs and extended sessions. This makes it a strong default for tasks that require persistence, judgment, and follow-through, such as technical design, migration planning, and end-to-end project execution.

For users upgrading from earlier Opus versions, see our official migration guide here

by anthropic · 1M context · $5/M input tokens · $25/M output tokens

MiniMax: MiniMax M2.7

1.21T tokens

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent collaboration, enabling it to plan, execute, and refine complex tasks across dynamic environments.

Trained for production-grade performance, M2.7 handles workflows such as live debugging, root cause analysis, financial modeling, and full document generation across Word, Excel, and PowerPoint. It delivers strong results on benchmarks including 56.2% on SWE-Pro and 57.0% on Terminal Bench 2, while achieving a 1495 ELO on GDPval-AA, setting a new standard for multi-agent systems operating in real-world digital workflows.

by minimax · 205K context · $0.30/M input tokens · $1.20/M output tokens

Anthropic: Claude Sonnet 4.6

1.19T tokens

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.

by anthropic · 1M context · $3/M input tokens · $15/M output tokens

Google: Gemini 3 Flash Preview

1.09T tokens

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool-use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long-running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability.

The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full scale frontier models.
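On OpenRouter, those thinking levels are typically selected per request through the reasoning effort parameter. The sketch below assumes that mapping and an illustrative model slug; check the current docs for the exact values supported.

```python
# Sketch: choosing a Gemini 3 Flash thinking level per request via OpenRouter's
# reasoning "effort" field. The slug and the level-to-effort mapping are
# assumptions based on OpenRouter's reasoning docs.
def flash_request(prompt: str, effort: str = "low") -> dict:
    if effort not in {"minimal", "low", "medium", "high"}:
        raise ValueError(f"unknown thinking level: {effort}")
    return {
        "model": "google/gemini-3-flash-preview",
        "messages": [{"role": "user", "content": prompt}],
        "reasoning": {"effort": effort},
    }

# Fast interactive chat turn:
quick = flash_request("Summarize the scene so far.", effort="minimal")
# Harder multi-step request:
deep = flash_request("Outline a consistent magic system.", effort="high")
```

Keeping fast turns at minimal effort and reserving high effort for planning-heavy prompts is one way to balance cost and latency in a long session.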

by google · 1.05M context · $0.50/M input tokens · $3/M output tokens · $1/M audio tokens

NVIDIA: Nemotron 3 Super (free)

657B tokens

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model that activates just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer Mixture-of-Experts architecture with multi-token prediction (MTP), it delivers over 50% higher token-generation throughput than leading open models.

The model features a 1M token context window for long-term agent coherence, cross-document reasoning, and multi-step task planning. Latent MoE enables calling 4 experts for the inference cost of only one, improving intelligence and generalization. Multi-environment RL training across 10+ environments delivers leading accuracy on benchmarks including AIME 2025, TerminalBench, and SWE-Bench Verified.

Fully open with weights, datasets, and recipes under the NVIDIA Open License, Nemotron 3 Super allows easy customization and secure deployment anywhere — from workstation to cloud.

by nvidia · 262K context · $0/M input tokens · $0/M output tokens

OpenAI: GPT-4o-mini

625B tokens

GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.

As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains SOTA intelligence, while being significantly more cost-effective.

GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on common chat preference leaderboards.

Check out the launch announcement to learn more.

#multimodal

by openai · 128K context · $0.15/M input tokens · $0.60/M output tokens

MoonshotAI: Kimi K2.5

608B tokens

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens, it delivers strong performance in general reasoning, visual coding, and agentic tool-calling.

by moonshotai · 262K context · $0.3827/M input tokens · $1.72/M output tokens

Google: Gemini 2.5 Flash Lite

566B tokens

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance across common benchmarks compared to earlier Flash models. By default, "thinking" (i.e. multi-pass reasoning) is disabled to prioritize speed, but developers can enable it via the Reasoning API parameter to selectively trade off cost for intelligence.

by google · 1.05M context · $0.10/M input tokens · $0.40/M output tokens · $0.30/M audio tokens

Xiaomi: MiMo-V2-Pro

558B tokens

MiMo-V2-Pro is Xiaomi's flagship foundation model, featuring over 1T total parameters and a 1M context length, deeply optimized for agentic scenarios. It is highly adaptable to general agent frameworks like OpenClaw. It ranks among the global top tier in the standard PinchBench and ClawBench benchmarks, with perceived performance approaching that of Opus 4.6. MiMo-V2-Pro is designed to serve as the brain of agent systems, orchestrating complex workflows, driving production engineering tasks, and delivering results reliably.

by xiaomi · 1.05M context · $1/M input tokens · $3/M output tokens