Skip to content
No models found
OpenRouter
© 2026 OpenRouter, Inc

Product

  • Chat
  • Rankings
  • Apps
  • Models
  • Providers
  • Pricing
  • Enterprise
  • Labs

Company

  • About
  • Blog
  • CareersHiring
  • Privacy
  • Terms of Service
  • Support
  • State of AI
  • Works With OR
  • Data

Developer

  • Documentation
  • API Reference
  • SDK
  • Status

Connect

  • Discord
  • GitHub
  • LinkedIn
  • X
  • YouTube
Collections/Roleplay

Best AI Models for Roleplay (RP) and Creative Writing

Model rankings updated June 2026 based on real usage data.

Discover the top AI models for roleplay (RP), character chat and creative writing, ranked by real usage data on OpenRouter. These LLMs excel at maintaining consistent personas, rich dialogue and immersive storytelling across long-context sessions.

Whether you're using Janitor AI, SillyTavern or another frontend, or building your own character chatbot or interactive fiction engine, OpenRouter gives you access to the best roleplay models through a single API.

LLM Leaderboard for Roleplay Models

1.
Deepseek V3.2
by deepseek
1.3T
28.4%
2.
Deepseek V4 Flash
by deepseek
457B
10.0%
3.
Deepseek V4 Pro
by deepseek
437B
9.5%
4.
GLM 4.5 Air
by z-ai
288B
6.3%
5.
Gemini 3 Flash Preview
by google
224B
4.9%
6.
gpt-oss-120b
by openai
152B
3.3%
7.
Gemini 2.5 Flash Lite
by google
149B
3.2%
8.
Gemma 4 31B IT
by google
149B
3.2%
9.
Gemma 4 26B A4B IT
by google
92.5B
2.0%
10.
Others
1.33T
29.1%

Top Roleplay Models on OpenRouter

Based on top weekly usage data from millions of users accessing AI models for roleplay through OpenRouter.

Favicon for deepseek

DeepSeek: DeepSeek V4 Flash

3.85T tokens

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads, while maintaining strong reasoning and coding performance.

The model includes hybrid attention for efficient long-context processing. Reasoning efforts high and xhigh are supported; xhigh maps to max reasoning. It is well suited for applications such as coding assistants, chat systems, and agent workflows where responsiveness and cost efficiency are important.

by deepseek1.05M context$0.0983/M input tokens$0.1966/M output tokens
Favicon for minimax

MiniMax: MiniMax M3

2.59T tokens

MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding, and tool use. It is built on MiniMax Sparse Attention (MSA), which replaces full attention with KV-block selection to cut per-token compute at long context — roughly 1/20 the cost of the previous generation at 1M tokens, with substantially faster prefill and decode while retaining quality across most tasks.

Trained as a native multimodal model on interleaved data and tuned for multi-turn, production-like collaboration via an interactive user-simulator framework, the model is oriented toward sustained, multi-step tasks rather than single-turn execution.

by minimax1.05M context$0.30/M input tokens$1.20/M output tokens
Favicon for xiaomi

Xiaomi: MiMo-V2.5

2.29T tokens

MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding tasks. Its 1M context window supports complete documents, extended conversations, and complex task contexts in a single pass, making it ideal for integration with agent frameworks where strong reasoning, rich perception, and cost efficiency all matter.

by xiaomi1.05M context$0.14/M input tokens$0.28/M output tokens
Favicon for openrouter

Owl Alpha

2.03T tokens

Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use, and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution. Compatible with Claude Code, OpenClaw, and other mainstream productivity tools.

Note: Prompts and completions may be logged by the provider and used to improve the model.

by openrouter1.05M context$0/M input tokens$0/M output tokens
Favicon for anthropic

Anthropic: Claude Sonnet 4.6

1.81T tokens

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.

by anthropic1M context$3/M input tokens$15/M output tokens
Favicon for deepseek

DeepSeek: DeepSeek V4 Pro

1.77T tokens

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows, with strong performance across knowledge, math, and software engineering benchmarks.

Built on the same architecture as DeepSeek V4 Flash, it introduces a hybrid attention system for efficient long-context processing. Reasoning efforts high and xhigh are supported; xhigh maps to max reasoning. It is well suited for complex workloads such as full-codebase analysis, multi-step automation, and large-scale information synthesis, where both capability and efficiency are critical.

by deepseek1.05M context$0.435/M input tokens$0.87/M output tokens
Favicon for anthropic

Anthropic: Claude Opus 4.7

1.51T tokens

Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on complex, multi-step tasks and more reliable agentic execution across extended workflows. It is especially effective for asynchronous agent pipelines where tasks unfold over time - large codebases, multi-stage debugging, and end-to-end project orchestration.

Beyond coding, Opus 4.7 brings improved knowledge work capabilities - from drafting documents and building presentations to analyzing data. It maintains coherence across very long outputs and extended sessions, making it a strong default for tasks that require persistence, judgment, and follow-through.

For users upgrading from earlier Opus versions, see our official migration guide here

by anthropic1M context$5/M input tokens$25/M output tokens
Favicon for deepseek

DeepSeek: DeepSeek V3.2

1.21T tokens

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

by deepseek131K context$0.2288/M input tokens$0.3432/M output tokens
Favicon for anthropic

Anthropic: Claude Opus 4.8

1.2T tokens

Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token context window. It is suited for highly autonomous agents, long-horizon agentic work, knowledge work, and memory-driven tasks where coherence over extended sessions matters.

It is particularly strong on multi-step reasoning, complex coding, and end-to-end project orchestration - large codebases, multi-stage debugging, and long-running asynchronous agent pipelines. Beyond coding, it handles knowledge work such as drafting documents, building presentations, and analyzing data, maintaining quality across very long outputs.

by anthropic1M context$5/M input tokens$25/M output tokens
Favicon for google

Google: Gemini 3 Flash Preview

1.08T tokens

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool use performance with substantially lower latency than larger Gemini variants, making it well suited for interactive development, long running agent loops, and collaborative coding tasks. Compared to Gemini 2.5 Flash, it provides broad quality improvements across reasoning, multimodal understanding, and reliability.

The model supports a 1M token context window and multimodal inputs including text, images, audio, video, and PDFs, with text output. It includes configurable reasoning via thinking levels (minimal, low, medium, high), structured output, tool use, and automatic context caching. Gemini 3 Flash Preview is optimized for users who want strong reasoning and agentic behavior without the cost or latency of full scale frontier models.

by google1.05M context$0.50/M input tokens$3/M output tokens$1/M audio tokens