DeepSeek V4 Is Earning Agentic Token Share

OpenRouter · 6/30/2026

DeepSeek, the company that for many is still synonymous with open source LLMs, released its new flagship V4 models on April 24th.

V4 reset the trajectory. In six months DeepSeek roughly doubled its share of tokens on OpenRouter, and agentic workloads drove most of that gain.

Model authors driving token usage today

DeepSeek began the year holding just under 10% of weekly token flow on OpenRouter. As agentic work took hold in February and March, driving token usage up at unprecedented rates, DeepSeek actually fell to 5% of total tokens. The company was being squeezed from above by proprietary models and from below by a wave of other open source LLMs.

The release of the V4 model family had an immediate positive impact. By the start of June, DeepSeek had earned nearly 20% of token share and has been the top model on OpenRouter since mid-May.

A direct comparison between January and June 2026 shows just how quickly preferences can shift between model authors. DeepSeek effectively doubled its token share over the period (from 9% to 18%) but many other model providers experienced major swings in fortune.

A group of Chinese open source model authors, including Xiaomi, Minimax, and Tencent, all saw their share of tokens rise over the past 6 months. This seems to have come at the expense of two leading American model companies, Google and OpenAI.

DeepSeek token usage by user type

The DeepSeek doubling was not restricted to a single user type. Companies and individuals of all stripes were using more and more DeepSeek tokens by the start of June. Hobbyist users (who typically have significant usage through consumer-oriented app categories like roleplay, general chat, and personal agents) now route nearly a third of all their tokens to DeepSeek models.

Even users at AI-native companies and large organizations, both of which might have been expected to prioritize frontier closed models, sent DeepSeek much more token traffic in early June than they did at the start of the year.

DeepSeek V4 pricing comparison

DeepSeek V4 Flash, on the cheapest endpoint, costs $0.09 input / $0.18 output per million tokens. For comparison, GPT-5.5 is currently priced at $5 input / $30 output per million tokens.
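To make the gap concrete, here is a minimal sketch of the arithmetic using the list prices quoted above. The workload size (2 million input tokens and 500 thousand output tokens) and the helper name `cost_usd` are illustrative assumptions, not OpenRouter figures.

```python
# Prices in USD per million tokens, taken from the paragraph above.
PRICES = {
    "DeepSeek V4 Flash (cheapest endpoint)": {"input": 0.09, "output": 0.18},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}


def cost_usd(price, input_tokens, output_tokens):
    """Return the cost in USD of a workload at the given per-million prices."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical workload, chosen only for illustration.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 500_000

costs = {name: cost_usd(p, INPUT_TOKENS, OUTPUT_TOKENS) for name, p in PRICES.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
```

Under these assumed volumes the same workload costs about $0.27 on V4 Flash and $25.00 on GPT-5.5, a difference of roughly two orders of magnitude.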

So it is no shock that this sudden upsurge in V4 token volume has not produced an identical spike in share of spend. V4's ratio of output quality to cost is best in class. It is good enough, in fact, that organizations of all sizes have begun trusting DeepSeek with real agentic work.

V4 is the first DeepSeek model sufficient for agentic workloads

Agentic workloads token breakdown

We segment token traffic at the API key level into three main categories (Agentic, Mixed, and Human). These are assigned according to a 7-signal weighted composite score that includes inputs such as tool call rate, turn count, gap timing, and others.
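As a rough illustration of how a weighted composite score can sort API keys into these three categories, consider the sketch below. Only tool call rate, turn count, and gap timing are named in this post; the normalization to a 0..1 range, the weights, the thresholds, and the use of three signals instead of seven are all assumptions made for the example, not OpenRouter's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class KeySignals:
    """Per-API-key signals, each assumed to be pre-normalized to the range 0..1."""
    tool_call_rate: float  # fraction of requests that include tool calls
    turn_count: float      # normalized average number of turns per conversation
    gap_timing: float      # 1.0 = machine-speed gaps between requests, 0.0 = human-paced


# Hypothetical weights; the real score combines seven signals with undisclosed weights.
WEIGHTS = {"tool_call_rate": 0.5, "turn_count": 0.25, "gap_timing": 0.25}

# Hypothetical cutoffs on the composite score.
AGENTIC_MIN = 0.66
HUMAN_MAX = 0.33


def composite_score(signals: KeySignals) -> float:
    """Weighted sum of the signals; stays in 0..1 because the weights sum to 1."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def classify(signals: KeySignals) -> str:
    """Map a key's composite score to one of the three categories."""
    score = composite_score(signals)
    if score >= AGENTIC_MIN:
        return "Agentic"
    if score <= HUMAN_MAX:
        return "Human"
    return "Mixed"


coding_agent = KeySignals(tool_call_rate=0.9, turn_count=0.8, gap_timing=0.9)
chat_user = KeySignals(tool_call_rate=0.0, turn_count=0.2, gap_timing=0.1)
blended_key = KeySignals(tool_call_rate=0.5, turn_count=0.5, gap_timing=0.5)

for label, key in [("coding_agent", coding_agent), ("chat_user", chat_user), ("blended_key", blended_key)]:
    print(label, round(composite_score(key), 3), classify(key))
```

The three example keys are invented to land in one category each. With real traffic, the choice of weights and cutoffs would determine how large the Mixed band is.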

Everyone knows that agentic work burns more tokens than normal human AI usage (about 15x more per request, according to OpenRouter data). But it bears remembering that this explosion in agentic tokens is a 2026 phenomenon and really didn’t kick off in earnest until early February of this year.

Except even that is an overstatement for some model authors.

Agentic vs human token usage over time

The tokens used by agentic workloads surpassed those used by humans right around February 1st. At first, agentic work was concentrated in only a few model authors and that select group did not include DeepSeek.

The release of V4 marked DeepSeek’s entrance into the agentic workflow race in earnest.

V4 Flash agentic token flow

Not all workloads switch to the latest model at the same time. The chart above illustrates how human activity within DeepSeek token usage has remained tied to DeepSeek V3.2 through the first 5 months of the year.

Agentic workloads, however, have made V4 their model of choice. By the end of May, a mere month after release, V4-Flash comprised 70% of agentic token flow for DeepSeek usage.

DeepSeek and the rise of Chinese models

US vs China token volume

Tokens flow almost entirely to LLMs created in two countries: the United States and China (with a tiny smattering of others represented in the small beige slice). Token volume has exploded in 2026, but the growth has not spread to LLMs from other nations.

2025 was the year of American tokens, with models from the US responsible for about three quarters of the tokens used. The competition has been far fiercer in 2026, with Chinese models actually surpassing American ones in token share as of early June.

Chinese models token surge

To no one's surprise, it is DeepSeek that has led the charge among Chinese models since the release of V4 in late April. While token usage amongst a few of the American leaders has plateaued over the past 6 weeks, multiple Chinese competitors saw tokens surge as June began.

With cost-effective token usage dominating the summer headlines, we’d expect these cheaper-but-still-powerful open source models to continue attracting more tokens over time. As the Wall Street Journal mentions in this recent story (featuring OpenRouter data), “Startups and tech giants alike are mixing and matching AI models to avoid the premium prices”.

DeepSeek V4 is certain to be part of that mix.

Methodology Notes

A few notes on how we put these numbers together:

Source: OpenRouter’s request logs.
Sample size: Over 450 trillion tokens from January 1, 2026 - June 14, 2026.
Share metric: All “share” figures refer to share of token volume (input and output combined), not share of spend. As the pricing section notes, the two can diverge sharply for low-cost models like V4.
Activity type: We split token traffic at the API key level into three main categories (Agentic, Mixed, and Human). These are assigned according to a 7-signal weighted composite score that includes inputs such as tool call rate, turn count, gap timing, and others. Because the split happens at the key level, a single key doing mixed work is classified by its dominant pattern rather than per request.
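
The share metric note above can be made concrete with a small sketch of how share of tokens and share of spend diverge. The two model names, the token volumes, and the blended per-million prices below are made up for illustration; they are not drawn from OpenRouter's logs.

```python
# Hypothetical monthly traffic, in millions of tokens, with assumed blended
# prices in USD per million tokens (input and output combined).
traffic = {
    "low_cost_model": {"tokens_m": 800, "blended_price": 0.12},
    "premium_model": {"tokens_m": 200, "blended_price": 10.00},
}


def shares(values):
    """Convert a dict of non-negative numbers into fractions of their total."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


# Share of volume counts tokens only; share of spend weights tokens by price.
token_share = shares({name: t["tokens_m"] for name, t in traffic.items()})
spend_share = shares({name: t["tokens_m"] * t["blended_price"] for name, t in traffic.items()})

for name in traffic:
    print(f"{name}: {token_share[name]:.1%} of tokens, {spend_share[name]:.1%} of spend")
```

In this invented case the low-cost model carries 80% of tokens but under 5% of spend, which is the kind of gap the share metric note warns about.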