Mistral: Mixtral 8x7B (base)

mistralai/mixtral-8x7b

Created Dec 10, 2023

32,768 context

$0.54/M input tokens

$0.54/M output tokens

Mixtral 8x7B is a pretrained generative sparse mixture-of-experts model by Mistral AI. It incorporates 8 experts (feed-forward networks) for a total of 47B parameters. This is the base model (not fine-tuned for instructions); see Mixtral 8x7B Instruct for an instruct-tuned model.

#moe
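
To make the "sparse mixture of experts" idea concrete, here is a minimal sketch of top-k expert routing for a single token. It is illustrative only and not Mistral's implementation: the function names, the toy scalar "experts", the router scores, and the choice of two active experts per token (the figure this page gives for Mixtral 8x22B) are all assumptions.

```javascript
// Toy sparse mixture-of-experts layer for one token (illustrative only).
// `experts` is an array of functions; `gateLogits` holds one router score per expert.
function softmax(values) {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function routeToken(x, experts, gateLogits, k = 2) {
  // Rank experts by router score and keep only the k best.
  const top = gateLogits
    .map((logit, index) => ({ logit, index }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);

  // Renormalize the selected scores so the mixing weights sum to 1.
  const weights = softmax(top.map((t) => t.logit));

  // Only the selected experts run; the others are skipped entirely.
  const output = top.reduce(
    (acc, t, i) => acc + weights[i] * experts[t.index](x),
    0
  );
  return { output, used: top.map((t) => t.index) };
}

// Eight toy "experts": expert i multiplies its input by i + 1.
const experts = Array.from({ length: 8 }, (_, i) => (x) => x * (i + 1));
const gateLogits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 0.2];
const result = routeToken(1.0, experts, gateLogits);
console.log(result.used, result.output);
```

Because only k of the 8 experts run for each token, the compute per token is lower than the total parameter count suggests.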

Providers for Mixtral 8x7B (base)

OpenRouter routes requests to the top-ranked providers able to handle your prompts.

Together

Context: 33K
Max Output: 33K
Input: $0.54/M tokens
Output: $0.54/M tokens
Latency: 0.49s
Throughput: 134.8 t/s
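
As a rough illustration of what the listed price means per request, the sketch below estimates cost from token counts. The function name and the example token counts are hypothetical; only the $0.54 per million tokens figure comes from this page.

```javascript
// Listed price on this page: $0.54 per million tokens, for both input and output.
const PRICE_PER_MILLION = { input: 0.54, output: 0.54 };

// Estimate the cost in USD of one request from its token counts.
function estimateCostUSD(inputTokens, outputTokens, price = PRICE_PER_MILLION) {
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Example: a 2,000-token prompt with a 500-token completion.
console.log(estimateCostUSD(2000, 500).toFixed(6));
```

Actual billing depends on how the provider counts tokens, so treat the result as an estimate.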

Apps using Mixtral 8x7B (base)

Top public apps this week using this model

1. Kaleidoscope: Exoloom (new) - 2.03M tokens
2. OpenRouter: Chatroom - Chat with multiple LLMs at once - 1.29M tokens
3. Mantella (new) - 735K tokens
4. Talespinner (new) - 232K tokens
5. Chub AI - GenAI for everyone - 201K tokens
6. SillyTavern - LLM frontend for power users - 196K tokens
7. APIpie.ai (new) - 68K tokens
8. novelcrafter - Your personal novel writing toolbox. Plan, write and tinker with your story. - 50K tokens
9. Msty (new) - 30K tokens
10. RisuAI - Browse characters, choose models, and chat - 30K tokens

Recent activity on Mixtral 8x7B (base)

Tokens processed per day

Recommended parameters for Mistral: Mixtral 8x7B (base)

Median values from users on OpenRouter

temperature
Influences the variety of the model's responses. Lower values lead to more predictable and typical responses, while higher values encourage more diverse and less common responses. At 0, the model always gives the same response for a given input.
Optional, float, 0.0 to 2.0. Default: 1.0. Observed values: p10 1, p50 1, p90 1.

top_p
Limits the model's choices to the most likely tokens whose probabilities add up to P. A lower value makes the model's responses more predictable, while the default setting allows the full range of token choices. Think of it as a dynamic Top-K.
Optional, float, 0.0 to 1.0. Default: 1.0. Observed values: p10 0.99, p50 0.99, p90 1.

top_k
Limits the model's choice of tokens at each step, making it choose from a smaller set. A value of 1 means the model always picks the most likely next token, leading to predictable results. By default this setting is disabled, so the model considers all choices.
Optional, integer, 0 or above. Default: 0. Observed values: p10 0, p50 0, p90 0.

frequency_penalty
Controls the repetition of tokens based on how often they appear in the input. Tokens that appear more often in the input are used less, in proportion to how frequently they occur, so the penalty scales with the number of occurrences. Negative values encourage token reuse.
Optional, float, -2.0 to 2.0. Default: 0.0. Observed values: p10 0, p50 0, p90 0.

presence_penalty
Adjusts how often the model repeats specific tokens already used in the input. Higher values make such repetition less likely, while negative values encourage token reuse. The penalty does not scale with the number of occurrences.
Optional, float, -2.0 to 2.0. Default: 0.0. Observed values: p10 0, p50 0, p90 0.

repetition_penalty
Helps to reduce the repetition of tokens from the input. A higher value makes the model less likely to repeat tokens, but too high a value can make the output less coherent (often with run-on sentences that lack small words). The penalty scales with the original token's probability.
Optional, float, 0.0 to 2.0. Default: 1.0. Observed values: p10 1, p50 1, p90 1.

min_p
The minimum probability for a token to be considered, relative to the probability of the most likely token, so the cutoff changes with the confidence of the most probable token. With Min-P set to 0.1, only tokens that are at least 1/10th as probable as the best option are allowed.
Optional, float, 0.0 to 1.0. Default: 0.0. Observed values: p10 0, p50 0, p90 0.

top_a
Considers only the top tokens with "sufficiently high" probabilities, based on the probability of the most likely token. Think of it as a dynamic Top-P. A lower Top-A value focuses the choices around the highest-probability token with a narrower scope. A higher Top-A value does not necessarily affect the creativity of the output, but rather refines the filtering based on the maximum probability.
Optional, float, 0.0 to 1.0. Default: 0.0. Observed values: p10 0, p50 0, p90 0.
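
The filtering parameters described above can be sketched as small functions over a probability distribution. This is a minimal illustration under stated assumptions, not the sampler any provider actually runs: the helper names are hypothetical, temperature 0 is treated as greedy decoding, and top_a is left out because its exact cutoff formula is not given here.

```javascript
// Illustrative token-filtering helpers; not the provider's actual sampler.
function softmaxWithTemperature(logits, temperature = 1.0) {
  // Temperature 0 is treated as greedy decoding: all mass on the best token.
  if (temperature === 0) {
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function renormalize(probs) {
  const sum = probs.reduce((a, b) => a + b, 0);
  return probs.map((p) => p / sum);
}

// Token indices ordered from most to least likely.
function rankedIndices(probs) {
  return probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
}

function keepIndices(probs, keep) {
  return renormalize(probs.map((p, i) => (keep.has(i) ? p : 0)));
}

// top_k: keep the k most likely tokens (0 disables the filter).
function topK(probs, k) {
  if (k <= 0) return probs;
  return keepIndices(probs, new Set(rankedIndices(probs).slice(0, k)));
}

// top_p: keep the smallest set of top tokens whose probabilities reach p.
function topP(probs, p) {
  const keep = new Set();
  let cumulative = 0;
  for (const i of rankedIndices(probs)) {
    keep.add(i);
    cumulative += probs[i];
    if (cumulative >= p) break;
  }
  return keepIndices(probs, keep);
}

// min_p: drop tokens less probable than threshold * (best token's probability).
function minP(probs, threshold) {
  const floor = threshold * Math.max(...probs);
  return renormalize(probs.map((x) => (x >= floor ? x : 0)));
}

const probs = [0.5, 0.3, 0.15, 0.05];
console.log(topK(probs, 1));    // only the most likely token survives
console.log(topP(probs, 0.75)); // the two most likely tokens survive
console.log(minP(probs, 0.2));  // the 0.05 token falls below 0.2 * 0.5
```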
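
The three penalty parameters differ in how they scale, which is easier to see in code. The sketch below follows conventions commonly used for these settings (a subtractive penalty on logits for frequency and presence, a multiplicative one for repetition); these formulas are assumptions, the function name is hypothetical, and individual providers may apply the penalties differently.

```javascript
// Illustrative logit adjustments; providers may implement these differently.
function applyPenalties(logits, seenTokenIds, opts = {}) {
  const {
    frequencyPenalty = 0,
    presencePenalty = 0,
    repetitionPenalty = 1,
  } = opts;

  // Count how often each token id has already appeared.
  const counts = new Map();
  for (const id of seenTokenIds) counts.set(id, (counts.get(id) || 0) + 1);

  return logits.map((logit, id) => {
    const count = counts.get(id) || 0;
    if (count === 0) return logit; // unseen tokens are left untouched

    // repetition_penalty: multiplicative, lowers the score of any seen token.
    let adjusted = logit > 0 ? logit / repetitionPenalty : logit * repetitionPenalty;
    // frequency_penalty grows with the count; presence_penalty is a flat, one-time cost.
    adjusted -= frequencyPenalty * count + presencePenalty;
    return adjusted;
  });
}

const logits = [2.0, 1.0, 0.5, -1.0];
const seen = [0, 0, 0, 1]; // token 0 appeared three times, token 1 once
console.log(applyPenalties(logits, seen, { frequencyPenalty: 0.5 }));
console.log(applyPenalties(logits, seen, { presencePenalty: 0.5 }));
console.log(applyPenalties(logits, seen, { repetitionPenalty: 2 }));
```

With the defaults listed above (0, 0, and 1), the logits pass through unchanged.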

Sample code using the median

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"}
    ],
    // Median values from OpenRouter users for this model
    "top_p": 0.99,
    "temperature": 1,
    "repetition_penalty": 1
  })
});

Uptime stats for Mixtral 8x7B (base)

Uptime stats for Mixtral 8x7B (base) across all providers

When an error occurs in an upstream provider, we recover by routing to another healthy provider.
If a model only has one host or the request filters only match a single provider, the request is "irrecoverable."
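
The recovery behavior described above can be pictured as a loop over candidate providers. The sketch below is purely hypothetical and not OpenRouter's implementation: `routeWithFallback`, the `providers` list, and the `send` callback are stand-ins introduced for illustration.

```javascript
// Hypothetical failover loop; `providers` and `send` are stand-ins, not OpenRouter internals.
async function routeWithFallback(providers, send) {
  const attempts = [];
  for (const provider of providers) {
    try {
      // The first provider that answers successfully wins.
      return { provider, response: await send(provider) };
    } catch (err) {
      // Record the failure and move on to the next provider.
      attempts.push({ provider, message: err.message });
    }
  }
  // Nothing left to try: the request cannot be recovered.
  const error = new Error("irrecoverable: no healthy provider left");
  error.attempts = attempts;
  throw error;
}

// Fake transport for demonstration: provider "a" is down, "b" is healthy.
const fakeSend = async (provider) => {
  if (provider === "a") throw new Error("upstream error");
  return `response from ${provider}`;
};

routeWithFallback(["a", "b"], fakeSend).then((r) => console.log(r.provider));
routeWithFallback(["a"], fakeSend).catch((e) => console.log(e.message));
```

With a single candidate provider, the first failure exhausts the list, which corresponds to the "irrecoverable" case.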

Learn more about our load balancing and customization options.

Sample code and API for Mixtral 8x7B (base)

OpenRouter normalizes requests and responses across providers for you.

OpenRouter provides an OpenAI-compatible completion API to 291 models & providers that you can call directly, or using the OpenAI SDK. Additionally, some third-party SDKs are available.

In the examples below, the OpenRouter-specific headers are optional. Setting them allows your app to appear on the OpenRouter leaderboards.

Using the OpenAI SDK

import OpenAI from "openai"

// OPENROUTER_API_KEY, YOUR_SITE_URL, and YOUR_SITE_NAME are placeholders; define them before running.
const openai = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": YOUR_SITE_URL, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": YOUR_SITE_NAME, // Optional. Shows in rankings on openrouter.ai.
  }
})

async function main() {
  const completion = await openai.chat.completions.create({
    model: "mistralai/mixtral-8x7b",
    messages: [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  })

  console.log(completion.choices[0].message)
}
main()

Using the OpenRouter API directly

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "model": "mistralai/mixtral-8x7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  })
});

Using third-party SDKs

For information about using third-party SDKs and frameworks with OpenRouter, please see our frameworks documentation.

See the Request docs for all possible parameters, and Parameters for recommended values.

More models from Mistral AI

Mistral Large 2411

Mistral Large 2 2411 is an update of Mistral Large 2, released together with Pixtral Large 2411.

It provides a significant upgrade on the previous Mistral Large 24.07, with notable improvements in long context understanding, a new system prompt, and more accurate function calling.

Mistral Large 2407

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here.

It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.

Pixtral Large 2411

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of Mistral Large 2. The model is able to understand documents, charts and natural images.

The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.

Ministral 8B

Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.

Ministral 3B

Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.

Pixtral 12B

The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched via torrent: https://x.com/mistralai/status/1833758285167722836.

Mistral Nemo

A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.

The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

It supports function calling and is released under the Apache 2.0 license.

Codestral Mamba

A 7.3B parameter Mamba-based model designed for code and reasoning tasks.

Mistral 7B Instruct

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.

Mistral 7B Instruct v0.3

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

An improved version of Mistral 7B Instruct v0.2, with the following changes:

NOTE: Support for function calling depends on the provider.

Mixtral 8x22B Instruct

Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:

See benchmarks on the launch announcement here. #moe

Mixtral 8x22B

Mixtral 8x22B is a large-scale language model from Mistral AI. It consists of 8 experts, each 22 billion parameters, with each token using 2 experts at a time.

It was released via X.

#moe

Mistral Large

This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here.

Mistral Small

With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between Mistral NeMo 12B and Mistral Large 2, providing a cost-effective solution that can be deployed across various platforms and environments. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multilingual, supporting English, French, German, Italian, and Spanish.

Mistral Tiny

This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.

Mistral Medium

This is Mistral AI's closed-source, medium-sized model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. In benchmarks, it compares with many of the flagship models of other companies.

Mistral 7B Instruct v0.2

A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

An improved version of Mistral 7B Instruct, with the following changes:

Mixtral 8x7B Instruct

Mixtral 8x7B Instruct is a pretrained generative sparse mixture-of-experts model by Mistral AI, for chat and instruction use. It incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.

Instruct model fine-tuned by Mistral. #moe

Mistral 7B Instruct v0.1

A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.