- Mistral: Mixtral 8x22B Instruct
Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include strong math, coding, and reasoning; a large context length (64k); and fluency in English, French, Italian, German, and Spanish. See benchmarks on the launch announcement here. #moe
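As a rough back-of-the-envelope check on the active-parameter figures above (a minimal sketch using only the numbers quoted in this listing; the cost remark is a heuristic, not a pricing formula):

```python
# Active-parameter ratio for Mixtral 8x22B, using the figures quoted in the listing above.
total_params = 141e9
active_params = 39e9
print(f"Active per token: {active_params / total_params:.0%} of total weights")  # ~28%
# Per-token compute (and hence serving cost) scales roughly with active parameters,
# which is why a 141B-parameter MoE can be priced closer to a ~40B dense model.
```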
by mistralai · 66K context · $0.65/M input tkns · $0.65/M output tkns · 159M tokens this week
- WizardLM-2 8x22B
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art open-source models. It is an instruct fine-tune of Mixtral 8x22B. To read more about the model release, click here. #moe
by microsoft · 66K context · $0.65/M input tkns · $0.65/M output tkns · 679M tokens this week
- WizardLM-2 7B
WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves performance comparable to existing leading open-source models that are 10x larger. It is a fine-tune of Mistral 7B Instruct, using the same technique as WizardLM-2 8x22B. To read more about the model release, click here. #moe
by microsoft · 32K context · $0.07/M input tkns · $0.07/M output tkns · 268M tokens this week
- Mistral: Mixtral 8x22B (base)
Mixtral 8x22B is a large-scale language model from Mistral AI. It consists of 8 experts, each with 22 billion parameters, and each token uses 2 experts at a time. It was released via X. #moe
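A minimal sketch of that top-2 routing pattern, with toy dimensions and random weights (illustrative only, not Mixtral's actual implementation):

```python
import numpy as np

num_experts, top_k, d_model = 8, 2, 16     # 8 experts, 2 active per token (toy hidden size)
rng = np.random.default_rng(0)
router_w = rng.normal(size=(d_model, num_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]

def moe_layer(token):
    logits = token @ router_w                                   # router score for each expert
    top = np.argsort(logits)[-top_k:]                           # keep the 2 highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()   # softmax over the selected experts
    # Only the selected experts run, so roughly 2/8 of the expert weights are active per token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.normal(size=d_model))
```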
by mistralai · 66K context · $0.9/M input tkns · $0.9/M output tkns · 15.4M tokens this week
- Databricks: DBRX 132B Instruct
DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and Mixtral-8x7b on standard industry benchmarks for language understanding, programming, math, and logic. It uses a fine-grained mixture-of-experts (MoE) architecture. 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts. See the launch announcement and benchmark results here. #moe
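To illustrate what 'fine-grained' buys in routing flexibility (DBRX's expert counts are as reported in its launch post and are assumptions here, not values from this listing):

```python
from math import comb

# Fine-grained MoE: more, smaller experts means far more possible expert combinations per token.
mixtral = {"experts": 8, "active_per_token": 2}    # Mixtral 8x7B-style coarse routing
dbrx = {"experts": 16, "active_per_token": 4}      # DBRX's reported fine-grained routing

for name, cfg in [("Mixtral 8x7B", mixtral), ("DBRX", dbrx)]:
    combos = comb(cfg["experts"], cfg["active_per_token"])
    print(f"{name}: top-{cfg['active_per_token']} of {cfg['experts']} experts "
          f"-> {combos} possible expert combinations per token")
# Prints 28 combinations for the coarse layout vs. 1820 for the fine-grained one.
```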
by databricks · 33K context · $0.6/M input tkns · $0.6/M output tkns · 3.17M tokens this week
- Mistral 7B Instruct (nitro)
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length. This is v0.2 of Mistral 7B Instruct. For v0.1, use this model. Note: this is a higher-throughput version of this model, and may have higher prices and slightly different outputs.
by mistralai · 33K context · $0.2/M input tkns · $0.2/M output tkns · 74.8M tokens this week
- Mixtral 8x7B Instruct (nitro)
A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. Instruct model fine-tuned by Mistral. #moe Note: this is a higher-throughput version of this model, and may have higher prices and slightly different outputs.
by mistralai · 33K context · $0.54/M input tkns · $0.54/M output tkns · 146M tokens this week
- Mistral Large
This is Mistral AI's closed-source, flagship model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. Read the launch announcement here. It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy, and its 32K-token context window allows precise information recall from large documents.
by mistralai · 32K context · $8/M input tkns · $24/M output tkns · 25.6M tokens this week
- Nous: Hermes 2 Mixtral 8x7B SFT
Nous Hermes 2 Mixtral 8x7B SFT is the supervised finetune only version of the Nous Research model trained over the Mixtral 8x7B MoE LLM. The model was trained on over 1,000,000 entries of primarily GPT-4 generated data, as well as other high quality data from open datasets across the AI landscape, achieving state of the art performance on a variety of tasks. #moe
by nousresearch · 33K context · $0.54/M input tkns · $0.54/M output tkns · 3.6M tokens this week
- Nous: Hermes 2 Mixtral 8x7B DPO
Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM. The model was trained on over 1,000,000 entries of primarily GPT-4 generated data, as well as other high quality data from open datasets across the AI landscape, achieving state of the art performance on a variety of tasks. #moe
by nousresearch · 33K context · $0.27/M input tkns · $0.27/M output tkns · 36.2M tokens this week
- Mistral Tiny
This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than Mistral 7B, inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
by mistralai · 32K context · $0.25/M input tkns · $0.25/M output tkns · 1.01B tokens this week
- Mistral Small
This model is currently powered by Mixtral-8X7B-v0.1, a sparse mixture of experts model with 12B active parameters. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multilingual, supporting English, French, German, Italian, and Spanish. #moe
by mistralai · 32K context · $2/M input tkns · $6/M output tkns · 10.8M tokens this week
- Mistral Medium
This is Mistral AI's closed-source, medium-sized model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. In benchmarks, it compares with many of the flagship models of other companies.
by mistralai · 32K context · $2.7/M input tkns · $8.1/M output tkns · 71.9M tokens this week
- Dolphin 2.6 Mixtral 8x7B 🐬
This is a 16k context fine-tune of Mixtral-8x7b. It excels in coding tasks due to extensive training with coding data and is known for its obedience, although it lacks DPO tuning. The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at erichartford.com/uncensored-models. #moe #uncensored
by cognitivecomputations · 33K context · $0.5/M input tkns · $0.5/M output tkns · 30.4M tokens this week
- Mixtral 8x7B (base)
A pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see Mixtral 8x7B Instruct for an instruct-tuned model. #moe
by mistralai · 33K context · $0.54/M input tkns · $0.54/M output tkns · 824K tokens this week
- Mixtral 8x7B Instruct
A pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters. Instruct model fine-tuned by Mistral. #moe
by mistralai · 33K context · $0.24/M input tkns · $0.24/M output tkns · 2.2B tokens this week
- Nous: Capybara 7B (free)
The Capybara series is a collection of datasets and models made by fine-tuning on data created by Nous, mostly in-house. V1.9 uses unalignment techniques for more consistent and dynamic control. It also leverages a significantly better foundation model, Mistral 7B. Note: this is a free, rate-limited version of this model. Outputs may be cached. Read about rate limits here.
by nousresearch · 4K context · $0/M input tkns · $0/M output tkns · 6.81M tokens this week
- Nous: Capybara 7B
The Capybara series is a collection of datasets and models made by fine-tuning on data created by Nous, mostly in-house. V1.9 uses unalignment techniques for more consistent and dynamic control. It also leverages a significantly better foundation model, Mistral 7B.
by nousresearch · 4K context · $0.18/M input tkns · $0.18/M output tkns · 408K tokens this week
- Neural Chat 7B v3.1
A fine-tuned model based on mistralai/Mistral-7B-v0.1, trained on the open-source dataset Open-Orca/SlimOrca and aligned with the DPO algorithm. For more details, refer to the blog: The Practice of Supervised Fine-tuning and Direct Preference Optimization on Habana Gaudi2.
by intel · 4K context · $5/M input tkns · $5/M output tkns · 75K tokens this week
- Auto (best for prompt)
Depending on their size, subject, and complexity, your prompts will be sent to Mistral Large, Claude 3 Sonnet, or GPT-4o. To see which model was used, visit Activity.
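A minimal sketch of sending a prompt through the Auto router, assuming OpenRouter's OpenAI-compatible chat completions endpoint and the `openrouter/auto` model slug (check the API docs for the authoritative request shape):

```python
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openrouter/auto",  # let the router choose Mistral Large, Claude 3 Sonnet, or GPT-4o
        "messages": [{"role": "user", "content": "Summarize this paragraph in one sentence: ..."}],
    },
    timeout=60,
)
data = resp.json()
print(data["model"])                              # which model actually handled the request
print(data["choices"][0]["message"]["content"])
```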
by openrouter · 200K context
- Hugging Face: Zephyr 7B
Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).
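For context, DPO optimizes a preference objective directly from chosen/rejected completion pairs; a minimal sketch of the loss (illustrative only, not Zephyr's actual training code):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over per-example sequence log-probabilities."""
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)        # policy vs. reference on the chosen reply
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)  # policy vs. reference on the rejected reply
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()           # push chosen above rejected
```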
by huggingfaceh4 · 4K context · $0.2/M input tkns · $0.2/M output tkns · 4.56M tokens this week
- Hugging Face: Zephyr 7B (free)
Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). Note: this is a free, rate-limited version of this model. Outputs may be cached. Read about rate limits here.
by huggingfaceh4 · 4K context · $0/M input tkns · $0/M output tkns · 1.29M tokens this week
- Mistral 7B Instruct (free)
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length. This is v0.1 of Mistral 7B Instruct. For v0.2, use this model. Note: this is a free, rate-limited version of this model. Outputs may be cached. Read about rate limits here.
by mistralai · 33K context · $0/M input tkns · $0/M output tkns · 44.6M tokens this week
- Mistral 7B Instruct
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length. This is v0.1 of Mistral 7B Instruct. For v0.2, use this model.
by mistralai · 33K context · $0.07/M input tkns · $0.07/M output tkns · 1.47B tokens this week
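All prices above are quoted per million tokens, billed separately for prompt (input) and completion (output) tokens; a quick way to estimate the cost of a single request from those rates:

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate request cost from the per-million-token rates shown in each listing."""
    return (input_tokens / 1e6) * input_price_per_m + (output_tokens / 1e6) * output_price_per_m

# Example: a 2,000-token prompt with a 500-token reply on Mistral 7B Instruct ($0.07/M in and out).
print(f"${request_cost_usd(2000, 500, 0.07, 0.07):.6f}")  # ~$0.000175
```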