- Mistral: Mixtral 8x22B Instruct
Mistral's official instruct fine-tuned version of Mixtral 8x22B. It uses 39B active parameters out of 141B (see the routing sketch below), offering high cost efficiency for its size. Its strengths include:

- strong math, coding, and reasoning
- large context length (64k)
- fluency in English, French, Italian, German, and Spanish

See the launch announcement for benchmarks. #moe
by mistralai · 66K context · $0.65/M input tokens · $0.65/M output tokens · 218M tokens this week
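To make the "39B active out of 141B" figure concrete: in a sparse mixture-of-experts layer, a small router scores every expert per token and only the top-scoring few (2 of 8 here) actually run, so most weights sit idle on any given token. Below is a minimal numpy sketch of top-2 routing; it is not Mistral's implementation, and all shapes and names are illustrative:

```python
import numpy as np

def top2_moe_layer(x, experts, gate_w):
    """Toy mixture-of-experts layer: route each token to its top-2 experts.

    x:        (tokens, d_model) activations
    experts:  list of (d_model, d_model) weight matrices, one per expert
    gate_w:   (d_model, n_experts) router weights
    """
    logits = x @ gate_w                          # (tokens, n_experts) router scores
    top2 = np.argsort(logits, axis=-1)[:, -2:]   # indices of the 2 best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top2[t]
        # softmax over just the two selected experts' scores
        w = np.exp(logits[t, sel])
        w /= w.sum()
        for weight, e in zip(w, sel):
            out[t] += weight * (x[t] @ experts[e])  # only 2 of n_experts run
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 16, 8, 4
x = rng.standard_normal((tokens, d))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
print(top2_moe_layer(x, experts, gate_w).shape)  # (4, 16)
```

Because only the selected experts' matrix multiplies execute, per-token compute scales with the 2 active experts rather than all 8, which is why the model prices and runs closer to a ~39B dense model than a 141B one.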
- WizardLM-2 8x22B (nitro)
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms existing state-of-the-art open-source models. It is an instruct fine-tune of Mixtral 8x22B. See the model release announcement for details. #moe

Note: this is a higher-throughput version of the standard WizardLM-2 8x22B (listed below), and may have higher prices and slightly different outputs.
by microsoft · 66K context · $1.00/M input tokens · $1.00/M output tokens · 133M tokens this week
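As a usage sketch, the higher-throughput variant can be requested through OpenRouter's OpenAI-compatible chat completions endpoint. The `:nitro` model-ID suffix below is an assumption based on OpenRouter's variant naming; confirm the exact slug on the model page:

```python
import os
import requests

# Assumed model slug; OpenRouter exposes higher-throughput variants behind a
# ":nitro" suffix, but check the model page for the exact ID.
MODEL = "microsoft/wizardlm-2-8x22b:nitro"

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Summarize MoE in one sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```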
- WizardLM-2 8x22B
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms existing state-of-the-art open-source models. It is an instruct fine-tune of Mixtral 8x22B. See the model release announcement for details. #moe
by microsoft · 66K context · $0.65/M input tokens · $0.65/M output tokens · 735M tokens this week
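Given the listed prices, the nitro premium is easy to quantify. A quick back-of-the-envelope comparison of per-request cost for the two variants, using a hypothetical 10K-input / 2K-output request:

```python
# Cost comparison for one request, using the listed per-million-token prices.
PRICES = {                       # (input $/M tokens, output $/M tokens)
    "wizardlm-2-8x22b":       (0.65, 0.65),
    "wizardlm-2-8x22b:nitro": (1.00, 1.00),
}

def request_cost(model, input_tokens, output_tokens):
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
# wizardlm-2-8x22b:       $0.0078
# wizardlm-2-8x22b:nitro: $0.0120
```

At these rates the nitro variant costs roughly 1.5x the standard listing per token, the trade-off being higher throughput.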
- Mistral: Mixtral 8x22B (base)
Mixtral 8x22B is a large-scale language model from Mistral AI. It consists of 8 experts of 22 billion parameters each, with each token routed to 2 experts at a time. It was released via a post on X (Twitter). #moe
by mistralai · 66K context · $0.90/M input tokens · $0.90/M output tokens · 13.1M tokens this week
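Since this is the base (non-instruct) checkpoint, it expects a raw prompt rather than a chat template, so a plain text-completion call is the natural fit. A sketch against OpenRouter's OpenAI-compatible completions endpoint; the model slug is an assumption, so confirm it on the model page:

```python
import os
import requests

# Assumed slug for the base (non-instruct) checkpoint. Base models continue
# raw text, so we send a prompt instead of chat messages.
resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mixtral-8x22b",
        "prompt": "The capital of France is",
        "max_tokens": 8,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```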