Skip to content
  •  
  • © 2023 – 2025 OpenRouter, Inc
      Favicon for Novita

      NovitaAI

      Browse models provided by NovitaAI (Terms of Service)

      49 models

      Tokens processed

      • Qwen: Qwen3 0.6BFree variant

        Qwen3-0.6B is a lightweight, 0.6 billion parameter language model in the Qwen3 series, offering support for both general-purpose dialogue and structured reasoning through a dual-mode (thinking/non-thinking) architecture. Despite its small size, it supports long contexts up to 32,768 tokens and provides multilingual, tool-use, and instruction-following capabilities.

        by qwen32K context$0/M input tokens$0/M output tokens
      • Qwen: Qwen3 1.7BFree variant

        Qwen3-1.7B is a compact, 1.7 billion parameter dense language model from the Qwen3 series, featuring dual-mode operation for both efficient dialogue (non-thinking) and advanced reasoning (thinking). Despite its small size, it supports 32,768-token contexts and delivers strong multilingual, instruction-following, and agentic capabilities, including tool use and structured output.

        by qwen32K context$0/M input tokens$0/M output tokens
      • Qwen: Qwen3 4BFree variant

        Qwen3-4B is a 4 billion parameter dense language model from the Qwen3 series, designed to support both general-purpose and reasoning-intensive tasks. It introduces a dual-mode architecture—thinking and non-thinking—allowing dynamic switching between high-precision logical reasoning and efficient dialogue generation. This makes it well-suited for multi-turn chat, instruction following, and complex agent workflows.

        by qwen128K context$0/M input tokens$0/M output tokens
      • DeepSeek: DeepSeek Prover V2

        DeepSeek Prover V2 is a 671B parameter model, speculated to be geared towards logic and mathematics. Likely an upgrade from DeepSeek-Prover-V1.5 Not much is known about the model yet, as DeepSeek released it on Hugging Face without an announcement or description.

        by deepseek164K context$0.70/M input tokens$2.50/M output tokens
      • Qwen: Qwen3 30B A3B

        Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

        by qwen131K context$0.10/M input tokens$0.45/M output tokens
      • Qwen: Qwen3 8B

        Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math, coding, and logical inference, and "non-thinking" mode for general conversation. The model is fine-tuned for instruction-following, agent integration, creative writing, and multilingual use across 100+ languages and dialects. It natively supports a 32K token context window and can extend to 131K tokens with YaRN scaling.

        by qwen131K context$0.035/M input tokens$0.138/M output tokens
      • Qwen: Qwen3 14B

        Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

        by qwen132K context$0.07/M input tokens$0.275/M output tokens
      • Qwen: Qwen3 32B

        Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

        by qwen131K context$0.10/M input tokens$0.45/M output tokens
      • Qwen: Qwen3 235B A22B

        Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and code tasks, and a "non-thinking" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling.

        by qwen131K context$0.20/M input tokens$0.80/M output tokens
      • THUDM: GLM Z1 Rumination 32B

        THUDM: GLM Z1 Rumination 32B is a 32B-parameter deep reasoning model from the GLM-4-Z1 series, optimized for complex, open-ended tasks requiring prolonged deliberation. It builds upon glm-4-32b-0414 with additional reinforcement learning phases and multi-stage alignment strategies, introducing “rumination” capabilities designed to emulate extended cognitive processing. This includes iterative reasoning, multi-hop analysis, and tool-augmented workflows such as search, retrieval, and citation-aware synthesis. The model excels in research-style writing, comparative analysis, and intricate question answering. It supports function calling for search and navigation primitives (search, click, open, finish), enabling use in agent-style pipelines. Rumination behavior is governed by multi-turn loops with rule-based reward shaping and delayed decision mechanisms, benchmarked against Deep Research frameworks such as OpenAI’s internal alignment stacks. This variant is suitable for scenarios requiring depth over speed.

        by thudm32K context$0.24/M input tokens$0.24/M output tokens
      • THUDM: GLM Z1 9BFree variant

        GLM-Z1-9B-0414 is a 9B-parameter language model developed by THUDM as part of the GLM-4 family. It incorporates techniques originally applied to larger GLM-Z1 models, including extended reinforcement learning, pairwise ranking alignment, and training on reasoning-intensive tasks such as mathematics, code, and logic. Despite its smaller size, it demonstrates strong performance on general-purpose reasoning tasks and outperforms many open-source models in its weight class.

        by thudm32K context$0/M input tokens$0/M output tokens
      • THUDM: GLM 4 9BFree variant

        GLM-4-9B-0414 is a 9 billion parameter language model from the GLM-4 series developed by THUDM. Trained using the same reinforcement learning and alignment strategies as its larger 32B counterparts, GLM-4-9B-0414 achieves high performance relative to its size, making it suitable for resource-constrained deployments that still require robust language understanding and generation capabilities.

        by thudm32K context$0/M input tokens$0/M output tokens
      • THUDM: GLM Z1 32B

        GLM-Z1-32B-0414 is an enhanced reasoning variant of GLM-4-32B, built for deep mathematical, logical, and code-oriented problem solving. It applies extended reinforcement learning—both task-specific and general pairwise preference-based—to improve performance on complex multi-step tasks. Compared to the base GLM-4-32B model, Z1 significantly boosts capabilities in structured reasoning and formal domains. The model supports enforced “thinking” steps via prompt engineering and offers improved coherence for long-form outputs. It’s optimized for use in agentic workflows, and includes support for long context (via YaRN), JSON tool calling, and fine-grained sampling configuration for stable inference. Ideal for use cases requiring deliberate, multi-step reasoning or formal derivations.

        by thudm33K context$0.24/M input tokens$0.24/M output tokens
      • THUDM: GLM 4 32B

        GLM-4-32B-0414 is a 32B bilingual (Chinese-English) open-weight language model optimized for code generation, function calling, and agent-style tasks. Pretrained on 15T of high-quality and reasoning-heavy data, it was further refined using human preference alignment, rejection sampling, and reinforcement learning. The model excels in complex reasoning, artifact generation, and structured output tasks, achieving performance comparable to GPT-4o and DeepSeek-V3-0324 across several benchmarks.

        by thudm33K context$0.24/M input tokens$0.24/M output tokens
      • Meta: Llama 4 Maverick

        Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass (400B total). It supports multilingual text and image input, and produces multilingual text and code output across 12 supported languages. Optimized for vision-language tasks, Maverick is instruction-tuned for assistant-like behavior, image reasoning, and general-purpose multimodal interaction. Maverick features early fusion for native multimodality and a 1 million token context window. It was trained on a curated mixture of public, licensed, and Meta-platform data, covering ~22 trillion tokens, with a knowledge cutoff in August 2024. Released on April 5, 2025 under the Llama 4 Community License, Maverick is suited for research and commercial applications requiring advanced multimodal understanding and high model throughput.

        by meta-llama1.05M context$0.17/M input tokens$0.85/M output tokens$0.668/K input imgs
      • Meta: Llama 4 Scout

        Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input (text and image) and multilingual output (text and code) across 12 supported languages. Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass and features a context length of 10 million tokens, with a training corpus of ~40 trillion tokens. Built for high efficiency and local or commercial deployment, Llama 4 Scout incorporates early fusion for seamless modality integration. It is instruction-tuned for use in multilingual chat, captioning, and image understanding tasks. Released under the Llama 4 Community License, it was last trained on data up to August 2024 and launched publicly on April 5, 2025.

        by meta-llama10M context$0.10/M input tokens$0.50/M output tokens$0.334/K input imgs
      • DeepSeek: DeepSeek V3 0324

        DeepSeek V3, a 685B-parameter, mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team. It succeeds the DeepSeek V3 model and performs really well on a variety of tasks.

        by deepseek131K context$0.33/M input tokens$1.30/M output tokens
      • Google: Gemma 3 27B

        Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2

        by google131K context$0.119/M input tokens$0.20/M output tokens
      • Qwen: QwQ 32B

        QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

        by qwen131K context$0.18/M input tokens$0.20/M output tokens
      • DeepSeek: R1 Distill Llama 8B

        DeepSeek R1 Distill Llama 8B is a distilled large language model based on Llama-3.1-8B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 50.4 - MATH-500 pass@1: 89.1 - CodeForces Rating: 1205 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models. Hugging Face: - Llama-3.1-8B - DeepSeek-R1-Distill-Llama-8B |

        by deepseek0 context$0.04/M input tokens$0.04/M output tokens
      • Qwen: Qwen2.5 VL 72B Instruct

        Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.

        by qwen131K context$0.80/M input tokens$0.80/M output tokens
      • DeepSeek: R1 Distill Qwen 32B

        DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 72.6 - MATH-500 pass@1: 94.3 - CodeForces Rating: 1691 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

        by deepseek128K context$0.30/M input tokens$0.30/M output tokens
      • DeepSeek: R1 Distill Qwen 14B

        DeepSeek R1 Distill Qwen 14B is a distilled large language model based on Qwen 2.5 14B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: - AIME 2024 pass@1: 69.7 - MATH-500 pass@1: 93.9 - CodeForces Rating: 1481 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

        by deepseek131K context$0.15/M input tokens$0.15/M output tokens
      • DeepSeek: R1 Distill Llama 70B

        DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including: - AIME 2024 pass@1: 70.0 - MATH-500 pass@1: 94.5 - CodeForces Rating: 1633 The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

        by deepseek128K context$0.80/M input tokens$0.80/M output tokens
      • DeepSeek: R1

        DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Fully open-source model & technical report. MIT licensed: Distill & commercialize freely!

        by deepseek164K context$0.70/M input tokens$2.50/M output tokens
      • DeepSeek: DeepSeek V3

        DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models. For model details, please visit the DeepSeek-V3 repo for more information, or see the launch announcement.

        by deepseek131K context$0.40/M input tokens$1.30/M output tokens
      • Meta: Llama 3.3 70B Instruct

        The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Model Card

        by meta-llama131K context$0.13/M input tokens$0.39/M output tokens
      • Qwen2.5 7B InstructFree variant

        Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context Support up to 128K tokens and can generate up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.

        by qwen131K context$0/M input tokens$0/M output tokens
      • Meta: Llama 3.2 11B Vision Instruct

        Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research. Click here for the original model card. Usage of this model is subject to Meta's Acceptable Use Policy.

        by meta-llama131K context$0.06/M input tokens$0.06/M output tokens
      • Meta: Llama 3.2 1B InstructFree variant

        Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance. Supporting eight core languages and fine-tunable for more, Llama 1.3B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models. Click here for the original model card. Usage of this model is subject to Meta's Acceptable Use Policy.

        by meta-llama131K context$0/M input tokens$0/M output tokens
      • Meta: Llama 3.2 3B Instruct

        Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages. Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings. Click here for the original model card. Usage of this model is subject to Meta's Acceptable Use Policy.

        by meta-llama131K context$0.03/M input tokens$0.05/M output tokens
      • Qwen2.5 72B Instruct

        Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains. - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots. - Long-context Support up to 128K tokens and can generate up to 8K tokens. - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.

        by qwen131K context$0.38/M input tokens$0.40/M output tokens
      • Sao10K: Llama 3.1 Euryale 70B v2.2

        Euryale L3.1 70B v2.2 is a model focused on creative roleplay from Sao10k. It is the successor of Euryale L3 70B v2.1.

        by sao10k131K context$1.48/M input tokens$1.48/M output tokens
      • Sao10K: Llama 3 8B Lunaris

        Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge. Created by Sao10k, this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning. For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1.

        by sao10k8K context$0.05/M input tokens$0.05/M output tokens
      • Meta: Llama 3.1 70B Instruct

        Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

        by meta-llama131K context$0.119/M input tokens$0.39/M output tokens
      • Meta: Llama 3.1 8B Instruct

        Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

        by meta-llama131K context$0.02/M input tokens$0.05/M output tokens
      • Mistral: Mistral Nemo

        A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA. The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It supports function calling and is released under the Apache 2.0 license.

        by mistralai131K context$0.04/M input tokens$0.17/M output tokens
      • Google: Gemma 2 9B

        Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class. Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness. See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

        by google8K context$0.08/M input tokens$0.08/M output tokens
      • Sao10k: Llama 3 Euryale 70B v2.1

        Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k. - Better prompt adherence. - Better anatomy / spatial awareness. - Adapts much better to unique and custom formatting / reply formats. - Very creative, lots of unique swipes. - Is not restrictive during roleplays.

        by sao10k8K context$1.48/M input tokens$1.48/M output tokens
      • Dolphin 2.9.2 Mixtral 8x22B 🐬

        Dolphin 2.9 is designed for instruction following, conversational, and coding. This model is a finetune of Mixtral 8x22B Instruct. It features a 64k context length and was fine-tuned with a 16k sequence length using ChatML templates. This model is a successor to Dolphin Mixtral 8x7B. The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at erichartford.com/uncensored-models. #moe #uncensored

        by cognitivecomputations66K context$0.90/M input tokens$0.90/M output tokens
      • Mistral: Mistral 7B Instruct

        A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.

        by mistralai33K context$0.029/M input tokens$0.059/M output tokens
      • Mistral: Mistral 7B Instruct v0.3

        A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length. An improved version of Mistral 7B Instruct v0.2, with the following changes: - Extended vocabulary to 32768 - Supports v3 Tokenizer - Supports function calling NOTE: Support for function calling depends on the provider.

        by mistralai33K context$0.029/M input tokens$0.059/M output tokens
      • NousResearch: Hermes 2 Pro - Llama-3 8B

        Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

        by nousresearch8K context$0.14/M input tokens$0.14/M output tokens
      • Meta: Llama 3 8B Instruct

        Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

        by meta-llama8K context$0.04/M input tokens$0.04/M output tokens
      • Meta: Llama 3 70B Instruct

        Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

        by meta-llama8K context$0.51/M input tokens$0.74/M output tokens
      • WizardLM-2 8x22B

        WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models. It is an instruct finetune of Mixtral 8x22B. To read more about the model release, click here. #moe

        by microsoft66K context$0.62/M input tokens$0.62/M output tokens
      • Midnight Rose 70B

        A merge with a complex family tree, this model was crafted for roleplaying and storytelling. Midnight Rose is a successor to Rogue Rose and Aurora Nights and improves upon them both. It wants to produce lengthy output by default and is the best creative writing merge produced so far by sophosympatheia. Descending from earlier versions of Midnight Rose and Wizard Tulu Dolphin 70B, it inherits the best qualities of each.

        by sophosympatheia4K context$0.80/M input tokens$0.80/M output tokens
      • Airoboros 70B

        A Llama 2 70B fine-tune using synthetic data (the Airoboros dataset). Currently based on jondurbin/airoboros-l2-70b, but might get updated in the future.

        by jondurbin4K context$0.50/M input tokens$0.50/M output tokens
      • MythoMax 13B

        One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

        by gryphe4K context$0.09/M input tokens$0.09/M output tokens