    NextBit

    Browse models provided by NextBit (Terms of Service)

    18 models

    [Chart: Tokens processed on OpenRouter]

  1. OpenAI: gpt-oss-20b

      gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

      by openai | 131K context | $0.10/M input tokens | $0.45/M output tokens
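
The entry above mentions function calling, tool use, and structured outputs. As a rough sketch of how such a model might be exercised through OpenRouter's OpenAI-compatible chat completions endpoint, the snippet below sends a single tool-enabled request. The `get_weather` tool schema and the `OPENROUTER_API_KEY` environment variable are assumptions for illustration, not details taken from this listing.

```python
import json
import os

import requests

# Hedged sketch: one tool-enabled request to gpt-oss-20b via OpenRouter's
# OpenAI-compatible chat completions endpoint. The get_weather tool is
# purely illustrative and would be implemented client-side.
response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-oss-20b",
        "messages": [
            {"role": "user", "content": "What's the weather in Berlin right now?"}
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool name
                    "description": "Look up the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    },
    timeout=60,
)
# The model should either answer directly or return a tool call for get_weather.
print(json.dumps(response.json(), indent=2))
```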
  2. Qwen: Qwen3 30B A3B

    Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique ability to switch seamlessly between a thinking mode for complex reasoning and a non-thinking mode for efficient dialogue ensures versatile, high-quality performance. Significantly outperforming prior models like QwQ and Qwen2.5, Qwen3 delivers superior mathematics, coding, commonsense reasoning, creative writing, and interactive dialogue capabilities. The Qwen3-30B-A3B variant includes 30.5 billion parameters (3.3 billion activated), 48 layers, 128 experts (8 activated per task), and supports up to 131K token contexts with YaRN, setting a new standard among open-source models.

    by qwen | 131K context | $0.14/M input tokens | $0.55/M output tokens
  3. Qwen: Qwen3 14B

    Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, programming, and logical inference, and a "non-thinking" mode for general-purpose conversation. The model is fine-tuned for instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

    by qwen | 132K context | $0.06/M input tokens | $0.24/M output tokens
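
Both Qwen3 entries above note that the native 32K window can be stretched to roughly 131K tokens with YaRN. A minimal sketch of what that looks like when loading the weights yourself with Hugging Face transformers, assuming the `Qwen/Qwen3-14B` checkpoint and the `rope_scaling` override documented on Qwen model cards (this is not NextBit's serving configuration):

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo; the listing itself does not name a checkpoint.
model_id = "Qwen/Qwen3-14B"

# YaRN rope scaling: a factor of 4.0 stretches the native 32,768-token window
# toward ~131K positions. Typically only worth enabling for genuinely long
# prompts, since static scaling can slightly degrade short-context quality.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```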
  4. Qwen: QwQ 32B

    QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

    by qwen | 131K context | $0.15/M input tokens | $0.40/M output tokens
  5. DeepSeek: R1 Distill Qwen 32B

    DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

    Other benchmark results include:

    - AIME 2024 pass@1: 72.6
    - MATH-500 pass@1: 94.3
    - CodeForces Rating: 1691

    The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.

    by deepseek | 128K context | $0.29/M input tokens | $0.29/M output tokens
  6. Microsoft: Phi 4

    Microsoft Research Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. At 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs. For more information, please see the Phi-4 Technical Report.

    by microsoft | 16K context | $0.06/M input tokens | $0.14/M output tokens
  7. Sao10K: Llama 3.3 Euryale 70B

    Euryale L3.3 70B is a model from Sao10k focused on creative roleplay. It is the successor of Euryale L3 70B v2.2.

    by sao10k | 8K context | $0.65/M input tokens | $0.75/M output tokens
  8. TheDrummer: UnslopNemo 12B

    UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.

    by thedrummer | 32K context | $0.40/M input tokens | $0.40/M output tokens
  9. TheDrummer: Rocinante 12B

    Rocinante 12B is designed for engaging storytelling and rich prose. Early testers have reported:

    - Expanded vocabulary with unique and expressive word choices
    - Enhanced creativity for vivid narratives
    - Adventure-filled and captivating stories

    by thedrummer | 33K context | $0.17/M input tokens | $0.43/M output tokens
  10. NeverSleep: Lumimaid v0.2 8B

    Lumimaid v0.2 8B is a finetune of Llama 3.1 8B with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chat outputs were purged. Usage of this model is subject to Meta's Acceptable Use Policy.

    by neversleep | 131K context | $0.09/M input tokens | $0.60/M output tokens
  11. Sao10K: Llama 3.1 Euryale 70B v2.2

    Euryale L3.1 70B v2.2 is a model from Sao10k focused on creative roleplay. It is the successor of Euryale L3 70B v2.1.

    by sao10k | 131K context | $0.65/M input tokens | $0.75/M output tokens
  12. Nous: Hermes 3 70B Instruct

    Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Hermes 3 70B is a competitive, if not superior, finetune of the Llama-3.1 70B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user. The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

    by nousresearch | 131K context | $0.30/M input tokens | $0.30/M output tokens
  13. Google: Gemma 2 27B

    Gemma 2 27B by Google is an open model built from the same research and technology used to create the Gemini models. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. See the launch announcement for more details. Usage of Gemma is subject to Google's Gemma Terms of Use.

    by google | 8K context | $0.65/M input tokens | $0.65/M output tokens
  14. NousResearch: Hermes 2 Pro - Llama-3 8B

    Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

    by nousresearch | 8K context | $0.025/M input tokens | $0.08/M output tokens
  15. Noromaid 20B

    A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge. #merge #uncensored

    by neversleep | 8K context | $1/M input tokens | $1.75/M output tokens
  16. Goliath 120B

    A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale.

    Credits to:

    - @chargoddard for developing mergekit, the framework used to merge the model
    - @Undi95 for helping with the merge ratios

    #merge

    by alpindale | 6K context | $4/M input tokens | $5.50/M output tokens
  17. ReMM SLERP 13B

    An attempt to recreate the original MythoMax-L2-13B with updated models. #merge

    by undi95 | 4K context | $0.45/M input tokens | $0.65/M output tokens
  18. MythoMax 13B

    One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge

    by gryphe | 4K context | $0.06/M input tokens | $0.06/M output tokens
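
All prices in this listing are quoted per million tokens, so the cost of a single request is just a weighted sum of its prompt and completion lengths. A small sketch of that arithmetic (the token counts are invented for illustration):

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate one request's cost from per-million-token prices."""
    return (
        input_tokens / 1_000_000 * input_price_per_m
        + output_tokens / 1_000_000 * output_price_per_m
    )


# Example with the gpt-oss-20b rates above ($0.10/M input, $0.45/M output):
# a 2,000-token prompt plus an 800-token completion costs about $0.00056.
print(f"${request_cost_usd(2_000, 800, 0.10, 0.45):.6f}")
```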