Skip to content
  1.  
  2. © 2023 – 2025 OpenRouter, Inc
    Favicon for Cerebras

    Cerebras

    Browse models provided by Cerebras (Terms of Service)

    6 models

    Tokens processed on OpenRouter

    • Z.AI: GLM 4.6GLM 4.6

      Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks. Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.

      by z-ai
    200K context
    $2.25/M input tokens$2.75/M output tokens
  3. OpenAI: gpt-oss-120bgpt-oss-120b

    gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

    by openai131K context$0.35/M input tokens$0.75/M output tokens
  4. Qwen: Qwen3 235B A22B Instruct 2507Qwen3 235B A22B Instruct 2507

    Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

    by qwen262K context$0.60/M input tokens$1.20/M output tokens
  5. Qwen: Qwen3 32BQwen3 32B

    Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for tasks like math, coding, and logical inference, and a "non-thinking" mode for faster, general-purpose conversation. The model demonstrates strong performance in instruction-following, agent tool use, creative writing, and multilingual tasks across 100+ languages and dialects. It natively handles 32K token contexts and can extend to 131K tokens using YaRN-based scaling.

    by qwen131K context$0.40/M input tokens$0.80/M output tokens
  6. Meta: Llama 3.3 70B InstructLlama 3.3 70B Instruct

    The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Model Card

    by meta-llama131K context$0.85/M input tokens$1.20/M output tokens
  7. Meta: Llama 3.1 8B InstructLlama 3.1 8B Instruct

    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to leading closed-source models in human evaluations. To read more about the model release, click here. Usage of this model is subject to Meta's Acceptable Use Policy.

    by meta-llama131K context$0.10/M input tokens$0.10/M output tokens