Friendli

Browse models provided by Friendli (Terms of Service)

6 models

Tokens processed on OpenRouter

Z.ai: GLM 5.2GLM 5.2
GLM 5.2 is a large-scale reasoning model from Z.ai. It supports text input and output with a 1M-token context window, and is suited for long-horizon agent workflows, project-level software engineering, and complex multi-step automation. Reasoning efforts high and xhigh are supported; xhigh maps to max reasoning. It is particularly strong at coding and tool use across long-running tasks, able to maintain engineering context and follow standards consistently through a full development workflow, from requirements to multi-platform deployment, in a single task.
by z-aiJun 16, 20261.05M context$1.40/M input tokens$4.40/M output tokens

Friendli

Browse models provided by Friendli (Terms of Service)

6 models

Tokens processed on OpenRouter

Z.ai: GLM 5.2GLM 5.2
GLM 5.2 is a large-scale reasoning model from Z.ai. It supports text input and output with a 1M-token context window, and is suited for long-horizon agent workflows, project-level software engineering, and complex multi-step automation. Reasoning efforts high and xhigh are supported; xhigh maps to max reasoning. It is particularly strong at coding and tool use across long-running tasks, able to maintain engineering context and follow standards consistently through a full development workflow, from requirements to multi-platform deployment, in a single task.
by z-aiJun 16, 20261.05M context$1.40/M input tokens$4.40/M output tokens

Z.ai: GLM 5.1GLM 5.1

GLM-5.1 delivers a major leap in coding capability, with particularly significant gains in handling long-horizon tasks. Unlike previous models built around minute-level interactions, GLM-5.1 can work independently and continuously on a single task for more than 8 hours, autonomously planning, executing, and improving itself throughout the process, ultimately delivering complete, engineering-grade results.

by z-aiApr 7, 2026203K context$1.40/M input tokens$4.40/M output tokens

Google: Gemma 4 31BGemma 4 31B

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function calling, and multilingual support across 140+ languages. Strong on coding, reasoning, and document understanding tasks. Apache 2.0 license.

by googleApr 2, 2026262K context$0.14/M input tokens$0.40/M output tokens

MiniMax: MiniMax M2.5MiniMax M2.5

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and Powerpoint files, context switching between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token efficient than previous generations, having been trained to optimize its actions and output through planning.

by minimaxFeb 12, 2026205K context$0.30/M input tokens$1.20/M output tokens

DeepSeek: DeepSeek V3.2DeepSeek V3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs

by deepseekDec 1, 2025131K context$0.50/M input tokens$1.50/M output tokens

Qwen: Qwen3 235B A22B Instruct 2507Qwen3 235B A22B Instruct 2507

Going away August 5, 2026

Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following, logical reasoning, math, code, and tool usage. The model supports a native 262K context length and does not implement "thinking mode" (<think> blocks). Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks. It is particularly strong on multilingual understanding, math reasoning (e.g., AIME, HMMT), and alignment evaluations like Arena-Hard and WritingBench.

by qwenJul 21, 2025262K context$0.20/M input tokens$0.80/M output tokens