THUDM: GLM 4 9B (free)

thudm/glm-4-9b:free

Created Apr 25, 2025 · 32,000 context
$0/M input tokens · $0/M output tokens

GLM-4-9B-0414 is a 9 billion parameter language model from the GLM-4 series developed by THUDM. Trained using the same reinforcement learning and alignment strategies as its larger 32B counterparts, GLM-4-9B-0414 achieves high performance relative to its size, making it suitable for resource-constrained deployments that still require robust language understanding and generation capabilities.

Providers for GLM 4 9B (free)

OpenRouter routes requests to the best providers that are able to handle your prompt size and parameters, with fallbacks to maximize uptime.

Context: 32K
Max Output: 32K
Input: $0/M tokens
Output: $0/M tokens

Apps using GLM 4 9B (free)

Top public apps this week using this model

1. Cline - Autonomous coding agent right in your IDE - 4.8M tokens
2. Roo Code - A whole dev team of AI agents in your editor - 1.35M tokens
3. SillyTavern - LLM frontend for power users - 925K tokens
4. OpenRouter: Chatroom - Chat with multiple LLMs at once - 385K tokens
5. Open WebUI - Extensible, self-hosted AI interface - 85K tokens
6. Chub AI - GenAI for everyone - 83K tokens
7. Agnaistic - A "bring your own AI" chat service - 56K tokens
8. WooCommerce AI Description Plugin (new) - 41K tokens
9. RisuAI - Browse characters, choose models, and chat - 37K tokens
10. liteLLM - Open-source library to simplify LLM calls - 15K tokens
11. New API (new) - 13K tokens
12. Cherry Studio (new) - 12K tokens
13. Apollo: Open Intelligence (new) - 7K tokens
14. Voices of the Court (new) - 6K tokens
15. PoeServer (new) - 6K tokens
16. Msty (new) - 4K tokens
17. Immersive Translation (new) - 4K tokens
18. Future Fiction Academy (Raptor Write) (new) - 3K tokens
19. Page Assist (new) - 2K tokens
20. Generador Letras AI (new) - 2K tokens

Recent activity on GLM 4 9B (free)

Tokens processed per day, Apr 25 to May 13 (chart scale 0 to 16M tokens).

Uptime stats for GLM 4 9B (free)

Uptime for this model on its only provider.

When an upstream provider returns an error, we can recover by routing to another healthy provider, if your request filters allow it.

Learn more about our load balancing and customization options.
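
As a minimal sketch of those options, routing preferences can be passed in the request body alongside the usual chat parameters. The provider object and its allow_fallbacks field below follow OpenRouter's provider-routing options; treat the exact field names as assumptions to verify against the current API reference.

from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="<OPENROUTER_API_KEY>",
)

# Sketch: allow OpenRouter to retry on another healthy provider if one errors.
# The "provider" object follows the provider-routing options; confirm the
# field names against the current OpenRouter API reference.
completion = client.chat.completions.create(
  model="thudm/glm-4-9b:free",
  messages=[{"role": "user", "content": "Hello"}],
  extra_body={
    "provider": {
      "allow_fallbacks": True
    }
  },
)
print(completion.choices[0].message.content)

For a model served by a single provider there may be nothing to fall back to, but the same request shape applies to any model.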

Sample code and API for GLM 4 9B (free)

OpenRouter normalizes requests and responses across providers for you.

OpenRouter provides an OpenAI-compatible completion API for 300+ models and providers that you can call directly or through the OpenAI SDK. Additionally, some third-party SDKs are available.

In the examples below, the OpenRouter-specific headers are optional. Setting them allows your app to appear on the OpenRouter leaderboards.

from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
  extra_headers={
    "HTTP-Referer": "<YOUR_SITE_URL>", # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "<YOUR_SITE_NAME>", # Optional. Site title for rankings on openrouter.ai.
  },
  extra_body={},
  model="thudm/glm-4-9b:free",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)
print(completion.choices[0].message.content)
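
The same endpoint can also be called directly over HTTP without an SDK. Below is a minimal sketch using the requests library against the chat completions route used by the SDK example above.

import requests

response = requests.post(
  "https://openrouter.ai/api/v1/chat/completions",
  headers={
    "Authorization": "Bearer <OPENROUTER_API_KEY>",
    "HTTP-Referer": "<YOUR_SITE_URL>",  # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "<YOUR_SITE_NAME>",  # Optional. Site title for rankings on openrouter.ai.
  },
  json={
    "model": "thudm/glm-4-9b:free",
    "messages": [
      {"role": "user", "content": "What is the meaning of life?"}
    ]
  },
)
print(response.json()["choices"][0]["message"]["content"])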

Using third-party SDKs

For information about using third-party SDKs and frameworks with OpenRouter, please see our frameworks documentation.

See the Request docs for all possible fields, and Parameters for explanations of specific sampling parameters.
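
As a short illustration of those parameters, values such as temperature, top_p, and max_tokens can be passed on the same create call; the numbers below are arbitrary placeholders rather than recommended settings.

from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
  model="thudm/glm-4-9b:free",
  messages=[{"role": "user", "content": "Summarize GLM-4-9B-0414 in one sentence."}],
  temperature=0.7,  # lower values make output more deterministic
  top_p=0.9,  # nucleus sampling cutoff
  max_tokens=256,  # cap on generated tokens
)
print(completion.choices[0].message.content)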

More models from thudm
