API Streaming | Real-time Model Responses in OpenRouter

The OpenRouter API allows streaming responses from any model. This is useful for building chat interfaces or other applications where the UI should update as the model generates the response.

To enable streaming, you can set the stream parameter to true in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.

Here is an example of how to stream a response and process it as it arrives:

import requests
import json

question = "How would you build the tallest building ever?"

url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
  # {{API_KEY_REF}} and {{MODEL}} are placeholders; substitute your own values
  "Authorization": f"Bearer {{API_KEY_REF}}",
  "Content-Type": "application/json"
}

payload = {
  "model": "{{MODEL}}",
  "messages": [{"role": "user", "content": question}],
  "stream": True
}

buffer = ""
done = False
with requests.post(url, headers=headers, json=payload, stream=True) as r:
  # Decode as UTF-8 rather than relying on the encoding requests guesses
  r.encoding = "utf-8"
  for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
    buffer += chunk
    # Process every complete SSE line currently in the buffer
    while True:
      line_end = buffer.find('\n')
      if line_end == -1:
        break  # wait for more data

      line = buffer[:line_end].strip()
      buffer = buffer[line_end + 1:]

      if line.startswith('data: '):
        data = line[6:]
        if data == '[DONE]':
          done = True  # end-of-stream sentinel
          break

        try:
          data_obj = json.loads(data)
          content = data_obj["choices"][0]["delta"].get("content")
          if content:
            print(content, end="", flush=True)
        except (json.JSONDecodeError, KeyError, IndexError):
          pass  # skip payloads that are not content chunks
    if done:
      break  # stop reading once the stream has ended
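Chat interfaces often need the complete reply once streaming ends, in addition to the incremental pieces. The following is a minimal sketch of that accumulation step, assuming the same chunk shape as the example above (`choices[0].delta.content`) and the `[DONE]` sentinel. The function name `collect_reply` and the sample lines are hypothetical and were written by hand, not captured from a real response.

```python
import json
from typing import Iterable


def collect_reply(sse_lines: Iterable[str]) -> str:
    """Join the delta.content pieces of a streamed chat completion into one string."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # blank separators, comments, and other SSE fields
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        try:
            delta = json.loads(data)["choices"][0]["delta"]
        except (json.JSONDecodeError, KeyError, IndexError, TypeError):
            continue  # payload does not match the assumed chunk shape
        content = delta.get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Hand-written sample lines (assumed shape, for illustration only)
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_reply(sample))
```

With a live request, the lines could come from `response.iter_lines(decode_unicode=True)` instead of the sample list.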

Additional Information

For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to prevent connection timeouts. These comments look like:

: OPENROUTER PROCESSING

Per the SSE specification, comment payloads can be safely ignored. You can also use them to improve the UX, for example by showing a dynamic loading indicator while the model is still processing.
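One way to handle comments explicitly is to classify each line before attempting to parse it as JSON. The sketch below is an illustration rather than a complete SSE parser: it only distinguishes blank lines, comments (lines starting with `:`), the `[DONE]` sentinel, and JSON `data` payloads. The function name `classify_sse_line` and the tag strings it returns are hypothetical.

```python
import json
from typing import Any, Optional, Tuple


def classify_sse_line(line: str) -> Tuple[str, Optional[Any]]:
    """Return a (kind, value) pair for a single SSE line."""
    line = line.rstrip("\r\n")
    if not line:
        return ("blank", None)  # event separator
    if line.startswith(":"):
        return ("comment", line[1:].strip())  # e.g. ": OPENROUTER PROCESSING"
    if line.startswith("data:"):
        data = line[len("data:"):]
        if data.startswith(" "):
            data = data[1:]  # SSE drops one leading space after the colon
        if data == "[DONE]":
            return ("done", None)
        try:
            return ("data", json.loads(data))
        except json.JSONDecodeError:
            return ("other", data)  # non-JSON payload, left to the caller
    return ("other", line)


for raw in [": OPENROUTER PROCESSING", 'data: {"choices": []}', "data: [DONE]"]:
    kind, value = classify_sse_line(raw)
    if kind == "comment":
        print("still working...")  # hook for a loading indicator
    elif kind == "data":
        print("chunk:", value)
    elif kind == "done":
        print("finished")
```

Returning a tag instead of raising keeps non-JSON lines from turning into uncaught parse errors in the calling loop.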

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you JSON.parse the non-JSON payloads. We recommend the following clients:

Stream Cancellation

Streaming requests can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.

Provider Support

Supported

OpenAI, Azure, Anthropic
Fireworks, Mancer, Recursal
AnyScale, Lepton, OctoAI
Novita, DeepInfra, Together
Cohere, Hyperbolic, Infermatic
Avian, XAI, Cloudflare
SFCompute, Nineteen, Liquid
Friendli, Chutes, DeepSeek

Not Currently Supported

AWS Bedrock, Groq, Modal
Google, Google AI Studio, Minimax
HuggingFace, Replicate, Perplexity
Mistral, AI21, Featherless
Lynn, Lambda, Reflection
SambaNova, Inflection, ZeroOneAI
AionLabs, Alibaba, Nebius
Kluster, Targon, InferenceNet

To implement stream cancellation:

import requests
from threading import Event, Thread

def stream_with_cancellation(prompt: str, cancel_event: Event):
    with requests.Session() as session:
        response = session.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {{API_KEY_REF}}"},
            json={"model": "{{MODEL}}", "messages": [{"role": "user", "content": prompt}], "stream": True},
            stream=True
        )

        try:
            for line in response.iter_lines():
                # The event is checked each time a line arrives
                if cancel_event.is_set():
                    return  # the finally block closes the connection
                if line:
                    print(line.decode(), flush=True)
        finally:
            # Closing the response aborts the connection
            response.close()

# Example usage:
cancel_event = Event()
stream_thread = Thread(target=stream_with_cancellation, args=("Write a story", cancel_event))
stream_thread.start()

# To cancel the stream:
cancel_event.set()
stream_thread.join()

Cancellation only works for streaming requests with supported providers. For non-streaming requests or unsupported providers, the model will continue processing and you will be billed for the complete response.
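Cancellation can also be triggered by the content received so far, rather than by an external event. The sketch below assumes an object with `iter_lines()` and `close()` methods, as `requests.Response` provides. The helper `stream_until` and the `FakeResponse` stand-in are hypothetical and exist only so the example runs without a network call.

```python
from typing import Callable, List


def stream_until(response, should_stop: Callable[[str], bool]) -> str:
    """Read lines from a streaming response and close it once should_stop(text) is true."""
    received: List[str] = []
    try:
        for line in response.iter_lines():
            if not line:
                continue  # skip keep-alive blank lines
            received.append(line.decode("utf-8"))
            if should_stop("\n".join(received)):
                break  # stop reading; the finally block aborts the connection
    finally:
        response.close()
    return "\n".join(received)


class FakeResponse:
    """Stand-in for requests.Response, used here for illustration only."""

    def __init__(self, lines):
        self.lines = lines
        self.closed = False

    def iter_lines(self):
        return iter(self.lines)

    def close(self):
        self.closed = True


fake = FakeResponse([b"first", b"", b"second", b"third"])
text = stream_until(fake, lambda so_far: "second" in so_far)
print(text)
print("closed:", fake.closed)
```

With a real request, pass the object returned by `requests.post(..., stream=True)`. Whether closing it also stops processing and billing depends on the provider, as listed above.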