---
title: Streaming
headline: API Streaming | Real-time Model Responses in OpenRouter
canonical-url: 'https://openrouter.ai/docs/api/reference/streaming'
'og:site_name': OpenRouter Documentation
'og:title': API Streaming - Real-time Model Response Integration
'og:description': >-
  Learn how to implement streaming responses with OpenRouter's API. Complete
  guide to Server-Sent Events (SSE) and real-time model outputs.
'og:image':
  type: url
  value: >-
    https://openrouter.ai/dynamic-og?title=API%20Streaming&description=Real-time%20model%20response%20streaming
'og:image:width': 1200
'og:image:height': 630
'twitter:card': summary_large_image
'twitter:site': '@OpenRouter'
noindex: false
nofollow: false
---

The OpenRouter API allows streaming responses from *any model*. This is useful for building chat interfaces or other applications where the UI should update as the model generates the response.

To enable streaming, you can set the `stream` parameter to `true` in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.

Here is an example of how to stream a response and process it:

<Template
  data={{
  API_KEY_REF,
  MODEL: Model.GPT_4_Omni
}}
>
  <CodeGroup>
    ```typescript title="TypeScript SDK"
    import { OpenRouter } from '@openrouter/sdk';

    const openRouter = new OpenRouter({
      apiKey: '{{API_KEY_REF}}',
    });

    const question = 'How would you build the tallest building ever?';

    const stream = await openRouter.chat.send({
      model: '{{MODEL}}',
      messages: [{ role: 'user', content: question }],
      stream: true,
    });

    for await (const chunk of stream) {
      const content = chunk.choices?.[0]?.delta?.content;
      if (content) {
        console.log(content);
      }

      // Final chunk includes usage stats
      if (chunk.usage) {
        console.log('Usage:', chunk.usage);
      }
    }
    ```

    ```python Python
    import requests
    import json

    question = "How would you build the tallest building ever?"

    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
      "Authorization": f"Bearer {{API_KEY_REF}}",
      "Content-Type": "application/json"
    }

    payload = {
      "model": "{{MODEL}}",
      "messages": [{"role": "user", "content": question}],
      "stream": True
    }

    buffer = ""
    with requests.post(url, headers=headers, json=payload, stream=True) as r:
      for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
        buffer += chunk
        while True:
          try:
            # Find the next complete SSE line
            line_end = buffer.find('\n')
            if line_end == -1:
              break

            line = buffer[:line_end].strip()
            buffer = buffer[line_end + 1:]

            if line.startswith('data: '):
              data = line[6:]
              if data == '[DONE]':
                break

              try:
                data_obj = json.loads(data)
                content = data_obj["choices"][0]["delta"].get("content")
                if content:
                  print(content, end="", flush=True)
              except json.JSONDecodeError:
                pass
          except Exception:
            break
    ```

    ```typescript title="TypeScript (fetch)"
    const question = 'How would you build the tallest building ever?';
    const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY_REF}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: '{{MODEL}}',
        messages: [{ role: 'user', content: question }],
        stream: true,
      }),
    });

    const reader = response.body?.getReader();
    if (!reader) {
      throw new Error('Response body is not readable');
    }

    const decoder = new TextDecoder();
    let buffer = '';

    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // Append new chunk to buffer
        buffer += decoder.decode(value, { stream: true });

        // Process complete lines from buffer
        while (true) {
          const lineEnd = buffer.indexOf('\n');
          if (lineEnd === -1) break;

          const line = buffer.slice(0, lineEnd).trim();
          buffer = buffer.slice(lineEnd + 1);

          if (line.startsWith('data: ')) {
            const data = line.slice(6);
            if (data === '[DONE]') break;

            try {
              const parsed = JSON.parse(data);
              const content = parsed.choices[0].delta.content;
              if (content) {
                console.log(content);
              }
            } catch (e) {
              // Ignore invalid JSON
            }
          }
        }
      }
    } finally {
      reader.cancel();
    }
    ```
  </CodeGroup>
</Template>

### Additional Information

For SSE (Server-Sent Events) streams, OpenRouter occasionally sends comments to prevent connection timeouts. These comments look like:

```text
: OPENROUTER PROCESSING
```

Comment payloads can be safely ignored per the [SSE specification](https://html.spec.whatwg.org/multipage/server-sent-events.html#event-stream-interpretation). However, you can leverage them to improve UX as needed, e.g. by showing a dynamic loading indicator.
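
For example, a raw line-based SSE reader (like the fetch examples above) can branch on the leading colon before attempting to parse JSON. This is a minimal sketch; `showLoadingIndicator` is a hypothetical UI hook standing in for whatever your interface uses:

```typescript
// Hypothetical UI hook: keep a spinner or "thinking" state visible.
const showLoadingIndicator = () => {
  console.log('...model is still processing');
};

// Handle a single SSE line from the buffer-splitting loops shown above.
function handleSseLine(line: string) {
  if (line.startsWith(':')) {
    // SSE comment (e.g. ": OPENROUTER PROCESSING") — no payload to parse.
    showLoadingIndicator();
    return;
  }

  if (line.startsWith('data: ')) {
    const data = line.slice(6);
    if (data === '[DONE]') return;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) console.log(content);
  }
}
```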

The generation ID is returned in the `X-Generation-Id` response header for all endpoints (chat completions, completions, responses, and messages), which can be useful for debugging and correlating requests.
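If you are using the raw `fetch` approach shown above, the header can be read from the streaming response before (or while) consuming the body. A small sketch:

```typescript
// Sketch: read the generation ID from a streaming fetch Response
// and return it so it can be attached to your own logs or traces.
function logGenerationId(response: Response): string | null {
  const generationId = response.headers.get('X-Generation-Id');
  if (generationId) {
    console.log('Generation ID:', generationId);
  }
  return generationId;
}
```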

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you `JSON.parse` non-JSON payloads such as these comments. We recommend the following clients:

* [eventsource-parser](https://github.com/rexxars/eventsource-parser)
* [OpenAI SDK](https://www.npmjs.com/package/openai)
* [Vercel AI SDK](https://www.npmjs.com/package/ai)
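
For example, here is a minimal sketch using `eventsource-parser` to handle the SSE framing for you. It assumes the v3 callback-style API (`createParser({ onEvent })`); earlier major versions use a different `createParser` signature:

```typescript
import { createParser } from 'eventsource-parser';

const parser = createParser({
  onEvent(event) {
    // `event.data` is the raw payload after "data: "; "[DONE]" is not JSON.
    if (event.data === '[DONE]') return;
    const parsed = JSON.parse(event.data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) console.log(content);
  },
});

// Feed decoded chunks from the response body into the parser,
// e.g. inside the fetch reader loop shown above:
// parser.feed(decoder.decode(value, { stream: true }));
```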

### Stream Cancellation

Streaming requests can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.

<Accordion title="Provider Support">
  **Supported**

  * OpenAI, Azure, Anthropic
  * Fireworks, Mancer, Recursal
  * AnyScale, Lepton, OctoAI
  * Novita, DeepInfra, Together
  * Cohere, Hyperbolic, Infermatic
  * Avian, XAI, Cloudflare
  * SFCompute, Nineteen, Liquid
  * Friendli, Chutes, DeepSeek

  **Not Currently Supported**

  * AWS Bedrock, Groq, Modal
  * Google, Google AI Studio, Minimax
  * HuggingFace, Replicate, Perplexity
  * Mistral, AI21, Featherless
  * Lynn, Lambda, Reflection
  * SambaNova, Inflection, ZeroOneAI
  * AionLabs, Alibaba, Nebius
  * Kluster, Targon, InferenceNet
</Accordion>

To implement stream cancellation:

<Template
  data={{
  API_KEY_REF,
  MODEL: Model.GPT_4_Omni
}}
>
  <CodeGroup>
    ```typescript title="TypeScript SDK"
    import { OpenRouter } from '@openrouter/sdk';

    const openRouter = new OpenRouter({
      apiKey: '{{API_KEY_REF}}',
    });

    const controller = new AbortController();

    try {
      const stream = await openRouter.chat.send({
        model: '{{MODEL}}',
        messages: [{ role: 'user', content: 'Write a story' }],
        stream: true,
      }, {
        signal: controller.signal,
      });

      for await (const chunk of stream) {
        const content = chunk.choices?.[0]?.delta?.content;
        if (content) {
          console.log(content);
        }
      }
    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Stream cancelled');
      } else {
        throw error;
      }
    }

    // To cancel the stream:
    controller.abort();
    ```

    ```python Python
    import requests
    from threading import Event, Thread

    def stream_with_cancellation(prompt: str, cancel_event: Event):
        with requests.Session() as session:
            response = session.post(
                "https://openrouter.ai/api/v1/chat/completions",
                headers={"Authorization": f"Bearer {{API_KEY_REF}}"},
                json={"model": "{{MODEL}}", "messages": [{"role": "user", "content": prompt}], "stream": True},
                stream=True
            )

            try:
                for line in response.iter_lines():
                    if cancel_event.is_set():
                        response.close()
                        return
                    if line:
                        print(line.decode(), end="", flush=True)
            finally:
                response.close()

    # Example usage:
    cancel_event = Event()
    stream_thread = Thread(target=lambda: stream_with_cancellation("Write a story", cancel_event))
    stream_thread.start()

    # To cancel the stream:
    cancel_event.set()
    ```

    ```typescript title="TypeScript (fetch)"
    const controller = new AbortController();

    try {
      const response = await fetch(
        'https://openrouter.ai/api/v1/chat/completions',
        {
          method: 'POST',
          headers: {
            Authorization: `Bearer ${{{API_KEY_REF}}}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: '{{MODEL}}',
            messages: [{ role: 'user', content: 'Write a story' }],
            stream: true,
          }),
          signal: controller.signal,
        },
      );

      // Process the stream...
    } catch (error) {
      if (error.name === 'AbortError') {
        console.log('Stream cancelled');
      } else {
        throw error;
      }
    }

    // To cancel the stream:
    controller.abort();
    ```
  </CodeGroup>
</Template>

<Warning>
  Cancellation only works for streaming requests with supported providers. For
  non-streaming requests or unsupported providers, the model will continue
  processing and you will be billed for the complete response.
</Warning>

### Handling Errors During Streaming

OpenRouter handles errors differently depending on when they occur during the streaming process:

#### Errors Before Any Tokens Are Sent

If an error occurs before any tokens have been streamed to the client, OpenRouter returns a standard JSON error response with the appropriate HTTP status code. This follows the standard error format:

```json
{
  "error": {
    "code": 400,
    "message": "Invalid model specified"
  }
}
```

Common HTTP status codes include:

* **400**: Bad Request (invalid parameters)
* **401**: Unauthorized (invalid API key)
* **402**: Payment Required (insufficient credits)
* **429**: Too Many Requests (rate limited)
* **502**: Bad Gateway (provider error)
* **503**: Service Unavailable (no available providers)

#### Errors After Tokens Have Been Sent (Mid-Stream)

If an error occurs after some tokens have already been streamed to the client, OpenRouter cannot change the HTTP status code (which is already 200 OK). Instead, the error is sent as a Server-Sent Event (SSE) with a unified structure:

```text
data: {"id":"cmpl-abc123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-3.5-turbo","provider":"openai","error":{"code":"server_error","message":"Provider disconnected unexpectedly"},"choices":[{"index":0,"delta":{"content":""},"finish_reason":"error"}]}
```

Key characteristics of mid-stream errors:

* The error appears at the **top level** alongside standard response fields (id, object, created, etc.)
* A `choices` array is included with `finish_reason: "error"` to properly terminate the stream
* The HTTP status remains 200 OK since headers were already sent
* The stream is terminated after this unified error event

#### Code Examples

Here's how to properly handle both types of errors in your streaming implementation:

<Template
  data={{
  API_KEY_REF,
  MODEL: Model.GPT_4_Omni
}}
>
  <CodeGroup>
    ```typescript title="TypeScript SDK"
    import { OpenRouter } from '@openrouter/sdk';

    const openRouter = new OpenRouter({
      apiKey: '{{API_KEY_REF}}',
    });

    async function streamWithErrorHandling(prompt: string) {
      try {
        const stream = await openRouter.chat.send({
          model: '{{MODEL}}',
          messages: [{ role: 'user', content: prompt }],
          stream: true,
        });

        for await (const chunk of stream) {
          // Check for errors in chunk
          if ('error' in chunk) {
            console.error(`Stream error: ${chunk.error.message}`);
            if (chunk.choices?.[0]?.finish_reason === 'error') {
              console.log('Stream terminated due to error');
            }
            return;
          }

          // Process normal content
          const content = chunk.choices?.[0]?.delta?.content;
          if (content) {
            console.log(content);
          }
        }
      } catch (error) {
        // Handle pre-stream errors
        console.error(`Error: ${error.message}`);
      }
    }
    ```

    ```python Python
    import requests
    import json

    def stream_with_error_handling(prompt):
        response = requests.post(
            'https://openrouter.ai/api/v1/chat/completions',
            headers={'Authorization': f'Bearer {{API_KEY_REF}}'},
            json={
                'model': '{{MODEL}}',
                'messages': [{'role': 'user', 'content': prompt}],
                'stream': True
            },
            stream=True
        )

        # Check initial HTTP status for pre-stream errors
        if response.status_code != 200:
            error_data = response.json()
            print(f"Error: {error_data['error']['message']}")
            return

        # Process stream and handle mid-stream errors
        for line in response.iter_lines():
            if line:
                line_text = line.decode('utf-8')
                if line_text.startswith('data: '):
                    data = line_text[6:]
                    if data == '[DONE]':
                        break

                    try:
                        parsed = json.loads(data)

                        # Check for mid-stream error
                        if 'error' in parsed:
                            print(f"Stream error: {parsed['error']['message']}")
                            # Check finish_reason if needed
                            if parsed.get('choices', [{}])[0].get('finish_reason') == 'error':
                                print("Stream terminated due to error")
                            break

                        # Process normal content
                        content = parsed['choices'][0]['delta'].get('content')
                        if content:
                            print(content, end='', flush=True)

                    except json.JSONDecodeError:
                        pass
    ```

    ```typescript title="TypeScript (fetch)"
    async function streamWithErrorHandling(prompt: string) {
      const response = await fetch(
        'https://openrouter.ai/api/v1/chat/completions',
        {
          method: 'POST',
          headers: {
            'Authorization': `Bearer ${{{API_KEY_REF}}}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            model: '{{MODEL}}',
            messages: [{ role: 'user', content: prompt }],
            stream: true,
          }),
        }
      );

      // Check initial HTTP status for pre-stream errors
      if (!response.ok) {
        const error = await response.json();
        console.error(`Error: ${error.error.message}`);
        return;
      }

      const reader = response.body?.getReader();
      if (!reader) throw new Error('No response body');

      const decoder = new TextDecoder();
      let buffer = '';

      try {
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          buffer += decoder.decode(value, { stream: true });

          while (true) {
            const lineEnd = buffer.indexOf('\n');
            if (lineEnd === -1) break;

            const line = buffer.slice(0, lineEnd).trim();
            buffer = buffer.slice(lineEnd + 1);

            if (line.startsWith('data: ')) {
              const data = line.slice(6);
              if (data === '[DONE]') return;

              try {
                const parsed = JSON.parse(data);

                // Check for mid-stream error
                if (parsed.error) {
                  console.error(`Stream error: ${parsed.error.message}`);
                  // Check finish_reason if needed
                  if (parsed.choices?.[0]?.finish_reason === 'error') {
                    console.log('Stream terminated due to error');
                  }
                  return;
                }

                // Process normal content
                const content = parsed.choices[0].delta.content;
                if (content) {
                  console.log(content);
                }
              } catch (e) {
                // Ignore parsing errors
              }
            }
          }
        }
      } finally {
        reader.cancel();
      }
    }
    ```
  </CodeGroup>
</Template>

#### API-Specific Behavior

Different API endpoints may handle streaming errors slightly differently:

* **OpenAI Chat Completions API**: Returns `ErrorResponse` directly if no chunks were processed, or includes error information in the response if some chunks were processed
* **OpenAI Responses API**: May transform certain error codes (like `context_length_exceeded`) into a successful response with `finish_reason: "length"` instead of treating them as errors
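
In practice this means a streaming client should treat the top-level `error` field and `finish_reason` as separate signals. A minimal sketch for chat-completion-style chunks (the parsed SSE payloads from the examples above):

```typescript
// Classify a parsed chat-completion chunk from the streaming examples above.
type ChunkStatus = 'error' | 'truncated' | 'ok';

function classifyChunk(parsed: any): ChunkStatus {
  // Unified mid-stream error: top-level `error` plus finish_reason "error".
  if (parsed.error) return 'error';

  // Some upstream errors (e.g. exceeding the context window) may instead
  // surface as a normal completion that stops with finish_reason "length".
  if (parsed.choices?.[0]?.finish_reason === 'length') return 'truncated';

  return 'ok';
}
```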
