Text-to-Speech

How to generate speech audio from text with OpenRouter models

OpenRouter supports text-to-speech (TTS) via a dedicated /api/v1/audio/speech endpoint that is compatible with the OpenAI Audio Speech API. Send text and receive a raw audio byte stream in your chosen format.

Model Discovery

You can find TTS models in several ways:

Via the API

Use the output_modalities query parameter on the Models API to discover TTS models:

```bash
# List only TTS models
curl "https://openrouter.ai/api/v1/models?output_modalities=speech"
```

On the Models Page

Visit the Models page and filter by output modalities to find models capable of speech synthesis. Look for models that list "speech" in their output modalities.

API Usage

Send a POST request to /api/v1/audio/speech with the text you want to synthesize. The response is a raw audio byte stream — not JSON — so you can pipe it directly to a file or audio player.

Basic Example

```typescript
import { OpenRouter } from '@openrouter/sdk';
import fs from 'fs';

const openRouter = new OpenRouter({
  apiKey: '{{API_KEY_REF}}',
});

const stream = await openRouter.tts.createSpeech({
  speechRequest: {
    model: '{{MODEL}}',
    input: 'Hello! This is a text-to-speech test.',
    voice: 'alloy',
    responseFormat: 'mp3',
  },
});

// Collect the audio stream and save to a file
const reader = stream.getReader();
const chunks: Uint8Array[] = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}
const totalLength = chunks.reduce((sum, c) => sum + c.length, 0);
const buffer = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
  buffer.set(chunk, offset);
  offset += chunk.length;
}
await fs.promises.writeFile('output.mp3', buffer);
console.log('Audio saved to output.mp3');
```

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The TTS model to use (e.g., `openai/gpt-4o-mini-tts-2025-12-15`, `mistralai/voxtral-mini-tts-2603`) |
| `input` | string | Yes | The text to synthesize into speech |
| `voice` | string | Yes | Voice identifier. Available voices vary by model; check each model's page for supported voices |
| `response_format` | string | No | Audio output format: `mp3` or `pcm`. Defaults to `pcm` |
| `speed` | number | No | Playback speed multiplier. Only used by models that support it (e.g., OpenAI TTS); ignored by other providers. Defaults to `1.0` |
| `provider` | object | No | Provider-specific passthrough configuration |
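As a sketch of the wire format, the same parameters can be sent with a raw `fetch` call; note that the HTTP API uses snake_case keys (`response_format`), while the SDK example above uses camelCase. Reading the API key from an `OPENROUTER_API_KEY` environment variable is an assumption for this example.

```typescript
// Sketch: calling /api/v1/audio/speech directly with fetch (Node 18+).
// Assumes an API key in the OPENROUTER_API_KEY environment variable.
import fs from 'fs';

// The wire format uses snake_case keys, matching the table above.
const payload = {
  model: 'openai/gpt-4o-mini-tts-2025-12-15',
  input: 'Hello from a raw HTTP request.',
  voice: 'alloy',
  response_format: 'mp3',
  speed: 1.0,
};

async function synthesize(): Promise<void> {
  const res = await fetch('https://openrouter.ai/api/v1/audio/speech', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  // The body is raw audio bytes, not JSON.
  const bytes = new Uint8Array(await res.arrayBuffer());
  await fs.promises.writeFile('output.mp3', bytes);
}

// Only fire the request when a key is actually configured.
if (process.env.OPENROUTER_API_KEY) {
  await synthesize();
}
```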

Provider-Specific Options

You can pass provider-specific options using the provider parameter. Options are keyed by provider slug, and only the options for the matched provider are forwarded:

```json
{
  "model": "openai/gpt-4o-mini-tts-2025-12-15",
  "input": "Hello world",
  "voice": "alloy",
  "provider": {
    "options": {
      "openai": {
        "instructions": "Speak in a warm, friendly tone."
      }
    }
  }
}
```

Response Format

The TTS endpoint returns a raw audio byte stream, not JSON. The response includes the following headers:

| Header | Description |
|---|---|
| `Content-Type` | The MIME type of the audio: `audio/mpeg` for `mp3` format, `audio/L16` for `pcm` format |
| `X-Generation-Id` | The unique generation ID for the request, useful for tracking and debugging |
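One way to put these headers to use, sketched below: derive a file extension from `Content-Type` and log the generation ID for later debugging. The `extensionFor` helper is our own illustrative name, not part of any SDK.

```typescript
// Sketch: deriving a file extension from the Content-Type header.
// extensionFor is a hypothetical helper, not part of the OpenRouter SDK.
function extensionFor(contentType: string | null): string {
  if (contentType?.startsWith('audio/mpeg')) return '.mp3';
  if (contentType?.startsWith('audio/L16')) return '.pcm';
  return '.bin'; // unknown format: keep the bytes, rename later
}

// Exercising it with a Headers object shaped like a TTS response (Node 18+):
const ttsHeaders = new Headers({
  'Content-Type': 'audio/mpeg',
  'X-Generation-Id': 'gen-123',
});
console.log(extensionFor(ttsHeaders.get('Content-Type'))); // '.mp3'
console.log(ttsHeaders.get('X-Generation-Id')); // 'gen-123'
```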

Output Formats

| Format | Content-Type | Description |
|---|---|---|
| `mp3` | `audio/mpeg` | Compressed audio, smaller file size. Good for storage and playback |
| `pcm` | `audio/L16` | Uncompressed raw audio. Lower latency; suitable for real-time streaming pipelines |
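Raw `pcm` output has no container, so most players cannot open it directly. A common workaround, sketched below, is to prepend a 44-byte WAV header. The sample rate, channel count, and bit depth here are assumptions; confirm the actual values for the model you use.

```typescript
// Sketch: wrapping raw PCM bytes in a WAV header so ordinary players can
// open the file. The 24 kHz / mono / 16-bit defaults are assumptions,
// not documented constants; check your model's actual PCM parameters.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const byteRate = (sampleRate * channels * bitsPerSample) / 8;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeTag = (offset: number, tag: string) => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };
  writeTag(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // total size minus 8 bytes
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, 'data');
  view.setUint32(40, pcm.length, true);
  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcm, 44);
  return wav;
}
```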

Pricing

TTS models are priced per character of input text. Pricing varies by model and provider. You can check the per-character cost for each model on the Models page or via the Models API.

OpenAI SDK Compatibility

The TTS endpoint is fully compatible with the OpenAI SDK. You can use the OpenAI client libraries by pointing them at OpenRouter’s base URL:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="{{API_KEY_REF}}",
)

# Non-streaming: get the full audio response
response = client.audio.speech.create(
    model="openai/gpt-4o-mini-tts-2025-12-15",
    input="The quick brown fox jumps over the lazy dog.",
    voice="nova",
    response_format="mp3",
)
response.write_to_file("output.mp3")

# Streaming: process audio chunks as they arrive
with client.audio.speech.with_streaming_response.create(
    model="openai/gpt-4o-mini-tts-2025-12-15",
    input="The quick brown fox jumps over the lazy dog.",
    voice="nova",
    response_format="mp3",
) as response:
    response.stream_to_file("output.mp3")
```

Best Practices

  • Choose the right format: Use mp3 for storage and general playback. Use pcm for real-time streaming pipelines where latency matters
  • Voice selection: Different providers offer different voices. Check the model’s documentation or experiment with available voices to find the best fit for your use case
  • Input length: For very long texts, consider splitting the input into smaller segments and concatenating the audio output. This can improve reliability and reduce latency for the first audio chunk
  • Speed parameter: The speed parameter is only supported by certain providers (e.g., OpenAI). It is silently ignored by providers that don’t support it
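The input-length advice above can be sketched as a simple sentence-based splitter. The splitting heuristic and the 500-character cap are arbitrary illustrative choices, not API constraints.

```typescript
// Sketch: splitting long text into segments under a maximum length,
// breaking at sentence boundaries where possible. The 500-character cap
// is an arbitrary illustrative limit, not an API constraint. A single
// sentence longer than maxChars is emitted as-is in this sketch.
function splitForTts(text: string, maxChars = 500): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const segments: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxChars) {
      segments.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) segments.push(current.trim());
  return segments;
}
```

Each segment then goes through its own TTS request, and the resulting buffers are concatenated in order. This is straightforward for `pcm`; for `mp3`, naive concatenation usually plays but may glitch at segment boundaries.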

Troubleshooting

Empty or corrupted audio file?

  • Verify the response_format matches how you’re saving the file (e.g., don’t save pcm output with a .mp3 extension)
  • Check the response status code — non-200 responses return JSON error bodies, not audio
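A defensive check along these lines can be sketched as follows; the exact shape of the JSON error body is not guaranteed, so this only demonstrates the status-code branch.

```typescript
// Sketch: check the status code before treating the response body as
// audio. Non-200 responses carry a JSON error body, not audio bytes.
async function readAudioOrThrow(res: Response): Promise<Uint8Array> {
  if (!res.ok) {
    const err = await res.text(); // JSON error body, surfaced as text
    throw new Error(`TTS error ${res.status}: ${err}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}

// Exercising it with a locally built Response object (Node 18+):
const okRes = new Response(new Uint8Array([1, 2, 3]), {
  status: 200,
  headers: { 'Content-Type': 'audio/mpeg' },
});
const bytes = await readAudioOrThrow(okRes);
console.log(bytes.length); // 3
```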

Model not found?

  • Use the Models page to find available TTS models
  • Verify the model slug is correct (e.g., openai/gpt-4o-mini-tts-2025-12-15, not gpt-4o-mini-tts)

Voice not available?

  • Available voices vary by provider. Check the provider’s documentation for supported voice identifiers
  • Each model has its own set of voices — check the model’s page on the Models page for the full list