Text-to-Speech

How to generate speech audio from text with OpenRouter models

OpenRouter supports text-to-speech (TTS) via a dedicated /api/v1/audio/speech endpoint that is compatible with the OpenAI Audio Speech API. Send text and receive a raw audio byte stream in your chosen format.

Model Discovery

You can find TTS models in several ways:

Via the API

Use the output_modalities query parameter on the Models API to discover TTS models:

```bash
# List only TTS models
curl "https://openrouter.ai/api/v1/models?output_modalities=speech"
```

On the Models Page

Visit the Models page and filter by output modalities to find models capable of speech synthesis. Look for models that list "speech" in their output modalities.

API Usage

Send a POST request to /api/v1/audio/speech with the text you want to synthesize. The response is a raw audio byte stream — not JSON — so you can pipe it directly to a file or audio player.

Basic Example

```typescript
import { OpenRouter } from '@openrouter/sdk';
import fs from 'fs';

const openRouter = new OpenRouter({
  apiKey: '{{API_KEY_REF}}',
});

const stream = await openRouter.tts.createSpeech({
  speechRequest: {
    model: '{{MODEL}}',
    input: 'Hello! This is a text-to-speech test.',
    voice: 'alloy',
    responseFormat: 'mp3',
  },
});

// Collect the audio stream and save to a file
const reader = stream.getReader();
const chunks: Uint8Array[] = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}
const totalLength = chunks.reduce((sum, c) => sum + c.length, 0);
const buffer = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
  buffer.set(chunk, offset);
  offset += chunk.length;
}
await fs.promises.writeFile('output.mp3', buffer);
console.log('Audio saved to output.mp3');
```

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The TTS model to use (e.g., `openai/gpt-4o-mini-tts-2025-12-15`, `mistralai/voxtral-mini-tts-2603`) |
| `input` | string | Yes | The text to synthesize into speech |
| `voice` | string | Yes | Voice identifier. Available voices vary by model; check each model's page for supported voices |
| `response_format` | string | No | Audio output format: `mp3` or `pcm`. Defaults to `pcm` |
| `speed` | number | No | Playback speed multiplier. Only used by models that support it (e.g., OpenAI TTS); ignored by other providers. Defaults to `1.0` |
| `provider` | object | No | Provider-specific passthrough configuration |
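As a sketch of the wire format, the same parameters can be sent with a raw `fetch` call; note that the HTTP API uses snake_case keys (`response_format`), while the SDK example above uses camelCase. Reading the API key from an `OPENROUTER_API_KEY` environment variable is an assumption for this example.

```typescript
// Sketch: calling /api/v1/audio/speech directly with fetch (Node 18+).
// Assumes an API key in the OPENROUTER_API_KEY environment variable.
import fs from 'fs';

// The wire format uses snake_case keys, matching the table above.
const payload = {
  model: 'openai/gpt-4o-mini-tts-2025-12-15',
  input: 'Hello from a raw HTTP request.',
  voice: 'alloy',
  response_format: 'mp3',
  speed: 1.0,
};

async function synthesize(): Promise<void> {
  const res = await fetch('https://openrouter.ai/api/v1/audio/speech', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  // The body is raw audio bytes, not JSON.
  const bytes = new Uint8Array(await res.arrayBuffer());
  await fs.promises.writeFile('output.mp3', bytes);
}

// Only fire the request when a key is actually configured.
if (process.env.OPENROUTER_API_KEY) {
  await synthesize();
}
```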

Provider-Specific Options

You can pass provider-specific options using the provider parameter. Options are keyed by provider slug, and only the options for the matched provider are forwarded:

```json
{
  "model": "openai/gpt-4o-mini-tts-2025-12-15",
  "input": "Hello world",
  "voice": "alloy",
  "provider": {
    "options": {
      "openai": {
        "instructions": "Speak in a warm, friendly tone."
      }
    }
  }
}
```

Response Format

The TTS endpoint returns a raw audio byte stream, not JSON. The response includes the following headers:

| Header | Description |
|---|---|
| `Content-Type` | The MIME type of the audio: `audio/mpeg` for `mp3` format, `audio/L16` for `pcm` format |
| `X-Generation-Id` | The unique generation ID for the request, useful for tracking and debugging |
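One way to put these headers to use, sketched below: derive a file extension from `Content-Type` and log the generation ID for later debugging. The `extensionFor` helper is our own illustrative name, not part of any SDK.

```typescript
// Sketch: deriving a file extension from the Content-Type header.
// extensionFor is a hypothetical helper, not part of the OpenRouter SDK.
function extensionFor(contentType: string | null): string {
  if (contentType?.startsWith('audio/mpeg')) return '.mp3';
  if (contentType?.startsWith('audio/L16')) return '.pcm';
  return '.bin'; // unknown format: keep the bytes, rename later
}

// Exercising it with a Headers object shaped like a TTS response (Node 18+):
const ttsHeaders = new Headers({
  'Content-Type': 'audio/mpeg',
  'X-Generation-Id': 'gen-123',
});
console.log(extensionFor(ttsHeaders.get('Content-Type'))); // '.mp3'
console.log(ttsHeaders.get('X-Generation-Id')); // 'gen-123'
```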

Output Formats

| Format | Content-Type | Description |
|---|---|---|
| `mp3` | `audio/mpeg` | Compressed audio, smaller file size. Good for storage and playback |
| `pcm` | `audio/L16` | Uncompressed raw audio. Lower latency; suitable for real-time streaming pipelines |
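Raw `pcm` output has no container, so most players cannot open it directly. A common workaround, sketched below, is to prepend a 44-byte WAV header. The sample rate, channel count, and bit depth here are assumptions; confirm the actual values for the model you use.

```typescript
// Sketch: wrapping raw PCM bytes in a WAV header so ordinary players can
// open the file. The 24 kHz / mono / 16-bit defaults are assumptions,
// not documented constants; check your model's actual PCM parameters.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const byteRate = (sampleRate * channels * bitsPerSample) / 8;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeTag = (offset: number, tag: string) => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };
  writeTag(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // total size minus 8 bytes
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, 'data');
  view.setUint32(40, pcm.length, true);
  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header), 0);
  wav.set(pcm, 44);
  return wav;
}
```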

Pricing

TTS models are priced per character of input text. Pricing varies by model and provider. You can check the per-character cost for each model on the Models page or via the Models API.

OpenAI SDK Compatibility

The TTS endpoint is fully compatible with the OpenAI SDK. You can use the OpenAI client libraries by pointing them at OpenRouter’s base URL:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="{{API_KEY_REF}}",
)

# Non-streaming: get the full audio response
response = client.audio.speech.create(
    model="openai/gpt-4o-mini-tts-2025-12-15",
    input="The quick brown fox jumps over the lazy dog.",
    voice="nova",
    response_format="mp3",
)
response.write_to_file("output.mp3")

# Streaming: process audio chunks as they arrive
with client.audio.speech.with_streaming_response.create(
    model="openai/gpt-4o-mini-tts-2025-12-15",
    input="The quick brown fox jumps over the lazy dog.",
    voice="nova",
    response_format="mp3",
) as response:
    response.stream_to_file("output.mp3")
```

Best Practices

  • Choose the right format: Use mp3 for storage and general playback. Use pcm for real-time streaming pipelines where latency matters
  • Voice selection: Different providers offer different voices. Check the model’s documentation or experiment with available voices to find the best fit for your use case
  • Input length: For very long texts, consider splitting the input into smaller segments and concatenating the audio output. This can improve reliability and reduce latency for the first audio chunk
  • Speed parameter: The speed parameter is only supported by certain providers (e.g., OpenAI). It is silently ignored by providers that don’t support it
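The input-length advice above can be sketched as a simple sentence-based splitter. The splitting heuristic and the 500-character cap are arbitrary illustrative choices, not API constraints.

```typescript
// Sketch: splitting long text into segments under a maximum length,
// breaking at sentence boundaries where possible. The 500-character cap
// is an arbitrary illustrative limit, not an API constraint. A single
// sentence longer than maxChars is emitted as-is in this sketch.
function splitForTts(text: string, maxChars = 500): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const segments: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxChars) {
      segments.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) segments.push(current.trim());
  return segments;
}
```

Each segment then goes through its own TTS request, and the resulting buffers are concatenated in order. This is straightforward for `pcm`; for `mp3`, naive concatenation usually plays but may glitch at segment boundaries.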

Troubleshooting

Empty or corrupted audio file?

  • Verify the response_format matches how you’re saving the file (e.g., don’t save pcm output with a .mp3 extension)
  • Check the response status code — non-200 responses return JSON error bodies, not audio
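A defensive check along these lines can be sketched as follows; the exact shape of the JSON error body is not guaranteed, so this only demonstrates the status-code branch.

```typescript
// Sketch: check the status code before treating the response body as
// audio. Non-200 responses carry a JSON error body, not audio bytes.
async function readAudioOrThrow(res: Response): Promise<Uint8Array> {
  if (!res.ok) {
    const err = await res.text(); // JSON error body, surfaced as text
    throw new Error(`TTS error ${res.status}: ${err}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}

// Exercising it with a locally built Response object (Node 18+):
const okRes = new Response(new Uint8Array([1, 2, 3]), {
  status: 200,
  headers: { 'Content-Type': 'audio/mpeg' },
});
const bytes = await readAudioOrThrow(okRes);
console.log(bytes.length); // 3
```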

Model not found?

  • Use the Models page to find available TTS models
  • Verify the model slug is correct (e.g., openai/gpt-4o-mini-tts-2025-12-15, not gpt-4o-mini-tts)

Voice not available?

  • Available voices vary by provider. Check the provider’s documentation for supported voice identifiers
  • Each model has its own set of voices — check the model’s page on the Models page for the full list