Best Audio Generation Models

Model rankings updated April 2026 based on real usage data.

Audio generation models create audio output from text or other prompts, powering use cases like music generation, sound design, voice-enabled assistants, and multimodal applications that respond with audio. This collection highlights some of the best audio generation models available on OpenRouter, making it easier to compare quality, pricing, and latency across providers through a single API.

Top Audio Generation Models on OpenRouter

OpenAI: GPT Audio Mini

125M tokens

A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural-sounding voices and better voice consistency. Input is priced at $0.60 per million tokens and output at $2.40 per million tokens.

by openai | 128K context | $0.60/M input tokens | $2.40/M output tokens | $0.60/M audio tokens

OpenAI: GPT-4o Audio

51.3M tokens

The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs are currently not supported. Audio tokens are priced at $40 per million for input and $80 per million for output.

by openai | 128K context | $2.50/M input tokens | $10/M output tokens | $40/M audio tokens

Google: Lyria 3 Pro Preview

17.2M tokens

Full-length songs are priced at $0.08 per song. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images. These models deliver structural coherence, including vocals, timed lyrics, and full instrumental arrangements. Lyria 3 Pro can generate full-length songs with verses, choruses, and bridges.

by google | 1.05M context | $0/M input tokens | $0/M output tokens

OpenAI: GPT Audio

10.5M tokens

The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural-sounding voices and better voice consistency. Audio is priced at $32 per million input tokens and $64 per million output tokens.

by openai | 128K context | $2.50/M input tokens | $10/M output tokens | $32/M audio tokens

Google: Lyria 3 Clip Preview

3.54M tokens

30-second clips are priced at $0.04 per clip. Lyria 3 is Google's family of music generation models, available through the Gemini API. With Lyria 3, you can generate high-quality, 48kHz stereo audio from text prompts or from images. These models deliver structural coherence, including vocals, timed lyrics, and full instrumental arrangements. Lyria 3 Clip can generate short clips, loops, and previews.

by google | 1.05M context | $0/M input tokens | $0/M output tokens
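
Estimating costs from the listed prices

The listings mix two pricing schemes: per-million-token rates for the OpenAI models and flat per-item rates for the Lyria models. The Python sketch below shows how the two could be compared for a planned workload. The rates are copied from the listings above; the dictionary keys are display names rather than confirmed API model identifiers, and the helper names (`TokenPricing`, `token_cost`, `item_cost`) are hypothetical, introduced only for illustration. GPT Audio Mini is left out of the audio table because its listing gives a single audio rate without an input/output split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """USD per one million tokens, as quoted in the listings."""
    input_per_m: float
    output_per_m: float


# Text-token rates from the listings (keys are display names, not API slugs).
TEXT_RATES = {
    "GPT Audio Mini": TokenPricing(0.60, 2.40),
    "GPT-4o Audio": TokenPricing(2.50, 10.00),
    "GPT Audio": TokenPricing(2.50, 10.00),
}

# Audio-token rates, taken from the model descriptions.
AUDIO_RATES = {
    "GPT-4o Audio": TokenPricing(40.00, 80.00),
    "GPT Audio": TokenPricing(32.00, 64.00),
}

# Flat per-item prices for the Lyria models (USD per song or per clip).
PER_ITEM_USD = {
    "Lyria 3 Pro Preview": 0.08,
    "Lyria 3 Clip Preview": 0.04,
}


def token_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a request billed per million tokens."""
    total = input_tokens * pricing.input_per_m + output_tokens * pricing.output_per_m
    return total / 1_000_000


def item_cost(model: str, count: int) -> float:
    """Return the USD cost of generating `count` songs or clips."""
    return PER_ITEM_USD[model] * count


if __name__ == "__main__":
    # Hypothetical workload: 10,000 audio tokens in, 50,000 audio tokens out.
    print(round(token_cost(AUDIO_RATES["GPT Audio"], 10_000, 50_000), 2))
    # Hypothetical batch: 25 full-length songs.
    print(round(item_cost("Lyria 3 Pro Preview", 25), 2))
```

Actual billing may differ from these estimates, since a single request can consume both text and audio tokens and the listed prices may change.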