OpenRouter
© 2026 OpenRouter, Inc


Best Speech-to-Text and Transcription Models

Model rankings updated April 2026 based on real usage data.

Speech-to-text models convert spoken audio into text for transcription, captions, voice notes, meeting summaries, call analysis, and speech-driven applications. This collection helps you compare the best transcription models on OpenRouter by accuracy, speed, language support, and cost for your audio workflows.

Top Speech-to-Text Models on OpenRouter


OpenAI: GPT-4o Transcribe

237K tokens

GPT-4o Transcribe is OpenAI's high-quality speech-to-text model built on GPT-4o audio capabilities. It's priced per token (input and output), making it suitable for workflows that benefit from token-level billing transparency.

by openai | 128K context | $2.50/M input tokens | $10/M output tokens
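To make the per-token billing concrete, here is a minimal sketch of the cost arithmetic using the two rates listed above. The function name `estimate_cost_usd` and the token counts in the usage note are hypothetical; the listing does not say how audio duration maps to input tokens, so real counts would have to come from the usage figures reported for a request.

```python
from decimal import Decimal

# Rates copied from the listing above, in USD per one million tokens.
INPUT_USD_PER_M = Decimal("2.50")
OUTPUT_USD_PER_M = Decimal("10")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    million = Decimal(1_000_000)
    # Each side is billed at its own rate, then the two charges are summed.
    total = INPUT_USD_PER_M * input_tokens + OUTPUT_USD_PER_M * output_tokens
    return total / million
```

For an illustrative request with 12,000 input tokens and 1,500 output tokens, `estimate_cost_usd(12_000, 1_500)` works out to $0.045: $0.030 for input plus $0.015 for output.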

OpenAI: Whisper 1

Whisper is OpenAI's open-source automatic speech recognition model, available via API as whisper-1. It supports transcription and translation across 50+ languages and accepts audio files up to 25 MB in formats including mp3, mp4, wav, and webm. Usage is priced per minute of audio duration, billed to the nearest second.

by openai | $6,000/M input tokens | $0/M output tokens
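The file constraints in the Whisper 1 description lend themselves to a quick check before uploading. The sketch below is one way to do that, under two assumptions: "25 MB" is read as 25 × 1024 × 1024 bytes, and only the four formats named in the description are accepted (the description says "including", so the real list may be longer). The helper name `preflight_problems` is hypothetical.

```python
from pathlib import PurePath

# Assumption: "25 MB" is interpreted as 25 MiB; the listing does not specify.
MAX_BYTES = 25 * 1024 * 1024
# Only the formats named in the description; the full list may be longer.
LISTED_FORMATS = {".mp3", ".mp4", ".wav", ".webm"}


def preflight_problems(filename: str, size_bytes: int) -> list[str]:
    """Return reasons a file may be rejected; an empty list means none found."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in LISTED_FORMATS:
        problems.append(f"format {suffix or '(none)'} is not among the listed formats")
    if size_bytes > MAX_BYTES:
        problems.append(f"size {size_bytes} bytes exceeds the {MAX_BYTES}-byte limit")
    return problems
```

In practice the size would come from something like `os.path.getsize(path)`. A 3 MB `meeting.mp3` passes with an empty list, while a 30 MiB `call.flac` is flagged on both format and size.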