MAI-Voice-2 is a high-fidelity, expressive text-to-speech model from Microsoft, powered by Azure AI Speech. It synthesizes natural-sounding speech in 10+ languages, supports expressive SSML styles such as cheerful, sad, and excited, and offers speed control from 0.5× to 2×. Voice names follow the Azure locale format (e.g., en-US-Harper:MAI-Voice-2). Output is available as MP3 or PCM at 24 kHz.
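As a rough illustration of how the voice name, an expressive style, and the speed control might fit together in one request, here is a minimal sketch that assembles an SSML document. It assumes the model accepts the standard Azure AI Speech SSML elements (`mstts:express-as` for style, `prosody rate` for speed), which this listing does not confirm. The helper name `build_ssml` and its parameters are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Speed range taken from the description above (0.5x to 2x).
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def build_ssml(text, voice="en-US-Harper:MAI-Voice-2", style=None, speed=1.0):
    """Assemble an SSML string (hypothetical helper, not an official API).

    Assumes Azure-style SSML: <mstts:express-as> for the style and
    <prosody rate="..."> with a relative multiplier for the speed.
    """
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    # The locale is the first two dash-separated parts of the voice name,
    # e.g. "en-US" from "en-US-Harper:MAI-Voice-2".
    locale = "-".join(voice.split("-")[:2])

    # Escape &, <, > so arbitrary input text stays valid XML.
    body = escape(text)
    if speed != 1.0:
        body = f'<prosody rate="{speed:g}">{body}</prosody>'
    if style:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"

    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Thanks for calling & have a great day!", style="cheerful", speed=1.25))
```

The sketch only builds the request body. Sending it, and choosing MP3 or PCM output, depends on the provider's endpoint, which is not described here.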
Modalities: text input, audio output
Price: $22/M characters
Weekly Tokens: 3K
Released: Jun 2, 2026
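To make the listed price concrete, the following minimal sketch converts a piece of text into an estimated cost at $22 per million characters. It assumes that every character of the plain input text is billed and ignores any SSML markup, since the listing does not say how markup is counted. The function name `estimate_cost_usd` is hypothetical.

```python
# Listed price: $22 per 1,000,000 characters.
PRICE_PER_MILLION_CHARS_USD = 22.0


def estimate_cost_usd(text):
    """Estimate synthesis cost for `text` (hypothetical helper).

    Assumption: billing is by the character count of the plain text;
    how SSML tags or whitespace are counted is not stated in the listing.
    """
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


if __name__ == "__main__":
    sample = "a" * 50_000  # roughly a long article's worth of characters
    print(f"${estimate_cost_usd(sample):.2f}")
```

Under that assumption, 50,000 characters come to about $1.10.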