| Language | Code | F5-TTS | ChatterBox | ChatterBox 23L | VibeVoice | Higgs Audio 2 | IndexTTS-2 | CosyVoice3 | Qwen3-TTS | Granite ASR | Step Audio EditX | Echo-TTS | RVC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🇺🇸 English | EN | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| 🇨🇳 Chinese | ZH | ❌ | ❌ | ✅ | ✅ | ✅ (Mandarin) | ✅ | ✅ + 18 dialects | ✅ | ❌ | ✅ (Mandarin + Sichuanese, Cantonese) | ❌ | ✅ |
| 🇩🇪 German | DE | ✅ | ✅ (×3) | ✅ | ✅ (Kugel) | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
| 🇪🇸 Spanish | ES | ✅ | ❌ | ✅ | ✅ (Kugel) | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
| 🇫🇷 French | FR | ✅ | ✅ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ✅ |
| 🇮🇹 Italian | IT | ✅ | ✅ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ |
| 🇯🇵 Japanese | JA | ✅ | ✅ | ✅ | ✅ (Kugel) | ❌ | ✅ ? | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| 🇰🇷 Korean | KO | ❌ | ✅ | ✅ | ✅ (Kugel) | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
| 🇷🇺 Russian | RU | ❌ | ✅ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ |
| 🇧🇷 Portuguese | PT | ✅ (BR) | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ✅ (EU/BR*) | ✅ | ❌ | ❌ | ✅ |
| 🇵🇱 Polish | PL | ✅ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇮🇳 Hindi | HI | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇪🇬 Arabic | AR | ❌ | ❌ | ✅ (Egyptian) | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇹🇷 Turkish | TR | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇹🇭 Thai | TH | ✅ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇳🇴 Norwegian | NO | ❌ | ✅ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇻🇳 Vietnamese | VI | ❌ | ❌ | ✅ (Viterbox) | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇦🇲 Armenian | HY | ❌ | ✅ | ❌ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇬🇪 Georgian | KA | ❌ | ✅ | ❌ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇩🇰 Danish | DA | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇫🇮 Finnish | FI | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇬🇷 Greek | EL | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇮🇱 Hebrew | HE | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇲🇾 Malay | MS | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇳🇱 Dutch | NL | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇸🇪 Swedish | SV | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| 🇰🇪 Swahili | SW | ❌ | ❌ | ✅ | ✅ (Kugel) | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |

Notes:
- CosyVoice3 Chinese: Includes 18+ dialects (Cantonese, Sichuan, Dongbei, Shanghai, etc.)
- Higgs Audio 2: Trained on the 10M-hour AudioVerse dataset, covering EN, ZH (Mandarin), KO, DE, ES (English majority)
- IndexTTS-2: Trained on 55K+ hours - ZH, EN, JA primary.
- RVC: Language-independent (works as a voice-conversion post-processor on any TTS output)
- Qwen3-TTS: Supports pt-BR only through instructions (voice design or hardcoded voices). The Base model can't follow instructions, so it always outputs pt-PT.
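
If you want to pick an engine programmatically, the matrix can be mirrored as a small lookup. The snippet below is only a sketch: `SUPPORT`, `LANGUAGE_INDEPENDENT`, `engines_for`, and `supports` are hypothetical names that are not part of any engine's API, and the data covers just three rows (PL, HY, KO) copied from the table above.

```python
# Hypothetical subset of the support matrix (rows PL, HY, KO only).
# For these rows, VibeVoice support comes via the "Kugel" variant.
SUPPORT = {
    "PL": {"F5-TTS", "ChatterBox 23L", "VibeVoice"},
    "HY": {"ChatterBox", "VibeVoice"},
    "KO": {
        "ChatterBox", "ChatterBox 23L", "VibeVoice", "Higgs Audio 2",
        "CosyVoice3", "Qwen3-TTS", "Step Audio EditX",
    },
}

# RVC is a voice-conversion post-processor, so it is treated as
# available for every language instead of being listed per row.
LANGUAGE_INDEPENDENT = {"RVC"}


def engines_for(code: str) -> list[str]:
    """Return the TTS engines listed for a language code, sorted by name."""
    return sorted(SUPPORT.get(code.upper(), set()))


def supports(engine: str, code: str) -> bool:
    """Check whether an engine is listed for a language code."""
    if engine in LANGUAGE_INDEPENDENT:
        return True
    return engine in SUPPORT.get(code.upper(), set())


if __name__ == "__main__":
    print(engines_for("hy"))
    print(supports("F5-TTS", "KO"))
```

Keeping language-independent tools in a separate set avoids repeating them in every row and makes it explicit that they operate on audio rather than text.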