All entries below are generated from a single source-of-truth YAML file. Use this as the canonical list of model repositories and links for offline setup.

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| F5TTS_Base | SWivid/F5-TTS | ~1.2GB | ✅ | English base model |
| F5TTS_v1_Base | SWivid/F5-TTS | ~1.2GB | ✅ | English v1 model |
| E2TTS_Base | SWivid/E2-TTS | ~1.2GB | ✅ | English E2-TTS model |
| F5-DE | aihpi/F5-TTS-German | ~1.2GB | ✅ | German finetune |
| F5-ES | jpgallegoar/F5-Spanish | ~1.2GB | ✅ | Spanish finetune |
| F5-FR | RASPIAUDIO/F5-French-MixedSpeakers-reduced | ~1.2GB | ✅ | French finetune |
| F5-JP | Jmica/F5TTS | ~1.2GB | ✅ | Japanese finetune |
| F5-Hindi-Small | SPRINGLab/F5-Hindi-24KHz | ~632MB | ✅ | Hindi finetune |
| Vocos Mel-24kHz | charactr/vocos-mel-24khz | N/A | ✅ | Optional vocoder |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| English | ResembleAI/chatterbox | ~2GB | ✅ | .pt model set |
| German | stlohrey/chatterbox_de | ~4.3GB | ✅ | .safetensors model set |
| German (havok2) | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .safetensors model set |
| German (SebastianBodza) | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .safetensors model set |
| Italian | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .pt model set |
| French | Thomcles/ChatterBox-fr | ~4.3GB | ✅ | .safetensors model set |
| Russian | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .safetensors model set |
| Armenian | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .safetensors model set |
| Georgian | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .safetensors model set |
| Japanese | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .safetensors model set |
| Korean | niobures/Chatterbox-TTS | ~4.3GB | ✅ | .safetensors model set |
| Norwegian | akhbar/chatterbox-tts-norwegian | ~4.3GB | ✅ | .safetensors model set |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| Official 23-Lang (v1/v2) | ResembleAI/chatterbox | ~4.3GB | ✅ | v1 + v2 files and tokenizer |
| Russian stress dictionary (Russian only) | Vuizur/add-stress-to-epub release | ~1.5GB | ✅ | Auxiliary Official 23-Lang Russian stress-labeling data; downloads on demand only when Russian stress support is used |
| Vietnamese (Viterbox) | dolly-vn/viterbox | ~4.3GB | ✅ | Vietnamese community finetune used by downloader |
| Egyptian Arabic (oddadmix) | oddadmix/chatterbox-egyptian-v0 | ~4.3GB | ✅ | Egyptian Arabic community finetune (architecture v2) |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| vibevoice-1.5B | microsoft/VibeVoice-1.5B | ~5.4GB | ✅ | Microsoft official model |
| vibevoice-7B | aoi-ot/VibeVoice-Large | ~18GB | ✅ | Community mirror used by downloader |
| kugelaudio-0-open | kugelaudio/kugelaudio-0-open | ~18GB | ✅ | KugelAudio multilingual 7B variant |
| kugel-2 | kugelaudio/kugel-2 | ~18.7GB | ✅ | KugelAudio v2 merged 7B variant |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| higgs-audio-v2-3B | bosonai/higgs-audio-v2-generation-3B-base | ~9GB | ✅ | Generation model |
| Audio tokenizer | bosonai/higgs-audio-v2-tokenizer | ~200MB | ✅ | Tokenizer model |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| IndexTTS-2 | IndexTeam/IndexTTS-2 | Multiple files | ✅ | Main TTS engine |
| w2v-bert-2.0 | facebook/w2v-bert-2.0 | ~2GB | ✅ | Semantic feature extractor |
| qwen0.6bemo4-merge | Included with IndexTTS-2 | Included | ✅ | Text emotion model bundle |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| Fun-CosyVoice3-0.5B / 0.5B-RL | FunAudioLLM/Fun-CosyVoice3-0.5B-2512 | ~5.4GB first variant (+~2GB second) | ✅ | Both variants share common files |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| CustomVoice 0.6B | Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice | ~1.5GB | ✅ | Preset voices + instructions |
| CustomVoice 1.7B | Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice | ~4.2GB | ✅ | Preset voices + instructions |
| VoiceDesign 1.7B | Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign | ~4.2GB | ✅ | Text-to-voice design model |
| Base 0.6B | Qwen/Qwen3-TTS-12Hz-0.6B-Base | ~1.5GB | ✅ | Zero-shot voice cloning |
| Base 1.7B | Qwen/Qwen3-TTS-12Hz-1.7B-Base | ~4.2GB | ✅ | Zero-shot voice cloning |
| Qwen3-ASR-1.7B | Qwen/Qwen3-ASR-1.7B | N/A | ✅ | ASR transcribe model |
| Qwen3-ForcedAligner-0.6B | Qwen/Qwen3-ForcedAligner-0.6B | N/A | ✅ | Word-level timestamps |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| granite-4.0-1b-speech | ibm-granite/granite-4.0-1b-speech | ~4.6GB | ✅ | Main Granite ASR / AST model |
| Qwen3-ForcedAligner-0.6B | Qwen/Qwen3-ForcedAligner-0.6B | N/A | ✅ | Optional custom word-level timestamps/SRT path; reused from Qwen folder |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| Step-Audio-EditX | stepfun-ai/Step-Audio-EditX | ~7GB | ✅ | Main 3B audio editing model |
| Step-Audio-Tokenizer | stepfun-ai/Step-Audio-Tokenizer | Included | ✅ | Tokenizer bundle used by Step EditX |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| echo-tts-base (model + PCA state) | jordand/echo-tts-base | ~5.3GB | ✅ | pytorch_model.safetensors + pca_state.safetensors |
| fish-s1-dac-min (audio codec) | jordand/fish-s1-dac-min | ~1.8GB | ✅ | pytorch_model.safetensors (audio codec required by Echo-TTS) |

| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| RVC character pack | SayanoAI/RVC-Studio (RVC folder) | Varies | ✅ | Default auto-download characters: Claire, Sayano, Mae_v2, Fuji, Monika (extras also available) |
| RVC index pack (.index) | SayanoAI/RVC-Studio (.index folder) | Varies | ✅ | Optional FAISS indexes for improved voice similarity |
| content-vec-best.safetensors | lengyue233/content-vec-best | ~300MB | ✅ | Voice feature model |
| rmvpe.pt | lj1995/VoiceConversionWebUI | ~55MB | ✅ | Pitch extraction model |
| pretrained_v2 (f0 G/D pairs) | lj1995/VoiceConversionWebUI | ~300MB total | ✅ | Training init checkpoints for 32k/40k/48k RVC runs; downloaded on first training use |

Generated from tts_audio_suite_engines.yaml.
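When planning an offline setup, it can help to total the listed sizes and collect the repository identifiers before downloading anything. The following is a minimal sketch, not part of the suite itself: the function names (`parse_rows`, `size_in_mb`, `total_known_mb`, `plain_repo_ids`) are hypothetical, the sample rows are copied from the tables above, and it assumes 1 GB = 1000 MB. Rows whose size is not a single number (`N/A`, `Included`, `Varies`, `Multiple files`) are counted separately rather than guessed, and for a size such as `~5.4GB first variant (+~2GB second)` only the leading figure is counted.

```python
import re

# Sample rows copied from the tables above; in practice, read the full file.
SAMPLE = """\
| Component | Source | Size | Auto-Download | Notes |
|---|---|---|---|---|
| F5TTS_Base | SWivid/F5-TTS | ~1.2GB | ✅ | English base model |
| F5-Hindi-Small | SPRINGLab/F5-Hindi-24KHz | ~632MB | ✅ | Hindi finetune |
| Vocos Mel-24kHz | charactr/vocos-mel-24khz | N/A | ✅ | Optional vocoder |
| IndexTTS-2 | IndexTeam/IndexTTS-2 | Multiple files | ✅ | Main TTS engine |
| RVC character pack | SayanoAI/RVC-Studio (RVC folder) | Varies | ✅ | Default characters |
"""

# Leading "~1.2GB" / "~632MB" style figure; anything after it is ignored.
SIZE_RE = re.compile(r"^~?(\d+(?:\.\d+)?)\s*(MB|GB)\b")
# A bare "owner/name" identifier with no trailing qualifier.
REPO_RE = re.compile(r"^[\w.-]+/[\w.-]+$")
UNIT_MB = {"MB": 1, "GB": 1000}  # assumption: decimal units


def parse_rows(markdown):
    """Return data rows as dicts, skipping header and separator rows."""
    rows = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5:
            continue
        if cells[0] == "Component" or set(cells[0]) <= {"-"}:
            continue
        rows.append({"component": cells[0], "source": cells[1], "size": cells[2]})
    return rows


def size_in_mb(text):
    """Convert a size cell to megabytes, or None if it has no leading figure."""
    match = SIZE_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)) * UNIT_MB[match.group(2)]


def total_known_mb(rows):
    """Sum the parseable sizes and count the rows that could not be parsed."""
    known, unknown = 0.0, 0
    for row in rows:
        mb = size_in_mb(row["size"])
        if mb is None:
            unknown += 1
        else:
            known += mb
    return known, unknown


def plain_repo_ids(rows):
    """Unique sources that look like a bare owner/name identifier, in order."""
    seen = []
    for row in rows:
        if REPO_RE.match(row["source"]) and row["source"] not in seen:
            seen.append(row["source"])
    return seen


if __name__ == "__main__":
    parsed = parse_rows(SAMPLE)
    known_mb, unknown_rows = total_known_mb(parsed)
    print(f"Known total: {known_mb / 1000:.2f} GB ({unknown_rows} rows without a size)")
    for repo in plain_repo_ids(parsed):
        print(repo)
```

Sources with a qualifier, such as `SayanoAI/RVC-Studio (RVC folder)` or `Included with IndexTTS-2`, are deliberately left out of the identifier list because they need manual handling. The resulting total is a lower bound, since rows without a numeric size still take disk space.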