Model Download Sources

All entries below are generated from a single source-of-truth YAML file. Use this page as the canonical list of model repositories and links for offline setup.
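For offline setup, each `owner/name` entry in the Source column can be fetched ahead of time. The sketch below is one way to do that, assuming the entries are Hugging Face Hub repository IDs (the listing does not say so explicitly) and that the `huggingface_hub` package is installed. `repo_url`, `fetch_for_offline`, and the `models` target folder are hypothetical names used for illustration; the folder layout the suite actually expects is not covered here.

```python
import re
from pathlib import Path

# Assumption: Source entries are Hugging Face Hub repo IDs.
HF_BASE = "https://huggingface.co"
_REPO_ID = re.compile(r"[\w.\-]+/[\w.\-]+")


def repo_url(repo_id: str) -> str:
    """Return the browser URL for a repo ID such as 'SWivid/F5-TTS'."""
    repo_id = repo_id.strip()
    if _REPO_ID.fullmatch(repo_id) is None:
        raise ValueError(f"expected 'owner/name', got {repo_id!r}")
    return f"{HF_BASE}/{repo_id}"


def fetch_for_offline(repo_id: str, models_root: str = "models") -> Path:
    """Download a full repo snapshot into <models_root>/<owner>/<name>.

    The target layout is a placeholder, not the suite's real folder structure.
    """
    repo_url(repo_id)  # validates the ID before any network access
    # Imported lazily so repo_url() works without the dependency installed.
    from huggingface_hub import snapshot_download

    target = Path(models_root) / repo_id.strip()
    snapshot_download(repo_id=repo_id.strip(), local_dir=str(target))
    return target
```

Entries that are not a plain `owner/name` ID, such as sources annotated with a folder or a release, are rejected by `repo_url` and would need separate handling.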

F5-TTS

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| F5TTS_Base | SWivid/F5-TTS | ~1.2GB |  | English base model |
| F5TTS_v1_Base | SWivid/F5-TTS | ~1.2GB |  | English v1 model |
| E2TTS_Base | SWivid/E2-TTS | ~1.2GB |  | English E2-TTS model |
| F5-DE | aihpi/F5-TTS-German | ~1.2GB |  | German finetune |
| F5-ES | jpgallegoar/F5-Spanish | ~1.2GB |  | Spanish finetune |
| F5-FR | RASPIAUDIO/F5-French-MixedSpeakers-reduced | ~1.2GB |  | French finetune |
| F5-JP | Jmica/F5TTS | ~1.2GB |  | Japanese finetune |
| F5-Hindi-Small | SPRINGLab/F5-Hindi-24KHz | ~632MB |  | Hindi finetune |
| Vocos Mel-24kHz | charactr/vocos-mel-24khz | N/A |  | Optional vocoder |
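
When planning disk space for an offline install, the Size values above can be totalled. This is a minimal sketch, assuming decimal units (1 GB = 1000 MB) and treating non-numeric entries such as `N/A` as unknown. `size_to_mb` and `total_gb` are hypothetical helper names, and because the listed sizes are approximate the result is only a rough estimate.

```python
import re
from typing import List, Optional

# Matches the first size in strings such as "~1.2GB" or "~632MB".
_SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(MB|GB)", re.IGNORECASE)
_MB_PER_UNIT = {"MB": 1.0, "GB": 1000.0}  # assumption: decimal units


def size_to_mb(text: str) -> Optional[float]:
    """Return the first size found in `text` in MB, or None if there is none."""
    match = _SIZE.search(text)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * _MB_PER_UNIT[unit.upper()]


def total_gb(sizes: List[str]) -> float:
    """Sum all parseable sizes and return the total in GB; unknowns are skipped."""
    known = [mb for mb in map(size_to_mb, sizes) if mb is not None]
    return sum(known) / 1000.0


# Size column of the F5-TTS table: seven ~1.2GB entries, one ~632MB, one N/A.
F5_SIZES = ["~1.2GB"] * 7 + ["~632MB", "N/A"]
print(f"F5-TTS total: about {total_gb(F5_SIZES):.1f} GB")
```

Only the first size in a cell is counted, so a compound value that lists a second variant is under-counted unless it is split beforehand.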

ChatterBox

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| English | ResembleAI/chatterbox | ~2GB |  | .pt model set |
| German | stlohrey/chatterbox_de | ~4.3GB |  | .safetensors model set |
| German (havok2) | niobures/Chatterbox-TTS | ~4.3GB |  | .safetensors model set |
| German (SebastianBodza) | niobures/Chatterbox-TTS | ~4.3GB |  | .safetensors model set |
| Italian | niobures/Chatterbox-TTS | ~4.3GB |  | .pt model set |
| French | Thomcles/ChatterBox-fr | ~4.3GB |  | .safetensors model set |
| Russian | niobures/Chatterbox-TTS | ~4.3GB |  | .safetensors model set |
| Armenian | niobures/Chatterbox-TTS | ~4.3GB |  | .safetensors model set |
| Georgian | niobures/Chatterbox-TTS | ~4.3GB |  | .safetensors model set |
| Japanese | niobures/Chatterbox-TTS | ~4.3GB |  | .safetensors model set |
| Korean | niobures/Chatterbox-TTS | ~4.3GB |  | .safetensors model set |
| Norwegian | akhbar/chatterbox-tts-norwegian | ~4.3GB |  | .safetensors model set |

ChatterBox 23L

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| Official 23-Lang (v1/v2) | ResembleAI/chatterbox | ~4.3GB |  | v1 + v2 files and tokenizer |
| Russian stress dictionary (Russian only) | Vuizur/add-stress-to-epub release | ~1.5GB |  | Auxiliary Official 23-Lang Russian stress-labeling data; downloads on demand only when Russian stress support is used |
| Vietnamese (Viterbox) | dolly-vn/viterbox | ~4.3GB |  | Vietnamese community finetune used by downloader |
| Egyptian Arabic (oddadmix) | oddadmix/chatterbox-egyptian-v0 | ~4.3GB |  | Egyptian Arabic community finetune (architecture v2) |

VibeVoice

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| vibevoice-1.5B | microsoft/VibeVoice-1.5B | ~5.4GB |  | Microsoft official model |
| vibevoice-7B | aoi-ot/VibeVoice-Large | ~18GB |  | Community mirror used by downloader |
| kugelaudio-0-open | kugelaudio/kugelaudio-0-open | ~18GB |  | KugelAudio multilingual 7B variant |
| kugel-2 | kugelaudio/kugel-2 | ~18.7GB |  | KugelAudio v2 merged 7B variant |

Higgs Audio 2

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| higgs-audio-v2-3B | bosonai/higgs-audio-v2-generation-3B-base | ~9GB |  | Generation model |
| Audio tokenizer | bosonai/higgs-audio-v2-tokenizer | ~200MB |  | Tokenizer model |

IndexTTS-2

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| IndexTTS-2 | IndexTeam/IndexTTS-2 | Multiple files |  | Main TTS engine |
| w2v-bert-2.0 | facebook/w2v-bert-2.0 | ~2GB |  | Semantic feature extractor |
| qwen0.6bemo4-merge | Included with IndexTTS-2 | Included |  | Text emotion model bundle |

CosyVoice3

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| Fun-CosyVoice3-0.5B / 0.5B-RL | FunAudioLLM/Fun-CosyVoice3-0.5B-2512 | ~5.4GB first variant (+~2GB second) |  | Both variants share common files |

Qwen3-TTS

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| CustomVoice 0.6B | Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice | ~1.5GB |  | Preset voices + instructions |
| CustomVoice 1.7B | Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice | ~4.2GB |  | Preset voices + instructions |
| VoiceDesign 1.7B | Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign | ~4.2GB |  | Text-to-voice design model |
| Base 0.6B | Qwen/Qwen3-TTS-12Hz-0.6B-Base | ~1.5GB |  | Zero-shot voice cloning |
| Base 1.7B | Qwen/Qwen3-TTS-12Hz-1.7B-Base | ~4.2GB |  | Zero-shot voice cloning |
| Qwen3-ASR-1.7B | Qwen/Qwen3-ASR-1.7B | N/A |  | ASR transcribe model |
| Qwen3-ForcedAligner-0.6B | Qwen/Qwen3-ForcedAligner-0.6B | N/A |  | Word-level timestamps |

Granite ASR

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| granite-4.0-1b-speech | ibm-granite/granite-4.0-1b-speech | ~4.6GB |  | Main Granite ASR / AST model |
| Qwen3-ForcedAligner-0.6B | Qwen/Qwen3-ForcedAligner-0.6B | N/A |  | Optional custom word-level timestamps/SRT path; reused from Qwen folder |

Step Audio EditX

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| Step-Audio-EditX | stepfun-ai/Step-Audio-EditX | ~7GB |  | Main 3B audio editing model |
| Step-Audio-Tokenizer | stepfun-ai/Step-Audio-Tokenizer | Included |  | Tokenizer bundle used by Step EditX |

Echo-TTS

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| echo-tts-base (model + PCA state) | jordand/echo-tts-base | ~5.3GB |  | pytorch_model.safetensors + pca_state.safetensors |
| fish-s1-dac-min (audio codec) | jordand/fish-s1-dac-min | ~1.8GB |  | pytorch_model.safetensors; audio codec required by Echo-TTS |

RVC

| Component | Source | Size | Auto-Download | Notes |
| --- | --- | --- | --- | --- |
| RVC character pack | SayanoAI/RVC-Studio (RVC folder) | Varies |  | Default auto-download characters: Claire, Sayano, Mae_v2, Fuji, Monika (extras also available) |
| RVC index pack (.index) | SayanoAI/RVC-Studio (.index folder) | Varies |  | Optional FAISS indexes for improved voice similarity |
| content-vec-best.safetensors | lengyue233/content-vec-best | ~300MB |  | Voice feature model |
| rmvpe.pt | lj1995/VoiceConversionWebUI | ~55MB |  | Pitch extraction model |
| pretrained_v2 (f0 G/D pairs) | lj1995/VoiceConversionWebUI | ~300MB total |  | Training init checkpoints for 32k/40k/48k RVC runs; downloaded on first training use |

Generated from tts_audio_suite_engines.yaml.