Use this document for folder structures and placement notes. For repository/source URLs, use MODEL_DOWNLOAD_SOURCES.md.
Recommended:
ComfyUI/models/TTS/chatterbox/
Legacy (still supported):
ComfyUI/models/chatterbox/
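The recommended/legacy pairing above recurs for several engines in this document. A minimal sketch of how a script could locate whichever folder exists is shown here. The function name `resolve_model_dir` is hypothetical, and checking the recommended location first is an assumption; the document only states that both locations are supported.

```python
from pathlib import Path
from typing import Optional


def resolve_model_dir(models_root: Path, engine: str) -> Optional[Path]:
    """Return the first existing folder for `engine`, or None if neither exists.

    Assumption: the recommended models/TTS/<engine> location takes
    precedence over the legacy models/<engine> location.
    """
    candidates = [
        models_root / "TTS" / engine,  # recommended location
        models_root / engine,          # legacy location, still supported
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

For example, `resolve_model_dir(Path("ComfyUI/models"), "chatterbox")` returns the `TTS/chatterbox` folder when it exists and falls back to the legacy folder otherwise.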
Required base files:
- conds.pt
- s3gen.pt
- t3_cfg.pt
- tokenizer.json
- ve.pt
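When placing the base files by hand, a quick check can confirm that nothing is missing. The sketch below is illustrative only: `missing_base_files` is a hypothetical helper, and it assumes the five required files sit directly in the ChatterBox model folder.

```python
from pathlib import Path
from typing import List

# Required base files as listed above.
REQUIRED_BASE_FILES = ("conds.pt", "s3gen.pt", "t3_cfg.pt", "tokenizer.json", "ve.pt")


def missing_base_files(model_dir: Path) -> List[str]:
    """Return the required file names that are not present in `model_dir`."""
    return [name for name in REQUIRED_BASE_FILES if not (model_dir / name).is_file()]
```

An empty list means the folder contains every required base file.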
Optional multilingual subfolders:
ComfyUI/models/TTS/chatterbox/
├── English/
├── German/
└── Norwegian/
ComfyUI/models/TTS/chatterbox_official_23lang/
├── ChatterBox Official 23-Lang/
│ ├── t3_23lang.safetensors
│ ├── t3_mtl23ls_v2.safetensors
│ ├── grapheme_mtl_merged_expanded_v1.json
│ ├── s3gen.pt
│ ├── ve.pt
│ ├── mtl_tokenizer.json
│ └── conds.pt
└── russian_text_stresser/
  ├── russian_dict.db
  └── simple_cases.pkl
Notes:
- The v1 (t3_23lang.safetensors) and v2 (t3_mtl23ls_v2.safetensors) checkpoints coexist in one directory.
- Vietnamese (Viterbox) is available as a community finetune option.
- russian_text_stresser/ is auxiliary Russian-only data for Official 23-Lang stress labeling; it downloads on demand.
Recommended:
ComfyUI/models/TTS/F5-TTS/
Legacy (still supported):
ComfyUI/models/F5-TTS/
Typical structure:
ComfyUI/models/TTS/F5-TTS/
├── F5TTS_Base/
│ ├── model_1200000.safetensors
│ └── vocab.txt
├── F5TTS_v1_Base/
│ ├── model_1250000.safetensors
│ └── vocab.txt
└── vocos/
  ├── config.yaml
  └── pytorch_model.bin
Notes:
- F5-TTS models auto-download on first use.
- vocab.txt is required per model.
- Vocos is optional and also auto-downloadable.
ComfyUI/models/voices/
├── character1.wav
├── character1.reference.txt
├── narrator.wav
└── narrator.txt
Requirements:
- WAV, clean speech, 5-30s (24kHz recommended).
- Text must match spoken audio.
- Naming: name.wav + name.reference.txt (preferred) or name.txt.
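The naming rule can be sketched as a lookup that tries the preferred transcript name first. The helper name `find_reference_text` is hypothetical and this is not necessarily how the node suite resolves transcripts; it only illustrates the stated preference order.

```python
from pathlib import Path
from typing import Optional


def find_reference_text(audio_path: Path) -> Optional[Path]:
    """Return the transcript that accompanies `audio_path`, or None.

    Tries <name>.reference.txt first (preferred), then <name>.txt.
    """
    stem = audio_path.stem  # "character1" for "character1.wav"
    for candidate_name in (stem + ".reference.txt", stem + ".txt"):
        candidate = audio_path.with_name(candidate_name)
        if candidate.is_file():
            return candidate
    return None
```

With the example folder above, `character1.wav` resolves to `character1.reference.txt` and `narrator.wav` resolves to `narrator.txt`.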
Inference guidelines:
- Keep reference audio under ~12s with a little trailing silence.
- Use uppercase only when letter-by-letter pronunciation is desired.
- Use spaces/punctuation for explicit pauses.
- Keep a space after sentence-ending punctuation.
- For Chinese output, preprocess numbers (for example, write them out in Chinese characters) if needed.
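The length and sample-rate requirements for reference audio can be checked before use. The sketch below relies on Python's standard `wave` module, which reads uncompressed PCM WAV files only. The function name `check_reference_wav` is hypothetical; its defaults mirror the 5-30s and 24kHz figures above, and passing `max_s=12` applies the shorter inference guideline instead.

```python
import wave
from pathlib import Path
from typing import List


def check_reference_wav(
    path: Path,
    min_s: float = 5.0,
    max_s: float = 30.0,
    preferred_rate: int = 24000,
) -> List[str]:
    """Return human-readable warnings for a reference WAV (empty list if fine)."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)

    warnings = []
    if duration < min_s or duration > max_s:
        warnings.append(
            f"duration {duration:.1f}s is outside the {min_s:g}-{max_s:g}s range"
        )
    if rate != preferred_rate:
        # 24 kHz is a recommendation, so this is a warning rather than an error.
        warnings.append(
            f"sample rate {rate} Hz differs from the recommended {preferred_rate} Hz"
        )
    return warnings
```

The check covers only duration and sample rate; whether the speech is clean and matches the transcript still has to be verified by ear.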
ComfyUI/models/TTS/HiggsAudio/
└── higgs-audio-v2-3B/
  ├── generation/
  ├── tokenizer/
  └── voices/
Notes:
- Both generation model and tokenizer auto-download.
- Place reference audio/transcription in voices/.
ComfyUI/models/TTS/VibeVoice/
├── vibevoice-1.5B/
└── vibevoice-7B/
Notes:
- Both variants auto-download on first use.
- vibevoice-7B uses a community mirror in the downloader config.
Recommended:
ComfyUI/models/TTS/RVC/
├── *.pth
├── content-vec-best.safetensors
├── rmvpe.pt
├── hubert/
├── pretrained_v2/
│ ├── f0G32k.pth
│ ├── f0D32k.pth
│ ├── f0G40k.pth
│ ├── f0D40k.pth
│ ├── f0G48k.pth
│ └── f0D48k.pth
└── .index/
Legacy (still supported):
ComfyUI/models/RVC/
Notes:
- Base models auto-download.
- Character .pth models can be auto-downloaded defaults or user-provided.
- pretrained_v2/ is used by the integrated RVC trainer and auto-downloads on first training run.
- Training datasets, logs, progress snapshots, and resumable checkpoints are stored under ComfyUI/output/tts_audio_suite_training/rvc/, not inside the custom node repo.
- UVR models are downloaded under ComfyUI/models/TTS/UVR/ (or legacy ComfyUI/models/UVR/).
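For offline setups where the trainer cannot download on first run, the pretrained_v2/ contents can be verified ahead of time. This sketch derives the six file names from the pattern visible in the tree above (generator G and discriminator D at 32k, 40k, and 48k). The helper names are hypothetical, and reading G and D as generator and discriminator is an assumption based on common RVC naming.

```python
from pathlib import Path
from typing import List

SAMPLE_RATES = ("32k", "40k", "48k")
KINDS = ("G", "D")  # assumed: generator and discriminator checkpoints


def expected_pretrained_v2_files() -> List[str]:
    """Build the f0<kind><rate>.pth names shown in the RVC tree."""
    return [f"f0{kind}{rate}.pth" for rate in SAMPLE_RATES for kind in KINDS]


def missing_pretrained_v2(rvc_root: Path) -> List[str]:
    """Return expected pretrained_v2 files that are absent under `rvc_root`."""
    folder = rvc_root / "pretrained_v2"
    return [name for name in expected_pretrained_v2_files() if not (folder / name).is_file()]
```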
ComfyUI/models/TTS/IndexTTS/
├── IndexTTS-2/
└── w2v-bert-2.0/
Notes:
- Emotion components and semantic feature models auto-download.
ComfyUI/models/TTS/step_audio_editx/
├── Step-Audio-EditX/
│ └── CosyVoice-300M-25Hz/
└── FunASR-Paraformer/
Notes:
- Main model, tokenizer assets, and speech stack auto-download.
ComfyUI/models/TTS/CosyVoice/
└── Fun-CosyVoice3-0.5B/
  ├── llm.pt
  ├── llm.rl.pt
  └── shared model files...
Notes:
- First selected variant downloads first.
- Shared files are reused across variants.
ComfyUI/models/TTS/qwen3_tts/
├── Qwen3-TTS-12Hz-0.6B-CustomVoice/
├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
├── Qwen3-TTS-12Hz-0.6B-Base/
├── Qwen3-TTS-12Hz-1.7B-Base/
├── qwen2-audio-encoder/
└── asr/
  ├── Qwen3-ASR-1.7B/
  └── Qwen3-ForcedAligner-0.6B/
Notes:
- Only selected variants download.
- Shared tokenizer assets are reused.
ComfyUI/models/TTS/granite_asr/
└── granite-4.0-1b-speech/
  ├── config.json
  ├── chat_template.jinja
  ├── model-00001-of-00003.safetensors
  ├── model-00002-of-00003.safetensors
  ├── model-00003-of-00003.safetensors
  └── tokenizer / processor files...
Notes:
- Granite downloads into its own granite_asr folder.
- If Granite word timestamps are enabled, it lazily reuses Qwen3-ForcedAligner-0.6B from the Qwen ASR folder instead of duplicating that model.
ComfyUI/models/TTS/
├── echo-tts-base/
│ ├── pytorch_model.safetensors
│ └── pca_state.safetensors
└── fish-s1-dac-min/
  └── pytorch_model.safetensors
Notes:
- Both components are required and auto-downloaded on first use.
- License: CC-BY-NC-SA (non-commercial).