Model Folder Layouts

Use this document for folder structures and placement notes. For repository/source URLs, use MODEL_DOWNLOAD_SOURCES.md.

ChatterBox

Recommended:

ComfyUI/models/TTS/chatterbox/

Legacy (still supported):

ComfyUI/models/chatterbox/

Required base files:

  • conds.pt
  • s3gen.pt
  • t3_cfg.pt
  • tokenizer.json
  • ve.pt
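The recommended/legacy lookup and the required-file list above can be checked with a small script. This is a minimal sketch, not the suite's own loader: the function names `find_chatterbox_dir` and `missing_base_files` are hypothetical, and `comfy_root` is assumed to be the path to your `ComfyUI` directory.

```python
from pathlib import Path

# File names taken from the "Required base files" list above.
REQUIRED_CHATTERBOX_FILES = ("conds.pt", "s3gen.pt", "t3_cfg.pt", "tokenizer.json", "ve.pt")


def find_chatterbox_dir(comfy_root):
    """Return the first existing ChatterBox folder, preferring the recommended layout.

    Hypothetical helper: comfy_root is assumed to point at the ComfyUI directory.
    """
    root = Path(comfy_root)
    candidates = [
        root / "models" / "TTS" / "chatterbox",  # recommended
        root / "models" / "chatterbox",          # legacy (still supported)
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


def missing_base_files(model_dir):
    """List the required base files that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_CHATTERBOX_FILES if not (model_dir / name).is_file()]
```

An empty list from `missing_base_files` means all five base files are present in that folder.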

Optional multilingual subfolders:

ComfyUI/models/TTS/chatterbox/
├── English/
├── German/
└── Norwegian/

ChatterBox Official 23-Lang

ComfyUI/models/TTS/chatterbox_official_23lang/
├── ChatterBox Official 23-Lang/
│   ├── t3_23lang.safetensors
│   ├── t3_mtl23ls_v2.safetensors
│   ├── grapheme_mtl_merged_expanded_v1.json
│   ├── s3gen.pt
│   ├── ve.pt
│   ├── mtl_tokenizer.json
│   └── conds.pt
└── russian_text_stresser/
    ├── russian_dict.db
    └── simple_cases.pkl

Notes:

  • The v1 (t3_23lang.safetensors) and v2 (t3_mtl23ls_v2.safetensors) checkpoints coexist in one directory.
  • Vietnamese (Viterbox) is available as a community finetune option.
  • russian_text_stresser/ is auxiliary Russian-only data for Official 23-Lang stress labeling and downloads on demand.

F5-TTS

Recommended:

ComfyUI/models/TTS/F5-TTS/

Legacy (still supported):

ComfyUI/models/F5-TTS/

Typical structure:

ComfyUI/models/TTS/F5-TTS/
├── F5TTS_Base/
│   ├── model_1200000.safetensors
│   └── vocab.txt
├── F5TTS_v1_Base/
│   ├── model_1250000.safetensors
│   └── vocab.txt
└── vocos/
    ├── config.yaml
    └── pytorch_model.bin

Notes:

  • F5-TTS models auto-download on first use.
  • vocab.txt is required per model.
  • Vocos is optional and also auto-downloadable.
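Since vocab.txt is required per model, a quick scan can flag model folders that lack one. The sketch below is illustrative only: `models_missing_vocab` is a hypothetical name, and it assumes checkpoints follow the `model_*.safetensors` naming shown in the tree above, so folders such as vocos/ are skipped.

```python
from pathlib import Path


def models_missing_vocab(f5_root):
    """Return names of model subfolders that hold a checkpoint but no vocab.txt.

    Assumption: checkpoints are named model_*.safetensors, as in the tree above.
    """
    missing = []
    for sub in sorted(Path(f5_root).iterdir()):
        if not sub.is_dir():
            continue
        has_checkpoint = any(sub.glob("model_*.safetensors"))
        if has_checkpoint and not (sub / "vocab.txt").is_file():
            missing.append(sub.name)
    return missing
```

Call it with the F5-TTS folder, for example `models_missing_vocab("ComfyUI/models/TTS/F5-TTS")`.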

F5-TTS Voice References

ComfyUI/models/voices/
├── character1.wav
├── character1.reference.txt
├── narrator.wav
└── narrator.txt

Requirements:

  • WAV, clean speech, 5-30s (24kHz recommended).
  • Text must match spoken audio.
  • Naming: name.wav + name.reference.txt (preferred) or name.txt.
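The naming rule above (name.reference.txt preferred, name.txt as fallback) can be sketched as a lookup. The helper names `find_transcript` and `list_voice_pairs` are hypothetical and only illustrate the pairing; the suite's actual discovery logic may differ.

```python
from pathlib import Path


def find_transcript(wav_path):
    """Return the transcript for a reference WAV, or None if neither naming form exists."""
    wav_path = Path(wav_path)
    preferred = wav_path.with_name(wav_path.stem + ".reference.txt")  # name.reference.txt
    fallback = wav_path.with_suffix(".txt")                           # name.txt
    for candidate in (preferred, fallback):
        if candidate.is_file():
            return candidate
    return None


def list_voice_pairs(voices_dir):
    """Map each voice name to (wav, transcript) for WAVs that have a transcript."""
    pairs = {}
    for wav in sorted(Path(voices_dir).glob("*.wav")):
        transcript = find_transcript(wav)
        if transcript is not None:
            pairs[wav.stem] = (wav, transcript)
    return pairs
```

WAV files without a matching text file are left out of the result, which makes unpaired references easy to spot.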

Inference guidelines:

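  1. For best results, keep reference audio under ~12s (within the 5-30s range above) and leave a little trailing silence.
  2. Uppercase text is read letter by letter; use it only when that is the desired effect.
  3. Use spaces/punctuation for explicit pauses.
  4. Keep a space after sentence-ending punctuation.
  5. For Chinese output, convert numbers to Chinese characters beforehand if they should be read in Chinese.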
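The duration limits (5-30s required, under ~12s suggested) can be checked before inference. This is a rough sketch using Python's standard `wave` module, which only reads PCM WAV files; `wav_duration_seconds` and `check_reference` are hypothetical helper names, and the thresholds are simply the figures quoted in this section.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds, computed from its header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_reference(path, min_s=5.0, max_s=30.0, preferred_max_s=12.0):
    """Return a list of warnings; an empty list means no duration issue was found."""
    duration = wav_duration_seconds(path)
    warnings = []
    if not (min_s <= duration <= max_s):
        warnings.append(f"duration {duration:.1f}s is outside {min_s:.0f}-{max_s:.0f}s")
    elif duration > preferred_max_s:
        warnings.append(f"duration {duration:.1f}s exceeds the ~{preferred_max_s:.0f}s guideline")
    return warnings
```

The check covers length only; whether the speech is clean and the transcript matches still has to be verified by ear.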

Higgs Audio 2

ComfyUI/models/TTS/HiggsAudio/
└── higgs-audio-v2-3B/
    ├── generation/
    ├── tokenizer/
    └── voices/

Notes:

  • Both generation model and tokenizer auto-download.
  • Place reference audio/transcription in voices/.

VibeVoice

ComfyUI/models/TTS/VibeVoice/
├── vibevoice-1.5B/
└── vibevoice-7B/

Notes:

  • Both variants auto-download on first use.
  • vibevoice-7B uses a community mirror in downloader config.

RVC

Recommended:

ComfyUI/models/TTS/RVC/
├── *.pth
├── content-vec-best.safetensors
├── rmvpe.pt
├── hubert/
├── pretrained_v2/
│   ├── f0G32k.pth
│   ├── f0D32k.pth
│   ├── f0G40k.pth
│   ├── f0D40k.pth
│   ├── f0G48k.pth
│   └── f0D48k.pth
└── .index/

Legacy (still supported):

ComfyUI/models/RVC/

Notes:

  • Base models auto-download.
  • Character .pth models can be auto-downloaded defaults or user-provided.
  • pretrained_v2/ is used by the integrated RVC trainer and auto-downloads on first training run.
  • Training datasets, logs, progress snapshots, and resumable checkpoints are stored under ComfyUI/output/tts_audio_suite_training/rvc/, not inside the custom node repo.
  • UVR models are downloaded under ComfyUI/models/TTS/UVR/ (or legacy ComfyUI/models/UVR/).

IndexTTS-2

ComfyUI/models/TTS/IndexTTS/
├── IndexTTS-2/
└── w2v-bert-2.0/

Notes:

  • Emotion components and semantic feature models auto-download.

Step Audio EditX

ComfyUI/models/TTS/step_audio_editx/
├── Step-Audio-EditX/
│   └── CosyVoice-300M-25Hz/
└── FunASR-Paraformer/

Notes:

  • Main model, tokenizer assets, and speech stack auto-download.

CosyVoice3

ComfyUI/models/TTS/CosyVoice/
└── Fun-CosyVoice3-0.5B/
    ├── llm.pt
    ├── llm.rl.pt
    └── shared model files...

Notes:

  • The variant you select first is downloaded first.
  • Shared files are reused across variants.

Qwen3-TTS and Qwen3-ASR

ComfyUI/models/TTS/qwen3_tts/
├── Qwen3-TTS-12Hz-0.6B-CustomVoice/
├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
├── Qwen3-TTS-12Hz-0.6B-Base/
├── Qwen3-TTS-12Hz-1.7B-Base/
├── qwen2-audio-encoder/
└── asr/
    ├── Qwen3-ASR-1.7B/
    └── Qwen3-ForcedAligner-0.6B/

Notes:

  • Only selected variants download.
  • Shared tokenizer assets are reused.

Granite ASR

ComfyUI/models/TTS/granite_asr/
└── granite-4.0-1b-speech/
    ├── config.json
    ├── chat_template.jinja
    ├── model-00001-of-00003.safetensors
    ├── model-00002-of-00003.safetensors
    ├── model-00003-of-00003.safetensors
    └── tokenizer / processor files...

Notes:

  • Granite downloads into its own granite_asr folder.
  • If Granite word timestamps are enabled, it lazily reuses Qwen3-ForcedAligner-0.6B from the Qwen ASR folder instead of duplicating that model.
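The tree above shows sharded weights named model-0000N-of-00003.safetensors, so an interrupted download can leave a shard missing. The sketch below infers the expected shard count from the file names and reports gaps; `missing_shards` is a hypothetical helper, not part of the suite.

```python
import re
from pathlib import Path

# Matches names like model-00001-of-00003.safetensors (pattern taken from the tree above).
SHARD_PATTERN = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_shards(model_dir):
    """Return shard file names implied by the naming scheme but absent from model_dir."""
    present = set()
    total = None
    for path in Path(model_dir).iterdir():
        match = SHARD_PATTERN.match(path.name)
        if match:
            present.add(int(match.group(1)))
            total = int(match.group(2))
    if total is None:
        return []  # no sharded weights found; nothing to compare against
    return [
        f"model-{i:05d}-of-{total:05d}.safetensors"
        for i in range(1, total + 1)
        if i not in present
    ]
```

It only compares file names and does not validate shard contents.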

Echo-TTS

ComfyUI/models/TTS/
├── echo-tts-base/
│   ├── pytorch_model.safetensors
│   └── pca_state.safetensors
└── fish-s1-dac-min/
    └── pytorch_model.safetensors

Notes:

  • Both components are required and auto-downloaded on first use.
  • License: CC-BY-NC-SA (non-commercial).