# TTS Audio Suite - Universal TTS for ComfyUI
# Comprehensive multi-engine TTS with Python 3.13 compatibility
# --- INSTALLATION METHOD ---
# This custom node uses install.py for intelligent dependency management.
# ComfyUI Manager automatically runs install.py, which handles:
# - Python 3.13 compatibility issues (MediaPipe → OpenSeeFace fallback)
# - NumPy version conflicts (constraints to avoid Numba issues)
# - Package dependency conflicts (selective --no-deps installation)
# - All bundled engines: ChatterBox, F5-TTS, Higgs Audio
# - Optional features: RVC voice conversion, mouth movement analysis
# --- CORE SAFE PACKAGES ---
# These packages rarely cause conflicts and install normally
# Foundation ML packages
torch>=2.0.0
torchaudio>=2.0.0
numpy>=1.26.4,<2.3.0 # Allows numpy 1.26.4 through the 2.2.x series
setuptools>=65.0.0 # Provides distutils compatibility for Python 3.12+ (required by FunASR bundled code)
# Audio processing (safe)
soundfile>=0.12.0
sounddevice>=0.4.0
# Text processing (safe)
jieba
pypinyin
unidecode
phonemizer # IPA phonemization for multilingual TTS (requires espeak system dependency)
omegaconf>=2.3.0
transformers>=4.51.3,<=4.57.3 # Required for VibeVoice compatibility (4.51.3+). Transformers 5.0.0 breaks Qwen3-TTS tokenizer loading.
# ML utilities (safe)
accelerate
datasets
requests
dacite
bitsandbytes>=0.47.0 # 4-bit quantization support for VibeVoice memory efficiency
# Bundled engine dependencies (safe)
conformer>=0.3.2 # ChatterBox engine
x-transformers
torchdiffeq # F5-TTS differential equations
wandb # F5-TTS logging
ema-pytorch # F5-TTS exponential moving average
vocos # F5-TTS vocoder
# Echo-TTS engine (CUDA recommended)
echo-tts
# Audio restoration
# VoiceFixer is bundled in utils/voicefixer_bundled/
# Uses librosa (installed by install.py) for STFT/ISTFT instead of torchlibrosa, which reduces dependencies
# IndexTTS-2 engine dependencies (safe)
cn2an>=0.5.22 # Chinese number to Arabic number conversion
g2p-en>=2.1.0 # English grapheme-to-phoneme conversion
keras>=2.9.0 # Deep learning framework
modelscope>=1.27.0 # Chinese model hub for IndexTTS-2
munch>=4.0.0 # Dictionary access with dot notation
json5>=0.12.0 # JSON5 parsing for IndexTTS-2 config files
ninja>=1.11.0 # Build tool for CUDA kernel compilation (BigVGAN optimization)
sentencepiece>=0.2.1 # Text tokenization
textstat>=0.7.10 # Text statistics and readability
punctuators # ONNX punctuation/truecase post-processing for ASR text
# Step Audio EditX engine dependencies (safe)
openai-whisper # Mel spectrogram extraction for audio tokenizer
funasr>=1.1.3 # FunASR speech processing toolkit
nagisa>=0.2.11 # Japanese tokenizer required by Qwen3-ASR forced aligner
hyperpyyaml # YAML configuration parser
protobuf>=3.20.0 # Protocol buffers (compatible with descript-audiotools)
# onnxruntime installed by install.py with --no-deps to avoid conflicts
# RVC voice conversion (safe)
monotonic-alignment-search
faiss-cpu>=1.7.4
praat-parselmouth>=0.4.6 # Praat-based f0 extraction for RVC (pm method)
pyworld>=0.3.5 # World vocoder for RVC harvest/dio methods
torchfcpe>=0.0.4 # Fast Context-based Pitch Estimation for RVC (fcpe method)
# Optional performance enhancements
# sageattention # Optional: GPU-optimized mixed-precision attention for VibeVoice (requires CUDA SM80+)
# --- PROBLEMATIC PACKAGES ---
# These are installed by install.py with special handling (NOT here in requirements.txt):
# - librosa (--no-deps): Forces numpy downgrade
# - descript-audio-codec (--no-deps): Conflicts with protobuf
# - cached-path (--no-deps): Forces package downgrades
# - torchcrepe (--no-deps): Conflicts via librosa dependency
# - onnxruntime (--no-deps): Forces numpy 2.3.x; required by OpenSeeFace
# - opencv-python (--no-deps): Forces numpy downgrade via numpy<2.3.0 constraint
# - gradio (--no-deps): Forces pydantic, pillow, pydantic-core downgrades
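# For reference only: a rough manual sketch of the --no-deps handling described above.
# This is an assumption, not what install.py actually runs; the script may pin versions,
# apply constraints, or install these packages in a different order.
# Packages installed with --no-deps do not pull in their own dependencies, so those
# have to come from this file or from install.py.
#   pip install -r requirements.txt
#   pip install --no-deps librosa descript-audio-codec cached-path torchcrepe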
# --- PYTHON 3.13 NOTES ---
# [OK] All TTS engines work (ChatterBox, F5-TTS, Higgs Audio, CosyVoice3)
# [OK] RVC voice conversion works
# [OK] OpenSeeFace mouth movement (experimental alternative)
# [NO] MediaPipe incompatible (binary compatibility issue)
# CosyVoice3 engine dependencies (bundled in engines/cosyvoice/impl/)
# Most dependencies handled by install.py (diffusers, hydra-core, matplotlib, rich, uvicorn, wetext, onnxruntime)
inflect>=7.3.0 # Text normalization for English (used in frontend.py)
# --- BUNDLED ENGINES ---
# All engines are bundled to avoid external dependency conflicts:
# - ChatterBox: engines/chatterbox/ (modified for ComfyUI)
# - F5-TTS: engines/f5_tts/ (numpy 2.x compatible fork)
# - Higgs Audio: engines/higgs_audio/ (transformers 4.46+ compatible)
# - IndexTTS-2: engines/index_tts/ (emotion disentanglement TTS)
# - CosyVoice3: engines/cosyvoice/impl/ (multilingual zero-shot voice cloning)