Releases: diodiogod/TTS-Audio-Suite
v4.25.1 - Integrated RVC Model Training
TTS Audio Suite v4.25.1
This release adds integrated RVC model training to TTS Audio Suite.
You can now prepare datasets, train RVC models, monitor progress in a live dashboard, and load the resulting models back into the normal RVC voice conversion workflow without leaving the suite.
New
- Integrated RVC model training inside TTS Audio Suite
- New 📦 RVC Dataset Prep node
- New 🎛️ RVC Training Config node
- New unified 🎓 Model Training node
- New live training dashboard with progress, ETA, speed, and recent loss graph
- Support for `resume` from real checkpoints
- Support for `continue_from` using prior training artifacts or an existing 🎭 Load RVC Character Model
- Safer interrupt handling with resumable checkpoint saving
- Improved RVC model loading and index auto-detection
- New RVC training workflow example
- Updated README, model layout docs, and engine capability tables
Notes
- `resume` continues the same training job from saved checkpoints
- `continue_from` starts a new run from an already trained model or prior training artifacts
- RVC is the first engine with integrated training support in the suite
- The training architecture is unified so future engines can plug into the same workflow later
Included workflow
RVC 🎓 Model Training
Restart ComfyUI after updating.
v4.22.0 - Echo-TTS Engine
🎧 Echo-TTS Engine
New engine contribution by @drphero!
✨ New Features
- Echo-TTS - DiT-based voice cloning engine (English only)
- Full integration with character switching, pause tags, and SRT timing
- Auto-downloads models on first use (~7.1GB total)
- License: CC-BY-NC-SA (non-commercial use only)
🙏 Credits
Full credit to @drphero for the original Echo-TTS integration.
TTS Audio Suite now includes 11 engines: ChatterBox, ChatterBox 23-Lang, F5-TTS, Higgs Audio 2, VibeVoice, IndexTTS-2, CosyVoice3, Qwen3-TTS, Echo-TTS, Step Audio EditX, and RVC.
v4.19.0 - Qwen3-TTS Engine with Voice Designer
🎨 Qwen3-TTS Engine - Create Voices from Text!
Major new engine addition! Qwen3-TTS brings a unique Voice Designer feature that lets you create custom voices from natural language descriptions. Plus three distinct model types for different use cases!
✨ New Features
Qwen3-TTS Engine
- 🎨 Voice Designer - Create custom voices from text descriptions! "A calm female voice with British accent" → instant voice generation
- Three model types with different capabilities:
- CustomVoice: 9 high-quality preset speakers (Vivian, Serena, Dylan, Eric, Ryan, etc.)
- VoiceDesign: Text-to-voice creation - describe your ideal voice and generate it
- Base: Zero-shot voice cloning from audio samples
- 10 language support - Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
- Model sizes: 0.6B (low VRAM) and 1.7B (high quality) variants
- Character voice switching with `[CharacterName]` syntax, with automatic preset mapping
- SRT subtitle timing support with all timing modes (stretch_to_fit, pad_with_silence, etc.)
- Inline edit tags - Apply Step Audio EditX post-processing (emotions, styles, paralinguistic effects)
- Sage attention support - Improved VRAM efficiency with sageattention backend
- Smart caching - Prevents duplicate voice generation, skips model loading for existing voices
- Per-segment parameters - Control `[seed:42]`, `[temperature:0.8]` inline
- Auto-download system - All 6 model variants downloaded automatically when needed
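Putting the tag syntax above together, a single prompt can mix character switching and per-segment parameters. A sketch (the preset names come from the CustomVoice list above; the sentences are illustrative):

```
[seed:42] Welcome to today's episode.
[Vivian] [temperature:0.8] I'm so excited to be here!
[Dylan] Thanks, Vivian. Let's get started.
```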
🎙️ Voice Designer Node
The standout feature of this release! Create voices without audio samples:
- Natural language input - Describe voice characteristics in plain English
- Disk caching - Saved voices load instantly without regeneration
- Standard format - Works seamlessly with Character Voices system
- Unified output - Compatible with all TTS nodes via NARRATOR_VOICE format
Example descriptions:
- "A calm female voice with British accent"
- "Deep male voice, authoritative and professional"
- "Young cheerful woman, slightly high-pitched"
📚 Documentation
- YAML-driven engine tables - Auto-generated comparison tables
- Condensed engine overview in README
- Portuguese accent guidance - Clear documentation of model limitations and workarounds
🎯 Technical Highlights
- Official Qwen3-TTS implementation bundled for stability
- 24kHz mono audio output
- Progress bars with real-time token generation tracking
- VRAM management with automatic model reload and device checking
- Full unified architecture integration
- Interrupt handling for cancellation support
Qwen3-TTS brings a total of 10 TTS engines to the suite, each with unique capabilities. Voice Designer is a first-of-its-kind feature in ComfyUI TTS extensions!
v4.16.0 - CosyVoice3 TTS Engine
🎙️ CosyVoice3 TTS Engine - Zero-Shot Voice Conversion!
Major new engine addition! CosyVoice3 brings powerful TTS capabilities AND zero-shot voice conversion. Previously, ChatterBox was the only zero-shot voice changer option (RVC requires training). Now you have another high-quality option with CosyVoice3 VC!
✨ New Features
CosyVoice3 Engine
- Zero-shot voice conversion (VC) - Convert any voice to match another voice without training! Another option alongside ChatterBox VC
- Iterative refinement cache - Improve VC quality through multiple passes
- Multilingual TTS - 4 core languages (Chinese, English, Japanese, Korean) plus 5 additional languages
- Three TTS modes: zero-shot, instruction, and cross-lingual voice cloning
- Model variants: standard and RL-enhanced (improved quality, set as default)
- Paralinguistic tag support for natural speech effects (laughter, breath, cough, etc.)
- Character voice switching with `[CharacterName]` syntax
- Language switching with `[lang:code]` syntax
- SRT subtitle timing support for synchronized audio generation
- Per-segment parameter control (`[seed:42]`, `[speed:1.5]`)
- Live generation progress with token-by-token updates and ETA
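The switching tags above can be combined in one script. A sketch (the character name and the `en`/`zh` language codes are assumptions, since these notes don't list the accepted codes):

```
[Alice] [lang:en] Hello, and welcome to the show.
[lang:zh] 大家好，欢迎收听。
[seed:42] [speed:1.5] This line renders faster with a fixed seed.
```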
🔧 Improvements
- Fix ChatterBox and RVC model discovery to work with custom model paths (extra_model_paths.yaml)
- Fix RVC audio chunking errors with short segments
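The custom-path fix uses ComfyUI's standard `extra_model_paths.yaml` mechanism. A minimal sketch of such a file (the section name and folder layout are illustrative assumptions, not prescribed by these notes):

```yaml
# extra_model_paths.yaml - tells ComfyUI to also scan an external location
my_tts_models:
  base_path: /mnt/storage/comfyui
  checkpoints: models/checkpoints
  # folder for RVC models under base_path (the key name is illustrative)
  RVC: models/RVC
```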
📚 Documentation
- New CosyVoice3 paralinguistic tags guide
- Updated README with CosyVoice3 features and examples
🙏 Credits
Initial CosyVoice3 implementation by @tazztone
v4.15.0 - Step Audio EditX Engine & Universal Inline Edit Tags
TTS Audio Suite v4.15.0
🎉 Major New Features
⚙️ Step Audio EditX TTS Engine
A powerful new AI-powered text-to-speech engine with zero-shot voice cloning:
- Clone any voice from just 3-10 seconds of audio
- Natural-sounding speech generation
- Memory-efficient with int4/int8 quantization options (uses less VRAM)
- Character switching and per-segment parameter support
🎨 Step Audio EditX Audio Editor
Transform any TTS engine's output with AI-powered audio editing (post-processing):
- 14 emotions: happy, sad, angry, surprised, fearful, disgusted, contempt, neutral, etc.
- 32 speaking styles: whisper, serious, child, elderly, neutral, and more
- Speed control: make speech faster or slower
- 10 paralinguistic effects: laughter, breathing, sigh, gasp, crying, sniff, cough, yawn, scream, moan
- Audio cleanup: denoise and voice activity detection
- Universal compatibility: Works with audio from ANY TTS engine (ChatterBox, F5-TTS, Higgs Audio, VibeVoice)
🏷️ Universal Inline Edit Tags
Add audio effects directly in your text across all TTS engines:
- Easy syntax: `"Hello <Laughter> this is amazing!"`
- Works everywhere: Compatible with all TTS engines using Step Audio EditX post-processing
- Multiple tag types: `<emotion>`, `<style>`, `<speed>`, and paralinguistic effects
- Control intensity: `<Laughter:2>` for stronger effect, `<Laughter:3>` for maximum
- Voice restoration: `<restore>` tag to return to original voice after edits
- 📖 Read the complete Inline Edit Tags guide
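Combining the tag families above, one edited passage might read as follows (the emotion and style names are taken from the Audio Editor lists above; the sentences are illustrative):

```
<happy> I can't believe we actually won! <Laughter:2>
<whisper> Keep it quiet for now. <restore>
And here the narration returns to the original voice.
```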
📝 Multiline TTS Tag Editor Enhancements
- New tabbed interface for inline edit tag controls
- Quick-insert buttons for emotions, styles, and effects
- Better copy/paste compatibility with ComfyUI v0.3.75+
- Improved syntax highlighting and text formatting
📦 New Example Workflows
- Step Audio EditX Integration - Basic TTS usage examples
- Audio Editor + Inline Edit Tags - Advanced editing demonstrations
- Updated Voice Cleaning workflow with Step Audio EditX denoise option
🔧 Improvements
- Better memory management and model caching across all engines
TTS Audio Suite v4.9.0 - IndexTTS-2 with Advanced Emotion Control
🌈 IndexTTS-2 with Advanced Emotion Control
This major release introduces IndexTTS-2, a revolutionary TTS engine with sophisticated emotion control capabilities that takes voice synthesis to the next level.
🎯 Key Features
🆕 IndexTTS-2 TTS Engine
- New state-of-the-art TTS engine with advanced emotion control system
- Multiple emotion input methods supporting audio references, text analysis, and manual vectors
- Dynamic text emotion analysis with QwenEmotion AI and contextual `{seg}` templates
- Per-character emotion control using `[Character:emotion_ref]` syntax for fine-grained control
- 8-emotion vector system (Happy, Angry, Sad, Surprised, Afraid, Disgusted, Calm, Melancholic)
- Audio reference emotion support including Character Voices integration
- Emotion intensity control from neutral to maximum dramatic expression
📖 Documentation
- Complete IndexTTS-2 Emotion Control Guide with examples and best practices
- Updated README with IndexTTS-2 features and model download information
- Timeline updated to v4.9.0 milestone
🚀 Getting Started
- Install/Update via ComfyUI Manager or manual installation
- Find IndexTTS-2 nodes in the TTS Audio Suite category
- Connect emotion control using any supported method (audio, text, vectors)
- Read the guide: `docs/IndexTTS2_Emotion_Control_Guide.md`
🌟 Emotion Control Examples
Welcome to our show! [Alice:happy_sarah] I'm so excited to be here!
[Bob:angry_narrator] That's completely unacceptable behavior.
📋 Full Changelog
Added
- New IndexTTS-2 engine with sophisticated emotion control system
- Unified emotion control supporting multiple input methods (audio, text, vectors)
- Dynamic text emotion analysis with QwenEmotion AI and contextual templates
- Per-character emotion control using [Character:emotion_ref] syntax
- 8-emotion vector control (Happy, Angry, Sad, Surprised, Afraid, Disgusted, Calm, Melancholic)
- Audio reference emotion support including Character Voices integration
- Emotion intensity control from neutral to maximum dramatic expression
- Advanced caching system for improved performance
- Complete IndexTTS-2 Emotion Control Guide: docs/IndexTTS2_Emotion_Control_Guide.md
Changed
- Updated emotion control feature description in README
- Enhanced API compatibility with modern PyTorch/transformers versions
📖 Full Documentation: IndexTTS-2 Emotion Control Guide
💬 Discord: https://discord.gg/EwKE8KBDqD
☕ Support: https://ko-fi.com/diogogo
TTS Audio Suite v4.8.6 - ChatterBox Official 23-Language Engine
🌍 ChatterBox Official 23-Language Multilingual Engine
This release introduces the powerful ChatterBox Official 23-Lang engine, bringing professional multilingual TTS capabilities to ComfyUI.
🆕 Major Features
ChatterBox Official 23-Lang Engine
- 23 Languages Supported: Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Turkish
- Complete SRT Support: Full subtitle processing in all 23 languages with character switching
- Unified Architecture: Seamless integration with existing TTS Audio Suite workflows
- Voice Conversion Support: Full integration with Voice Changer unified node for multilingual voice conversion with refinement passes
🔧 Improvements & Fixes
Stability & Performance
- Enhanced cache invalidation for real-time parameter changes
- Improved audio processing reliability for complex multilingual content
- Better memory management and model loading optimization
- Fixed sample rate consistency issues
🎯 Usage
The new ChatterBox Official 23-Lang engine is available through the standard TTS Audio Suite nodes:
- Configure engine with ChatterBox Official 23Lang Engine node
- Use TTS Text or TTS SRT nodes for generation
- Use Voice Changer node for multilingual voice conversion
- Supports all existing character switching and voice management features
Perfect for content creators, developers, and anyone needing professional multilingual TTS capabilities in ComfyUI.
v4.7.0 - ChatterBox 11 models/Languages
🌍 ChatterBox Language Expansion
New Languages Added (7 new, 11 total models)
- 🇮🇹 Italian - Bilingual model with automatic `[it]` prefix for Italian text
- 🇫🇷 French - 1400+ hours training dataset with voice cloning
- 🇷🇺 Russian - Complete model with training artifacts
- 🇦🇲 Armenian - Full model with unique architecture
- 🇬🇪 Georgian - Full model with specialized features
- 🇯🇵 Japanese - With proper Japanese tokenizer support
- 🇰🇷 Korean - With Korean tokenizer support
- 🇩🇪 German variants:
- havok2 - Multi-speaker hybrid, best quality
- SebastianBodza - Emotion control with `<haha>`, `<wow>` tags
Critical Fixes
- Fixed tokenizer discovery for non-Latin languages (Japanese/Korean were using English tokenizer)
- Fixed vocabulary size mismatches for extended vocabularies
- Fixed state dict key format issues for incomplete models
- Italian prefix system for proper bilingual support
Technical Improvements
- Unified model architecture support for Italian single-checkpoint model
- Smart tokenizer discovery prioritizing language-specific files
- Extended vocabulary support (1500 tokens for Italian)
All models auto-download from HuggingFace on first use.
TTS Audio Suite v4.6.16 - Complete VibeVoice Integration
🎉 Complete VibeVoice Integration Release
🆕 VibeVoice Engine - Now Fully Integrated!
This release marks the complete integration of Microsoft's VibeVoice engine into TTS Audio Suite, bringing professional-quality multilingual text-to-speech with advanced multi-speaker capabilities.
✨ What's New in VibeVoice
🎭 Dual Multi-Speaker Modes
- Native Multi-Speaker Mode: Use VibeVoice's built-in 4-speaker system with "Speaker 1:", "Speaker 2:" format
- Custom Character Switching: Full character voice management with unlimited speakers using your own voice references
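The two modes use different markup. A side-by-side sketch (speaker and character names are illustrative):

```
Native multi-speaker mode (up to 4 speakers):
  Speaker 1: Welcome back to the podcast.
  Speaker 2: Glad to be here!

Custom character switching (your own voice references):
  [Alice] Welcome back to the podcast.
  [Bob] Glad to be here!
```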
📝 Complete SRT Subtitle Support
- Full subtitle timing with all modes: stretch_to_fit, pad_with_silence, smart_natural, concatenate
- Multi-character subtitle processing with proper timing
- Seamless integration with existing SRT workflows
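A minimal multi-character SRT input might look like this (standard SubRip format; the timings and names are illustrative):

```
1
00:00:00,000 --> 00:00:02,500
[Alice] Our story begins on a quiet morning.

2
00:00:02,500 --> 00:00:05,000
[Bob] Quiet, until the phone rang.
```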
🤖 Two Model Options Available
- vibevoice-1.5B (~5.4GB) - Faster inference, great quality
- vibevoice-7B (~18GB) - Maximum quality, slower inference
- Auto-download with HuggingFace integration and legacy path support
🧠 Smart Memory Management
- Proper integration with ComfyUI's "Clear VRAM" button
- Automatic model unloading when memory is low
- Consistent architecture with other TTS engines
💡 VibeVoice Pro Tips
🎵 Watch for Music Mode: VibeVoice has built-in music/podcast detection. Avoid starting text with greetings like "Hello!" or "Welcome!" as these may trigger a different speaking style than intended.
🎯 Best Practices:
- Use complete sentences rather than short phrases
- Provide context in your text for better voice matching
- Test different text lengths to find the sweet spot for your voice references
🌍 Supported Languages
VibeVoice supports English and Chinese with high-quality synthesis for both languages.
📋 How to Use VibeVoice
- Basic TTS: Use "TTS Text" node, select VibeVoice engine
- SRT Subtitles: Use "TTS SRT" node with VibeVoice engine
- Multi-Speaker: Choose between Native (4 speakers max) or Custom Character modes
- Voice References: Add your own voice samples via Character Voices node
🔧 Full Engine Lineup
TTS Audio Suite now includes 5 complete TTS engines:
- ✅ ChatterBox - Fast, efficient TTS with voice conversion
- ✅ F5-TTS - Zero-shot voice cloning with reference audio
- ✅ Higgs Audio 2 - Professional voice cloning and synthesis
- ✅ VibeVoice - Multilingual TTS with multi-speaker support
- ✅ RVC Integration - Voice conversion post-processing
🐛 Bug Fixes & Improvements
- Improve VibeVoice memory management and unloading
- Better integration with ComfyUI's memory management system
- More reliable model unloading when using 'Clear VRAM' button
- Consistent architecture across all TTS engines for better maintainability
- Enhanced stability when switching between different models
Full Changelog: https://github.com/diodiogod/TTS-Audio-Suite/blob/main/CHANGELOG.md
Download: Install via ComfyUI Manager or clone from GitHub
Documentation: Check the folder and example workflows
Support: Report issues on GitHub Issues page
v4.5.5 - 🔧 Automatic Installation
🔧 Automatic Installation
Installation now works automatically through ComfyUI Manager with zero manual setup required.
What's New
- 🤖 Automatic dependency resolution - All conflicts handled automatically by `install.py`
- 🐍 Python 3.13 full support - Works on latest Python versions
- ⚡ Smart installation system - Prevents version conflicts that caused engine failures
For Users
Install or update through ComfyUI Manager and everything works. No manual steps needed.
✅ All 19 nodes load successfully without manual install intervention.