
Releases: diodiogod/TTS-Audio-Suite

v4.25.1 - Integrated RVC Model Training

05 Apr 21:12


TTS Audio Suite v4.25.1

This release adds integrated RVC model training to TTS Audio Suite.

You can now prepare datasets, train RVC models, monitor progress in a live dashboard, and load the resulting models back into the normal RVC voice conversion workflow without leaving the suite.

New

  • Integrated RVC model training inside TTS Audio Suite
  • New 📦 RVC Dataset Prep node
  • New 🎛️ RVC Training Config node
  • New unified 🎓 Model Training node
  • New live training dashboard with progress, ETA, speed, and recent loss graph
  • Support for resume from real checkpoints
  • Support for continue_from using prior training artifacts or an existing 🎭 Load RVC Character Model
  • Safer interrupt handling with resumable checkpoint saving
  • Improved RVC model loading and index auto-detection
  • New RVC training workflow example
  • Updated README, model layout docs, and engine capability tables

Notes

  • resume continues the same training job from saved checkpoints
  • continue_from starts a new run from an already trained model or prior training artifacts
  • RVC is the first engine with integrated training support in the suite
  • The training architecture is unified so future engines can plug into the same workflow later
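The resume vs. continue_from distinction above can be sketched as follows. All names here are hypothetical illustrations, not the suite's actual node parameters:

```python
# Hypothetical sketch of resume vs. continue_from; names are illustrative,
# not the actual TTS Audio Suite node parameters.

def init_training(resume_state=None, continue_from_weights=None):
    """Return (weights, step) for a training run."""
    if resume_state is not None:
        # resume: the same job picks up its saved weights AND step count
        return resume_state["weights"], resume_state["step"]
    if continue_from_weights is not None:
        # continue_from: a NEW run starts from prior weights at step 0
        return continue_from_weights, 0
    return {"w": 0.0}, 0  # fresh run from scratch
```

In other words, resume keeps the whole job state (step count, schedule) intact, whereas continue_from only reuses the learned weights and begins a fresh run.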

Included workflow

  • RVC 🎓 Model Training

Restart ComfyUI after updating.

v4.22.0 - Echo-TTS Engine

01 Mar 04:15
5cdc6c8


🎧 Echo-TTS Engine

New engine contribution by @drphero!

✨ New Features

  • Echo-TTS - DiT-based voice cloning engine (English only)
  • Full integration with character switching, pause tags, and SRT timing
  • Auto-downloads models on first use (~7.1GB total)
  • License: CC-BY-NC-SA (non-commercial use only)

🙏 Credits

Full credit to @drphero for the original Echo-TTS integration.


TTS Audio Suite now includes 11 engines: ChatterBox, ChatterBox 23-Lang, F5-TTS, Higgs Audio 2, VibeVoice, IndexTTS-2, CosyVoice3, Qwen3-TTS, Echo-TTS, Step Audio EditX, and RVC.

v4.19.0 - Qwen3-TTS Engine with Voice Designer

28 Jan 14:22


🎨 Qwen3-TTS Engine - Create Voices from Text!

Major new engine addition! Qwen3-TTS brings a unique Voice Designer feature that lets you create custom voices from natural language descriptions. Plus three distinct model types for different use cases!

✨ New Features

Qwen3-TTS Engine

  • 🎨 Voice Designer - Create custom voices from text descriptions! "A calm female voice with British accent" → instant voice generation
  • Three model types with different capabilities:
    • CustomVoice: 9 high-quality preset speakers (Vivian, Serena, Dylan, Eric, Ryan, etc.)
    • VoiceDesign: Text-to-voice creation - describe your ideal voice and generate it
    • Base: Zero-shot voice cloning from audio samples
  • 10 language support - Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
  • Model sizes: 0.6B (low VRAM) and 1.7B (high quality) variants
  • Character voice switching with [CharacterName] syntax - automatic preset mapping
  • SRT subtitle timing support with all timing modes (stretch_to_fit, pad_with_silence, etc.)
  • Inline edit tags - Apply Step Audio EditX post-processing (emotions, styles, paralinguistic effects)
  • Sage attention support - Improved VRAM efficiency with sageattention backend
  • Smart caching - Prevents duplicate voice generation, skips model loading for existing voices
  • Per-segment parameters - Control [seed:42], [temperature:0.8] inline
  • Auto-download system - All 6 model variants downloaded automatically when needed
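As an illustration of how inline per-segment tags like [seed:42] and [temperature:0.8] might be consumed, here is a minimal parser sketch — a hypothetical helper, not the suite's actual implementation:

```python
import re

# Illustrative parser for inline per-segment parameter tags such as
# [seed:42] or [temperature:0.8]; a sketch, not the suite's real parser.
TAG_RE = re.compile(r"\[(seed|temperature|speed):([0-9.]+)\]")

def extract_params(text):
    """Return (clean_text, params) with recognized tags stripped out."""
    params = {}
    for key, value in TAG_RE.findall(text):
        # integers for seed-like values, floats for the rest
        params[key] = float(value) if "." in value else int(value)
    clean = TAG_RE.sub("", text).strip()
    return clean, params
```

For example, extract_params("[seed:42] [temperature:0.8] Hello there!") yields the cleaned text plus a parameter dict the engine call could consume per segment.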

🎙️ Voice Designer Node

The standout feature of this release! Create voices without audio samples:

  • Natural language input - Describe voice characteristics in plain English
  • Disk caching - Saved voices load instantly without regeneration
  • Standard format - Works seamlessly with Character Voices system
  • Unified output - Compatible with all TTS nodes via NARRATOR_VOICE format

Example descriptions:

  • "A calm female voice with British accent"
  • "Deep male voice, authoritative and professional"
  • "Young cheerful woman, slightly high-pitched"

📚 Documentation

  • YAML-driven engine tables - Auto-generated comparison tables
  • Condensed engine overview in README
  • Portuguese accent guidance - Clear documentation of model limitations and workarounds

🎯 Technical Highlights

  • Official Qwen3-TTS implementation bundled for stability
  • 24kHz mono audio output
  • Progress bars with real-time token generation tracking
  • VRAM management with automatic model reload and device checking
  • Full unified architecture integration
  • Interrupt handling for cancellation support

With Qwen3-TTS, the suite now totals 10 TTS engines, each with unique capabilities. Voice Designer is a first-of-its-kind feature in ComfyUI TTS extensions!

v4.16.0 - CosyVoice3 TTS Engine

30 Dec 01:17


🎙️ CosyVoice3 TTS Engine - Zero-Shot Voice Conversion!

Major new engine addition! CosyVoice3 brings powerful TTS capabilities AND zero-shot voice conversion. Previously, ChatterBox was the only zero-shot voice changer option (RVC requires training). Now you have another high-quality option with CosyVoice3 VC!

✨ New Features

CosyVoice3 Engine

  • Zero-shot voice conversion (VC) - Convert any voice to match another voice without training! Another option alongside ChatterBox VC
  • Iterative refinement cache - Improve VC quality through multiple passes
  • Multilingual TTS - 4 core languages (Chinese, English, Japanese, Korean) plus 5 additional languages
  • Three TTS modes: zero-shot, instruction, and cross-lingual voice cloning
  • Model variants: standard and RL-enhanced (improved quality, set as default)
  • Paralinguistic tag support for natural speech effects (laughter, breath, cough, etc.)
  • Character voice switching with [CharacterName] syntax
  • Language switching with [lang:code] syntax
  • SRT subtitle timing support for synchronized audio generation
  • Per-segment parameter control ([seed:42], [speed:1.5])
  • Live generation progress with token-by-token updates and ETA

🔧 Improvements

  • Fix ChatterBox and RVC model discovery to work with custom model paths (extra_model_paths.yaml)
  • Fix RVC audio chunking errors with short segments

📚 Documentation

  • New CosyVoice3 paralinguistic tags guide
  • Updated README with CosyVoice3 features and examples

🙏 Credits

Initial CosyVoice3 implementation by @tazztone

v4.15.0 - Step Audio EditX Engine & Universal Inline Edit Tags

13 Dec 00:15


TTS Audio Suite v4.15.0

🎉 Major New Features

⚙️ Step Audio EditX TTS Engine

A powerful new AI-powered text-to-speech engine with zero-shot voice cloning:

  • Clone any voice from just 3-10 seconds of audio
  • Natural-sounding speech generation
  • Memory-efficient with int4/int8 quantization options (uses less VRAM)
  • Character switching and per-segment parameter support

🎨 Step Audio EditX Audio Editor

Transform any TTS engine's output with AI-powered audio editing (post-processing):

  • 14 emotions: happy, sad, angry, surprised, fearful, disgusted, contempt, neutral, etc.
  • 32 speaking styles: whisper, serious, child, elderly, neutral, and more
  • Speed control: make speech faster or slower
  • 10 paralinguistic effects: laughter, breathing, sigh, gasp, crying, sniff, cough, yawn, scream, moan
  • Audio cleanup: denoise and voice activity detection
  • Universal compatibility: Works with audio from ANY TTS engine (ChatterBox, F5-TTS, Higgs Audio, VibeVoice)

🏷️ Universal Inline Edit Tags

Add audio effects directly in your text across all TTS engines:

  • Easy syntax: "Hello <Laughter> this is amazing!"
  • Works everywhere: Compatible with all TTS engines using Step Audio EditX post-processing
  • Multiple tag types: <emotion>, <style>, <speed>, and paralinguistic effects
  • Control intensity: <Laughter:2> for stronger effect, <Laughter:3> for maximum
  • Voice restoration: <restore> tag to return to original voice after edits
  • 📖 Read the complete Inline Edit Tags guide
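A minimal recognizer for these tags could look like the sketch below. It is purely illustrative — not the suite's actual tag parser — and only models the `<Tag>` / `<Tag:intensity>` shape described above:

```python
import re

# Illustrative recognizer for inline edit tags such as <Laughter>,
# <Laughter:3>, or <restore>; a sketch only, not the suite's parser.
EDIT_TAG_RE = re.compile(r"<([A-Za-z_]+)(?::([123]))?>")

def parse_edit_tags(text):
    """Return (tag, intensity) pairs in order of appearance.

    Intensity defaults to 1 when no :N suffix is given.
    """
    return [(m.group(1), int(m.group(2) or 1))
            for m in EDIT_TAG_RE.finditer(text)]
```

So "Hello <Laughter:2> this is amazing! <restore>" would produce a Laughter event at intensity 2 followed by a restore event.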

📝 Multiline TTS Tag Editor Enhancements

  • New tabbed interface for inline edit tag controls
  • Quick-insert buttons for emotions, styles, and effects
  • Better copy/paste compatibility with ComfyUI v0.3.75+
  • Improved syntax highlighting and text formatting

📦 New Example Workflows

  • Step Audio EditX Integration - Basic TTS usage examples
  • Audio Editor + Inline Edit Tags - Advanced editing demonstrations
  • Updated Voice Cleaning workflow with Step Audio EditX denoise option

🔧 Improvements

  • Better memory management and model caching across all engines

TTS Audio Suite v4.9.0 - IndexTTS-2 with Advanced Emotion Control

16 Sep 18:19


🌈 IndexTTS-2 with Advanced Emotion Control


This major release introduces IndexTTS-2, a revolutionary TTS engine with sophisticated emotion control capabilities that takes voice synthesis to the next level.

🎯 Key Features

🆕 IndexTTS-2 TTS Engine

  • New state-of-the-art TTS engine with advanced emotion control system
  • Multiple emotion input methods supporting audio references, text analysis, and manual vectors
  • Dynamic text emotion analysis with QwenEmotion AI and contextual {seg} templates
  • Per-character emotion control using [Character:emotion_ref] syntax for fine-grained control
  • 8-emotion vector system (Happy, Angry, Sad, Surprised, Afraid, Disgusted, Calm, Melancholic)
  • Audio reference emotion support including Character Voices integration
  • Emotion intensity control from neutral to maximum dramatic expression

📖 Documentation

  • Complete IndexTTS-2 Emotion Control Guide with examples and best practices
  • Updated README with IndexTTS-2 features and model download information
  • Timeline updated to v4.9.0 milestone

🚀 Getting Started

  1. Install/Update via ComfyUI Manager or manual installation
  2. Find IndexTTS-2 nodes in the TTS Audio Suite category
  3. Connect emotion control using any supported method (audio, text, vectors)
  4. Read the guide: docs/IndexTTS2_Emotion_Control_Guide.md

🌟 Emotion Control Examples

Welcome to our show! [Alice:happy_sarah] I'm so excited to be here!
[Bob:angry_narrator] That's completely unacceptable behavior.
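The [Character:emotion_ref] switching shown above could be split into per-character segments along these lines. This is a hypothetical sketch (including the "narrator" default), not the suite's actual implementation:

```python
import re

# Illustrative splitter for [Character:emotion_ref] switching syntax;
# a sketch only, not the suite's real parser. "narrator" default is assumed.
SWITCH_RE = re.compile(r"\[(\w+):(\w+)\]\s*")

def split_by_character(text, default=("narrator", None)):
    """Split text into (character, emotion_ref, segment) tuples."""
    segments = []
    pos = 0
    current = default
    for m in SWITCH_RE.finditer(text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            segments.append((*current, chunk))
        current = (m.group(1), m.group(2))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((*current, tail))
    return segments
```

Applied to the first example line, this yields a default-narrator segment for "Welcome to our show!" followed by an Alice segment carrying the happy_sarah emotion reference.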

📋 Full Changelog

Added

  • New IndexTTS-2 engine with sophisticated emotion control system
  • Unified emotion control supporting multiple input methods (audio, text, vectors)
  • Dynamic text emotion analysis with QwenEmotion AI and contextual templates
  • Per-character emotion control using [Character:emotion_ref] syntax
  • 8-emotion vector control (Happy, Angry, Sad, Surprised, Afraid, Disgusted, Calm, Melancholic)
  • Audio reference emotion support including Character Voices integration
  • Emotion intensity control from neutral to maximum dramatic expression
  • Advanced caching system for improved performance
  • Complete IndexTTS-2 Emotion Control Guide: docs/IndexTTS2_Emotion_Control_Guide.md

Changed

  • Updated emotion control feature description in README
  • Enhanced API compatibility with modern PyTorch/transformers versions

📖 Full Documentation: IndexTTS-2 Emotion Control Guide
💬 Discord: https://discord.gg/EwKE8KBDqD
☕ Support: https://ko-fi.com/diogogo

TTS Audio Suite v4.8.6 - ChatterBox Official 23-Language Engine

06 Sep 03:19


🌍 ChatterBox Official 23-Language Multilingual Engine

This release introduces the powerful ChatterBox Official 23-Lang engine, bringing professional multilingual TTS capabilities to ComfyUI.

🆕 Major Features

ChatterBox Official 23-Lang Engine

  • 23 Languages Supported: Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Swahili, Turkish
  • Complete SRT Support: Full subtitle processing in all 23 languages with character switching
  • Unified Architecture: Seamless integration with existing TTS Audio Suite workflows
  • Voice Conversion Support: Full integration with Voice Changer unified node for multilingual voice conversion with refinement passes

🔧 Improvements & Fixes

Stability & Performance

  • Enhanced cache invalidation for real-time parameter changes
  • Improved audio processing reliability for complex multilingual content
  • Better memory management and model loading optimization
  • Fixed sample rate consistency issues

🎯 Usage

The new ChatterBox Official 23-Lang engine is available through the standard TTS Audio Suite nodes:

  1. Configure engine with ChatterBox Official 23Lang Engine node
  2. Use TTS Text or TTS SRT nodes for generation
  3. Use Voice Changer node for multilingual voice conversion
  4. Supports all existing character switching and voice management features

Perfect for content creators, developers, and anyone needing professional multilingual TTS capabilities in ComfyUI.

v4.7.0 - ChatterBox: 11 Models/Languages

04 Sep 05:29


🌍 ChatterBox Language Expansion

New Languages Added (7 new, 11 total models)

  • 🇮🇹 Italian - Bilingual model with automatic [it] prefix for Italian text
  • 🇫🇷 French - 1400+ hours training dataset with voice cloning
  • 🇷🇺 Russian - Complete model with training artifacts
  • 🇦🇲 Armenian - Full model with unique architecture
  • 🇬🇪 Georgian - Full model with specialized features
  • 🇯🇵 Japanese - With proper Japanese tokenizer support
  • 🇰🇷 Korean - With Korean tokenizer support
  • 🇩🇪 German variants:
    • havok2 - Multi-speaker hybrid, best quality
    • SebastianBodza - Emotion control with <haha>, <wow> tags

Critical Fixes

  • Fixed tokenizer discovery for non-Latin languages (Japanese and Korean were using the English tokenizer)
  • Fixed vocabulary size mismatches for extended vocabularies
  • Fixed state dict key format issues for incomplete models
  • Italian prefix system for proper bilingual support

Technical Improvements

  • Unified model architecture support for Italian single-checkpoint model
  • Smart tokenizer discovery prioritizing language-specific files
  • Extended vocabulary support (1500 tokens for Italian)

All models auto-download from HuggingFace on first use.

TTS Audio Suite v4.6.16 - Complete VibeVoice Integration

30 Aug 17:18


🎉 Complete VibeVoice Integration Release

🆕 VibeVoice Engine - Now Fully Integrated!

This release marks the complete integration of Microsoft's VibeVoice engine into TTS Audio Suite, bringing professional-quality multilingual text-to-speech with advanced multi-speaker capabilities.

✨ What's New in VibeVoice

🎭 Dual Multi-Speaker Modes

  • Native Multi-Speaker Mode: Use VibeVoice's built-in 4-speaker system with "Speaker 1:", "Speaker 2:" format
  • Custom Character Switching: Full character voice management with unlimited speakers using your own voice references
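The native "Speaker N:" script format could be split along these lines — an illustrative helper for the format described above, not VibeVoice's actual parser:

```python
import re

# Illustrative splitter for VibeVoice's native "Speaker N:" script format;
# a hypothetical sketch, not the engine's real implementation.
def split_speakers(script):
    """Return (speaker_number, line_text) pairs from a multi-line script."""
    segments = []
    for line in script.strip().splitlines():
        m = re.match(r"Speaker (\d+):\s*(.*)", line)
        if m:
            segments.append((int(m.group(1)), m.group(2)))
    return segments
```

For instance, a two-line script "Speaker 1: Hi!" / "Speaker 2: Hello back." splits into one segment per speaker, ready to route to the corresponding voice slot.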

📝 Complete SRT Subtitle Support

  • Full subtitle timing with all modes: stretch_to_fit, pad_with_silence, smart_natural, concatenate
  • Multi-character subtitle processing with proper timing
  • Seamless integration with existing SRT workflows

🤖 Two Model Options Available

  • vibevoice-1.5B (~5.4GB) - Faster inference, great quality
  • vibevoice-7B (~18GB) - Maximum quality, slower inference
  • Auto-download with HuggingFace integration and legacy path support

🧠 Smart Memory Management

  • Proper integration with ComfyUI's "Clear VRAM" button
  • Automatic model unloading when memory is low
  • Consistent architecture with other TTS engines

💡 VibeVoice Pro Tips

⚠️ Text Length Matters: VibeVoice works best with medium to long texts. Short phrases may not reproduce the reference voice well; aim for at least 2-3 sentences for optimal results.

🎵 Watch for Music Mode: VibeVoice has built-in music/podcast detection. Avoid starting text with greetings like "Hello!" or "Welcome!" as these may trigger a different speaking style than intended.

🎯 Best Practices:

  • Use complete sentences rather than short phrases
  • Provide context in your text for better voice matching
  • Test different text lengths to find the sweet spot for your voice references

🌍 Supported Languages

VibeVoice supports English and Chinese with high-quality synthesis for both languages.

📋 How to Use VibeVoice

  1. Basic TTS: Use "TTS Text" node, select VibeVoice engine
  2. SRT Subtitles: Use "TTS SRT" node with VibeVoice engine
  3. Multi-Speaker: Choose between Native (4 speakers max) or Custom Character modes
  4. Voice References: Add your own voice samples via Character Voices node

🔧 Full Engine Lineup

TTS Audio Suite now includes 5 engines:

  • ChatterBox - Fast, efficient TTS with voice conversion
  • F5-TTS - Zero-shot voice cloning with reference audio
  • Higgs Audio 2 - Professional voice cloning and synthesis
  • VibeVoice - Multilingual TTS with multi-speaker support
  • RVC Integration - Voice conversion post-processing

🐛 Bug Fixes & Improvements

  • Improve VibeVoice memory management and unloading
  • Better integration with ComfyUI's memory management system
  • More reliable model unloading when using 'Clear VRAM' button
  • Consistent architecture across all TTS engines for better maintainability
  • Enhanced stability when switching between different models

Full Changelog: https://github.com/diodiogod/TTS-Audio-Suite/blob/main/CHANGELOG.md

Download: Install via ComfyUI Manager or clone from GitHub
Documentation: Check the folder and example workflows
Support: Report issues on GitHub Issues page

v4.5.5 - 🔧 Automatic Installation

22 Aug 17:46


🔧 Automatic Installation

Installation now works automatically through ComfyUI Manager with zero manual setup required.

What's New

  • 🤖 Automatic dependency resolution - All conflicts are handled automatically by install.py
  • 🐍 Python 3.13 full support - Works on latest Python versions
  • ⚡ Smart installation system - Prevents version conflicts that caused engine failures

For Users

Install or update through ComfyUI Manager and everything works. No manual steps needed.

✅ All 19 nodes load successfully without any manual intervention.