My First Release of an Audiobook using Coqui-TTS #2635
Replies: 7 comments 8 replies
-
|
You can eliminate the inconsistent voice by fixing a seed for the normalising-flow modules that sample from a Gaussian. Also, turning off sdp (the stochastic duration predictor) and using DDP helps somewhat to stabilise phoneme durations in the audio realm. |
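A minimal sketch of the seed idea, assuming the variation comes from the global random number generators. The helper names `seed_everything` and `synthesize_with_fixed_seed` are made up for illustration, and `synthesize` stands in for whatever function actually produces the audio. NumPy and PyTorch are seeded only if they happen to be installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # PyTorch's global generator
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


def synthesize_with_fixed_seed(synthesize, text, seed=1234):
    """Reset the seed before every call so the Gaussian samples repeat."""
    seed_everything(seed)
    return synthesize(text)


if __name__ == "__main__":
    # Stand-in "model" that just draws one Gaussian sample per call.
    fake_model = lambda text: random.gauss(0.0, 1.0)
    print(synthesize_with_fixed_seed(fake_model, "hello"))
    print(synthesize_with_fixed_seed(fake_model, "hello"))  # same value again
```

Reseeding before each sentence, rather than once at startup, means a sentence sounds the same no matter what was generated before it.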
-
|
The audiobook is great. TotK rocks! |
-
In other TTS systems that I've played with to make similar audiobooks, I generate sentence-length WAVs, then put a silent 0.25-second WAV that I created in Audacity between them when I merge all the generated WAVs. I know it can be done in ffmpeg, but I use sox to merge the files. The call is super simple.
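The sox call itself isn't shown here, so as a rough stand-in, this is a sketch of the same merge using only Python's standard `wave` module. The function name `merge_with_silence` is hypothetical, and it assumes all inputs are uncompressed PCM with the same channel count, sample width, and sample rate.

```python
import wave


def merge_with_silence(wav_paths, out_path, gap_seconds=0.25):
    """Concatenate WAV files with gap_seconds of silence between them.

    Assumes 16-bit (or wider) signed PCM, where zero bytes mean silence.
    """
    if not wav_paths:
        raise ValueError("need at least one input file")
    with wave.open(wav_paths[0], "rb") as first:
        params = first.getparams()
    fmt = (params.nchannels, params.sampwidth, params.framerate)
    gap_frames = int(params.framerate * gap_seconds)
    silence = b"\x00" * (gap_frames * params.nchannels * params.sampwidth)

    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # frame count is corrected when the file closes
        for index, path in enumerate(wav_paths):
            with wave.open(path, "rb") as src:
                src_fmt = (src.getnchannels(), src.getsampwidth(),
                           src.getframerate())
                if src_fmt != fmt:
                    raise ValueError(f"{path} has a different audio format")
                if index > 0:
                    out.writeframes(silence)  # the pause between sentences
                out.writeframes(src.readframes(src.getnframes()))
```

Generating the silence in code avoids keeping a separate silent file around, and `gap_seconds` can be tuned per pause if some breaks should be longer.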
|
-
|
This is great. I came to the discussion page hoping to ask whether anyone had been able to use TTS for audiobook creation. Specifically, I am wondering if it would be possible to convert an existing audiobook into one read by David Attenborough, just for fun of course. |
-
|
Amazing!! Could you explain your workflow for this book? I have a .txt of a book that Balabolka converts into a 15-hour WAV, but I don't know how to send my .txt file to coqui-ai; I can't simply paste the file's text. Could you help me, please? I understand a little Python. |
-
|
Are there any repositories that I can clone and just use via the command line? Just point it at a plaintext file. |
-
|
I am so sorry to ask such a basic question, but how did you set this all up? I want to similarly have narration on my writing, and I'm lost trying to navigate the technical space surrounding these AI TTS programs. Any help at all would be greatly appreciated. I have an NVIDIA RTX 3060 with 16 GB of RAM. |
-
Watch/Listen here! https://www.youtube.com/playlist?list=PL-vT9uPaqMCksJSOGOocVokVk7G2s0upW
The Princess who Carries the Blood of the Goddess, by TheLoudGuy. Basically, a Breath of the Wild novel in which Link and Zelda's fates are reversed.
I do this sort of thing as a hobby, and for years I used Microsoft Speech Platform's Microsoft Zira, along with Balabolka, for this purpose. I really appreciated the consistency I could achieve with that voice. But it was robotic enough that it put a lot of people off, too. As such, I was always on the lookout for other potential voices I could use. Nothing really seemed like much of an improvement until I messed with Coqui-TTS enough to really hear the potential.
I've spent a few weeks tweaking my program to take an HTML document, turn it into a formatted text document, and then generate a full, chapter-separated series of MP3s for the whole thing. On top of that, I change the pitch of dialogue so you know when someone is talking, and I emphasize italicized text with a slight speed decrease.
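This is not my actual program, just a minimal sketch of the tagging step under some assumptions: italics are marked with `<i>` or `<em>`, dialogue sits inside double quotes, and a quote never spans an italic tag. The names `SegmentParser` and `segment` are made up for this example. Each piece of text comes out labeled so a later step can pick a pitch or speed for it.

```python
import re
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collect (is_italic, text) runs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.italic_depth = 0
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.italic_depth += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.italic_depth:
            self.italic_depth -= 1

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.italic_depth > 0, data))


# Straight or curly double-quoted spans count as dialogue.
QUOTED = re.compile(r'("[^"]*"|“[^”]*”)')


def segment(html):
    """Return a list of (style, text) pairs; style is one of
    'narration', 'dialogue' or 'emphasis'."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    segments = []
    for italic, text in parser.runs:
        for piece in QUOTED.split(text):
            piece = piece.strip()
            if not piece:
                continue
            if italic:
                style = "emphasis"
            elif QUOTED.fullmatch(piece):
                style = "dialogue"
            else:
                style = "narration"
            segments.append((style, piece))
    return segments
```

Each style can then be mapped to an audio tweak, for example a pitch shift for `dialogue` and a slightly slower rate for `emphasis`, before the pieces are stitched back together.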
I have to say, I learned a lot trying to do this audiobook, mostly by finding ways to get around the voice's little errors. The voice is awesome, though. The big trade-off is consistency. If you gave Zira a word, it would always pronounce it the same way. Here, it's almost always the same, but sometimes not.
I'll tweak it more in the future. I need to add some more functionality around word replacements, since I want to see just how accurate I can get it. I also want to find a better way to generate silences.
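For the word replacements, something along these lines is what I have in mind. It's only a sketch: the respellings in `REPLACEMENTS` are invented examples rather than my real list, and `build_replacer` is a hypothetical helper name.

```python
import re

# Hypothetical respellings; the real list depends on what the voice gets wrong.
REPLACEMENTS = {
    "Hyrule": "High rule",
    "Zelda": "Zell-da",
}


def build_replacer(replacements):
    """Return a function that swaps whole words for their respellings."""
    if not replacements:
        return lambda text: text
    # Longest keys first, so a longer entry wins over a shorter prefix.
    words = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in words) + r")\b"
    )

    def replace(text):
        return pattern.sub(lambda match: replacements[match.group(1)], text)

    return replace


if __name__ == "__main__":
    fix = build_replacer(REPLACEMENTS)
    print(fix("Zelda left Hyrule at dawn."))
```

The `\b` word boundaries keep the replacement from firing inside longer words, and compiling one combined pattern means the text is scanned only once no matter how long the list grows.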
Anyway, I wanted to show off a bit, so - enjoy, I hope!
EDIT: oh right, I used VITS with speaker p248.
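For anyone who wants to try the same voice on a plain text file, here is a rough sketch using the Coqui-TTS Python API. The model name `tts_models/en/vctk/vits` is my assumption for the multi-speaker VITS model that includes p248, the sentence splitter is deliberately naive, and `synthesize_file` is a hypothetical helper. It needs the `TTS` package installed and will download the model on first use.

```python
import os
import re

MODEL_NAME = "tts_models/en/vctk/vits"  # assumption: VCTK multi-speaker VITS
SPEAKER = "p248"


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def synthesize_file(txt_path, out_dir):
    """Write one WAV per sentence of txt_path into out_dir."""
    from TTS.api import TTS  # imported lazily; requires `pip install TTS`

    tts = TTS(model_name=MODEL_NAME)
    os.makedirs(out_dir, exist_ok=True)
    with open(txt_path, encoding="utf-8") as handle:
        sentences = split_sentences(handle.read())
    paths = []
    for index, sentence in enumerate(sentences):
        path = os.path.join(out_dir, f"{index:05d}.wav")
        tts.tts_to_file(text=sentence, speaker=SPEAKER, file_path=path)
        paths.append(path)
    return paths
```

Calling `synthesize_file("chapter01.txt", "chapter01_wavs")` would leave numbered sentence files that can be merged afterwards; both names are placeholders.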