My First Release of an Audiobook using Coqui-TTS #2635

Hexatona · 2023-05-25T16:23:08Z

Watch/Listen here! https://www.youtube.com/playlist?list=PL-vT9uPaqMCksJSOGOocVokVk7G2s0upW

The Princess who Carries the Blood of the Goddess, by TheLoudGuy. Basically, a Breath of the Wild Novel if Link and Zelda's fates had been reversed.

I do this sort of thing as a hobby, and for years I used Microsoft Speech Platform's Microsoft Zira, along with Balabolka, for this purpose. I really appreciated the consistency I could achieve with that voice, but it was robotic enough that it put a lot of people off, too. As such, I was always on the lookout for other potential voices I could use. Nothing really seemed like much of an improvement until I messed with Coqui-TTS enough to really hear the potential.

I've spent a few weeks tweaking my program to take an HTML document, turn it into a formatted text document, and then generate a full chapter-separated series of MP3s for the whole thing. On top of that, it changes the pitch of dialogue so you know when someone is talking, and it emphasizes italic text with a slight speed decrease.
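
For readers wondering how a dialogue/narration split like this could work, here is a minimal sketch. It is an illustration only, not the program described above: it assumes dialogue is wrapped in straight or curly double quotes, and `split_dialogue` is a made-up name.

```python
import re

# Matches a run of text wrapped in straight or curly double quotes.
DIALOGUE = re.compile(r'("[^"]*"|“[^”]*”)')

def split_dialogue(paragraph):
    """Return (kind, text) pairs, where kind is 'dialogue' or 'narration'."""
    segments = []
    # Splitting on a capturing group keeps the quoted pieces in the result.
    for piece in DIALOGUE.split(paragraph):
        piece = piece.strip()
        if not piece:
            continue
        kind = "dialogue" if DIALOGUE.fullmatch(piece) else "narration"
        segments.append((kind, piece))
    return segments
```

Each segment could then be synthesized separately and the dialogue ones pitch-shifted before everything is joined back together.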

I have to say, I learned a lot trying to do this audiobook. Finding ways to get around the voice's little errors. The voice is awesome though. The big trade-off is consistency. If you gave Zira a word, it would pronounce it the same, always. Here, it's almost always the same, but sometimes not.

I'll tweak it more in the future. I need to add in some more functionality around word replacements. I want to see just how accurate I can get it. And also find a way to generate silences better.
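
A word-replacement pass of the kind mentioned here might look roughly like the sketch below. The respellings in `REPLACEMENTS` are invented examples, and `apply_replacements` is a hypothetical helper; which spellings actually help depends on the voice.

```python
import re

# Hypothetical respellings; the right ones have to be found by ear.
REPLACEMENTS = {"Hyrule": "High-rule", "Zelda": "Zell-da"}

def apply_replacements(text, table=REPLACEMENTS):
    """Swap whole words found in `table` for their respelled versions."""
    if not table:
        return text
    # Longest keys first so a longer entry wins over a shorter prefix.
    words = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)
```

The `\b` word boundaries keep the replacement from firing inside longer words, so "Hyrulean" would be left alone.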

Anyway, I wanted to show off a bit, so - enjoy, I hope!

EDIT: oh right, I used VITS with speaker p248.

p0p4k · 2023-05-26T02:58:54Z

You can eliminate the inconsistent voice by fixing a seed for the normalising flow modules that sample from a Gaussian. Also, turning off sdp (the stochastic duration predictor) and using DDP (the deterministic duration predictor) helps to somewhat fix phoneme durations in the audio realm.
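
For anyone only using the released models, the seed part of this advice can be sketched as below: seed every random number generator before each synthesis call so the sampled noise is the same every time. `seed_everything` and `with_fixed_seed` are hypothetical helper names, and `synth` stands in for whatever function produces the audio. Some GPU operations can still be slightly nondeterministic, so treat this as a starting point.

```python
import random

def seed_everything(seed):
    """Seed every RNG that is importable so repeated runs draw the same noise."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    except ImportError:
        pass

def with_fixed_seed(seed, synth, text):
    """Reseed, then call synth(text); same seed and text give the same draw."""
    seed_everything(seed)
    return synth(text)
```

Reseeding before every sentence, rather than once at start-up, means a given sentence does not depend on how many sentences were synthesized before it.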

2 replies

Hexatona May 26, 2023
Author

Hey! I appreciate your response, though as someone who is so far only using the released models, I have to say it went a bit over my head!

shawhu May 30, 2023

You can eliminate the inconsistent voice by fixing a seed for the normalising flow modules that sample from a Gaussian. Also, turning off sdp (the stochastic duration predictor) and using DDP (the deterministic duration predictor) helps to somewhat fix phoneme durations in the audio realm.

thanks, that's really what i'm looking for.

shawhu · 2023-05-30T08:35:44Z

the audio book is great. totk rocks!
and the voice you picked is awesome. i'm going to use this voice in my project too. thanks!

0 replies

StoryHack · 2023-06-08T19:30:54Z

... And also find a way to generate silences better.

In other TTS systems that I've played with to make similar audiobooks, I generate sentence-length wavs, then put a silent 0.25-second wav that I created in Audacity between them when I merge all the generated wavs.

I know it can be done in ffmpeg, but I use sox to merge the files. The call is super simple.

sox audio1.wav silence.wav audio2.wav silence.wav audio3.wav output.wav
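
With hundreds of sentence files, the same call can be assembled from Python. This is a minimal sketch assuming `sox` is on the PATH and that every file shares one sample rate and channel count; `build_sox_concat` and `run_sox_concat` are hypothetical names.

```python
import subprocess

def build_sox_concat(wavs, silence, output):
    """Return the sox argv with `silence` placed between consecutive wavs."""
    if not wavs:
        raise ValueError("need at least one input wav")
    args = ["sox"]
    for i, wav in enumerate(wavs):
        if i:  # no silence before the first file
            args.append(silence)
        args.append(wav)
    args.append(output)
    return args

def run_sox_concat(wavs, silence, output):
    """Run the merge; raises CalledProcessError if sox fails."""
    subprocess.run(build_sox_concat(wavs, silence, output), check=True)
```

For very long chapters the argument list can get large, so merging in batches may be necessary on some systems.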

1 reply

Hexatona Jun 13, 2023
Author

Yeah, that's basically the solution I ended up with - having a selection of pre-made silences of set lengths for certain timeframes. I think you can generate silences with NumPy, but that's work for future Hexy to figure out.
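
Silence of any length can also be generated on the fly. NumPy would work, but this sketch uses only Python's standard-library `wave` module so it has no extra dependency. `write_silence` is a hypothetical helper, and the 22050 Hz default is an assumption; it should match the sample rate of the TTS output.

```python
import wave

def write_silence(path, seconds, sample_rate=22050):
    """Write `seconds` of 16-bit mono silence to `path`; return the frame count."""
    n_frames = int(round(seconds * sample_rate))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)  # assumed; match the model's output
        wav.writeframes(b"\x00\x00" * n_frames)
    return n_frames
```

For example, `write_silence("pause_short.wav", 0.25)` and `write_silence("pause_long.wav", 1.0)` would give two reusable pauses.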

Yes, I use FFmpeg for a lot of what I do, but I've recently discovered SoX, and its pitch and tempo effects are WAY better than ffmpeg's, so now I have to rewrite a bit of my program to make use of it. My next audiobook is gonna be even better 😍
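
As a rough illustration of those two SoX effects: `pitch` takes a shift in cents (100 cents is one semitone) and `tempo` takes a speed factor that leaves pitch alone. The helper name `build_sox_effects` and the example values are made up, not settings from the program discussed here.

```python
def build_sox_effects(src, dst, pitch_cents=0, tempo=1.0):
    """Return a sox argv applying an optional pitch shift and tempo change."""
    args = ["sox", src, dst]
    if pitch_cents:
        args += ["pitch", str(pitch_cents)]  # shift in cents, may be negative
    if tempo != 1.0:
        args += ["tempo", str(tempo)]        # below 1.0 slows the audio down
    return args
```

Something like `build_sox_effects("line.wav", "line_fx.wav", pitch_cents=-50, tempo=0.95)` could then be passed to `subprocess.run(..., check=True)`.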

WSINTRA · 2023-06-23T00:37:09Z

This is great. I came to the discussion page hoping to ask if anyone was able to use TTS for audiobook creation. Specifically, I am wondering if it would be possible to convert an existing audiobook into one read by David Attenborough, just for fun of course.
Great work op!

0 replies

Milor123 · 2023-09-01T13:41:24Z

Amazing!! Could you explain your workflow for this book? I have a .txt of a book that Balabolka converts into a 15-hour wav, but I don't know how to use coqui-ai with my txt file; I can't simply paste in the file's text. Could you help me, please? I understand a little Python.

1 reply

Hexatona Feb 17, 2024
Author

Yeah, you can't just open up the command prompt and be like TTS.exe "file to read.txt" -model blah blah and have it spit out something. You need to basically write a little program to:

1) take your document and split it into sentences
2) feed those sentences into the TTS API and get out wav files
3) roll all those wav files up into one mp3 or something
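
The first two steps above can be sketched as follows. This is an illustration under assumptions, not the program used for the audiobook: the sentence splitter is deliberately naive, the helper names are made up, and the Coqui call uses the `TTS.api` interface with the VCTK VITS model and speaker p248 mentioned earlier in the thread.

```python
import re
from pathlib import Path

def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def synthesize_all(sentences, out_dir, synth):
    """Call synth(sentence, wav_path) for each sentence; return the wav paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, sentence in enumerate(sentences):
        path = out_dir / f"{i:05d}.wav"  # zero-padded so files sort in order
        synth(sentence, str(path))
        paths.append(path)
    return paths

def make_coqui_synth(model_name="tts_models/en/vctk/vits", speaker="p248"):
    """Build a synth callable around Coqui's Python API (imported lazily)."""
    from TTS.api import TTS
    tts = TTS(model_name=model_name)
    return lambda text, path: tts.tts_to_file(text=text, speaker=speaker, file_path=path)
```

Usage would be along the lines of `synthesize_all(split_sentences(text), "out", make_coqui_synth())`. The numbered wav files can then be merged with sox or ffmpeg for the third step. A real splitter would also need to handle abbreviations like "Mr." and quoted dialogue.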

Good luck!

tom-huntington · 2023-12-24T01:07:09Z

Are there any repositories that I can clone and just use via the command line? Just point it at a plaintext file.

3 replies

danielw97 Dec 27, 2023

Hi,
I personally recommend epub2tts:
https://github.com/aedocw/epub2tts/
At the time of writing it supports epub as well as txt, and a bonus is that if an epub file is used chapter marks are also generated in the resulting m4b.
It's currently the best app I've found to do long-form tts using the coqui framework, although if there are any other utilities out there doing a similar thing let us know.

Milor123 Dec 28, 2023

Ohh amazing bro!! Thank you very much for sharing, I will test it :3 love u!!

Hexatona Feb 17, 2024
Author

Yeah, I haven't released the source for my program. It's not too terribly complicated (And probably really inefficient) but it is something that's still a work in progress.

Theancientplunderer · 2024-02-17T00:00:12Z

I am so sorry to ask such a basic question, but how did you set this all up? I want to similarly have narration on my writing and I'm lost trying to navigate the technical space surrounding these AI TTS programs. Any help at all would be greatly appreciated. I have an NVIDIA RTX 3060 with 16 GB of RAM.

1 reply

Hexatona Feb 17, 2024
Author

Hi there! Sorry, you'll need to be a bit more specific in your request here.

but how did you set this all up? I want to similarly have narration on my writing and I'm lost trying to navigate the technical space surrounding these AI TTS programs

Do you mean how to get CoquiTTS set up in general, and how to use it?

Assuming you're on Windows, these are my basic setup instructions:

1) Install Miniconda, and run it

2) Create a conda env to use, and activate it

conda create --name coquitts python=3.9

conda activate coquitts

3) Install the CoquiTTS library

Download the latest source code from here:  https://github.com/coqui-ai/TTS

Go into the downloaded directory and run:  pip install .

4) Install the right PyTorch from here:  https://pytorch.org/get-started/locally/
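
Once those steps are done, a quick sanity check from inside the activated env can confirm that both packages are visible to Python. This is a small sketch; `check_install` is a hypothetical helper and it only tests that the packages can be found, not that the GPU works.

```python
import importlib.util

def check_install(modules=("TTS", "torch")):
    """Report which packages can be found in the active environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}
```

Running `print(check_install())` should show `True` for both names; a `False` usually means the package was installed into a different environment.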

As for how to use it: it can be used as a command line tool, as a library in Python, or as a web server. There are examples of all three in CoquiTTS's documentation. They've also made a few handy CoquiTTS tutorials on YouTube you should watch!
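
As a small example of the command line route, the sketch below assembles a call to the `tts` command that the install provides, using its `--text`, `--model_name`, `--speaker_idx` and `--out_path` options. The helper name `build_tts_cli` is made up, and the model and speaker defaults are simply the ones mentioned earlier in this thread.

```python
import subprocess

def build_tts_cli(text, out_path, model_name="tts_models/en/vctk/vits", speaker="p248"):
    """Return the argv for synthesizing one piece of text with the tts CLI."""
    args = ["tts", "--text", text, "--model_name", model_name, "--out_path", out_path]
    if speaker:  # only multi-speaker models need a speaker id
        args += ["--speaker_idx", speaker]
    return args

def run_tts_cli(text, out_path, **kwargs):
    """Run the command; raises CalledProcessError if tts exits with an error."""
    subprocess.run(build_tts_cli(text, out_path, **kwargs), check=True)
```

Each CLI call reloads the model, so for a whole book the Python library route is likely to be much faster.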

Now, if you're wondering how to make an audiobook like the one I made above..?

That's a big question. I've been working on a program like this for a long time. As far as I've seen, no other Epub->AI->TTS program has all the features mine does. You're going to have to put in the work and make the program yourself, ask or hire me, or use something like https://github.com/aedocw/epub2tts/. (I have not used that myself, it was just mentioned elsewhere)

Yeah, I know it's all kinda intimidating. There's TONS I don't get, half of all this baffles me.

Oh also, check out the channel, I've made more audiobooks that sound WAY better than the one above, too!
