[CONTRIBUTION] Speech Dataset Generator #3604

davidmartinrius · 2024-02-23T17:37:26Z

Hi everyone!

I have just published this project on GitHub: https://github.com/davidmartinrius/speech-dataset-generator/

You can now create datasets automatically from any audio file or list of audio files.

I hope you find it useful.

Here are the key functionalities of the project:

Dataset Generation: Creation of multilingual datasets with Mean Opinion Score (MOS).
Silence Removal: Removes silences from audio files to improve overall quality.
Sound Quality Improvement: Improves the quality of the audio when needed.
Audio Segmentation: Splits audio files into segments whose length falls within a specified range of seconds.
Transcription: Transcribes each audio segment to text.
Gender Identification: Identifies the gender of each speaker in the audio.
Pyannote Embeddings: Uses pyannote embeddings to detect speakers across multiple audio files.
Automatic Speaker Naming: Automatically assigns names to speakers detected across multiple audio files.
Multiple Speaker Detection: Detects multiple speakers within each audio file.
Speaker Embedding Storage: Detected speakers are stored in a Chroma database, so you do not need to assign speaker names manually.
Syllabic and words-per-minute metrics
Multiple input sources: You can either use your own files or download content by pasting URLs from sources such as YouTube, LibriVox and TED Talks.
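
As a rough illustration of the words-per-minute metric listed above, here is a minimal sketch that derives it from a segment's transcript and duration. The helper name `words_per_minute` and the whitespace-based word count are assumptions made for this example; the project may compute the metric differently.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return the speaking rate of one segment in words per minute.

    Hypothetical helper for illustration, not the project's own function.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: a "word" is any whitespace-separated token of the transcript.
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds
```

For example, a 3-second segment containing six words gives a rate of 120 words per minute.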

Feel free to explore the project at https://github.com/davidmartinrius/speech-dataset-generator

davidmartinrius · 2024-03-08T12:21:11Z

Recently, I have also implemented automatic downloads from sources such as YouTube, LibriVox, and TED Talks.

Additionally, speakers are now stored in a Chroma vector database, and the system detects them automatically. There is no longer any need to assign a name to each speaker manually; the system handles it for you. It converts the voices to embeddings and then compares them using cosine similarity to determine whether two speakers are the same.
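
To make the comparison step concrete, here is a minimal sketch of matching a new voice embedding against known speakers with cosine similarity. It uses a plain in-memory dictionary as a stand-in for the Chroma database, and the function names, the `threshold` value of 0.75, and the toy two-dimensional vectors are all assumptions for illustration, not the project's actual code or settings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def match_speaker(embedding, known_speakers, threshold=0.75):
    """Return the name of the most similar known speaker, or None.

    known_speakers maps a speaker name to a reference embedding.
    The threshold is a hypothetical value chosen for this example.
    """
    best_name, best_score = None, threshold
    for name, reference in known_speakers.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy data: real speaker embeddings have hundreds of dimensions.
known = {"speaker_1": [1.0, 0.0], "speaker_2": [0.0, 1.0]}
match = match_speaker([0.9, 0.1], known)
```

When no known speaker reaches the threshold, the function returns `None`, which is the point at which a system like this could register the voice as a new speaker.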

This lets you process large datasets with the labeling handled automatically.

I updated the first message in this thread with the latest features.
