Customizing XTTS v2 for Tajik Language: Handling specific characters (ҷ, ҳ, ғ, қ, ӯ, ӣ) #4417
Unanswered
ruhullo94
asked this question in
General Q&A
Replies: 1 comment
You can check out https://github.com/anhnh2002/XTTSv2-Finetuning-for-New-Languages for fine-tuning. However, 2-5 hours of audio likely won't be enough, especially if Tajik is not closely related to any language XTTS already supports. XTTS had 50+ hours for each language. G2P wouldn't help, because XTTS is not trained on phonemes.
-
Hello community,
I am working on adding support for the Tajik language (tg) using XTTS v2. Since Tajik is not officially supported, I have been using the Russian (ru) language setting as a base, given the phonetic similarities.
However, I've run into a challenge with Tajik-specific Cyrillic characters: ҷ, ҳ, ғ, қ, ӯ, ӣ.
I am familiar with the model structure and have located the config.json file in the model directory. I would like to know the best practices for the following:
1. Tokenizer & Character Map: If I manually add these characters to the characters list in config.json, will the pre-trained XTTS v2 model be able to process them, or will it ignore them because they weren't part of the original training set?
2. Fine-tuning Strategy: If I decide to fine-tune the model with a Tajik dataset (approx. 2-5 hours of audio), should I initialize the training with the Russian weights? Also, do I need to expand the embedding layer to accommodate these new characters?
3. Phonetic Mapping: Is it more effective to use a G2P (Grapheme-to-Phoneme) approach to map these characters to their closest Russian or IPA equivalents (e.g., ҷ -> /dʒ/) instead of modifying the config?
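To make the mapping idea in the last question concrete, here is a minimal Python sketch of a text preprocessing step that rewrites the six Tajik-specific letters using only Russian letters. The replacement spellings (for example ҷ -> дж) are rough assumptions chosen for illustration, not an established transliteration, and the names `TAJIK_TO_RUSSIAN` and `approximate_with_russian` are hypothetical. It does not touch config.json or the model.

```python
import unicodedata

# Assumed approximations: each Tajik-specific letter is replaced by a
# Russian-alphabet spelling. These choices are illustrative, not a standard.
TAJIK_TO_RUSSIAN = {
    "\u04b7": "дж",  # ҷ
    "\u04b3": "х",   # ҳ
    "\u0493": "г",   # ғ
    "\u049b": "к",   # қ
    "\u04ef": "у",   # ӯ
    "\u04e3": "и",   # ӣ
}


def _build_table(mapping):
    """Build a str.translate table covering lowercase and uppercase letters."""
    table = {}
    for src, dst in mapping.items():
        table[ord(src)] = dst
        # Uppercase source letters map to a capitalized replacement ("Дж").
        table[ord(src.upper())] = dst.capitalize()
    return table


_TABLE = _build_table(TAJIK_TO_RUSSIAN)


def approximate_with_russian(text: str) -> str:
    """Rewrite Tajik-specific letters using Russian letters only."""
    # NFC folds decomposed forms (e.g. "у" + combining macron) into the
    # single precomposed code points that the table is keyed on.
    normalized = unicodedata.normalize("NFC", text)
    return normalized.translate(_TABLE)


if __name__ == "__main__":
    print(approximate_with_russian("\u04b7авоб"))  # Tajik word containing ҷ
```

The rewritten text could then be synthesized with the language set to ru. The mapping is lossy: қ and к, or ҳ and х, become indistinguishable, so this only approximates Tajik pronunciation with Russian sounds.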
I am currently running the model locally in a Python environment and am ready to experiment with the config.json or training scripts.
Any advice from someone who has added a similar Cyrillic-based language would be very helpful!
Thank you!