Customizing XTTS v2 for Tajik Language: Handling specific characters (ҷ, ҳ, ғ, қ, ӯ, ӣ) #4417

ruhullo94 · 2026-04-22T12:21:10Z

Hello community,

I am working on adding support for the Tajik language (tg) using XTTS v2. Since Tajik is not officially supported, I have been using the Russian (ru) language setting as a base, given the phonetic similarities.

However, I've run into a challenge with Tajik-specific Cyrillic characters: ҷ, ҳ, ғ, қ, ӯ, ӣ.

I am familiar with the model structure and have located the config.json file in the model directory. I would like to know the best practices for the following:

1. Tokenizer & Character Map: If I manually add these characters to the characters list in config.json, will the pre-trained XTTS v2 model be able to process them, or will it ignore them because they weren't part of the original training set?

2. Fine-tuning Strategy: If I decide to fine-tune the model with a Tajik dataset (approx. 2-5 hours of audio), should I initialize the training with the Russian weights? Also, do I need to expand the embedding layer to accommodate these new characters?

3. Phonetic Mapping: Is it more effective to use a G2P (Grapheme-to-Phoneme) approach to map these characters to their closest Russian or IPA equivalents (e.g., ҷ -> /dʒ/) instead of modifying the config?

I am currently running the model locally in a Python environment and am ready to experiment with the config.json or training scripts.

Any advice from someone who has added a similar Cyrillic-based language would be very helpful!

Thank you!

eginhard · 2026-04-23T21:55:21Z

You can check out https://github.com/anhnh2002/XTTSv2-Finetuning-for-New-Languages

However, 2-5 hours of audio likely won't be enough, especially if the new language is not closely related to any language already supported by XTTS. XTTS had 50+ hours of training data for each language.

G2P wouldn't be helpful because XTTS is not trained with phonemes.
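
Before fine-tuning, it may help to check which characters in the Tajik transcripts are missing from the tokenizer vocabulary. The following is a minimal sketch, not part of XTTS: `missing_chars` and `toy_vocab` are hypothetical names, and the vocabulary is assumed to be available as a plain token-to-id dict, so loading it from the actual model files is left out. The text is normalized to NFC first because ӯ and ӣ can also be written as у or и followed by a combining macron.

```python
import unicodedata

# The six Tajik-specific letters from the original question.
TAJIK_EXTRA = "ҷҳғқӯӣ"


def missing_chars(text, vocab):
    """Return the sorted characters of `text` that have no single-character
    entry in `vocab` (assumed here to be a token -> id mapping)."""
    # NFC turns "у" + combining macron (U+0304) into the single letter "ӯ".
    normalized = unicodedata.normalize("NFC", text.lower())
    return sorted(
        {ch for ch in normalized if not ch.isspace() and ch not in vocab}
    )


# Hypothetical toy vocabulary: the Russian alphabet only (an assumption for
# illustration, not the real XTTS vocabulary).
toy_vocab = {ch: i for i, ch in enumerate("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")}

if __name__ == "__main__":
    print(missing_chars("ҷумҳурии тоҷикистон", toy_vocab))
```

Running the same check over every transcript line in the dataset would show which characters the tokenizer needs to learn before any training starts.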
