What's the difference between speaker embeddings and d_vector? #1171
Hi, I am new to voice cloning. I understand that the properties of a given voice are called embeddings. I read that when fine-tuning a model we can choose either to compute speaker embeddings or to use a d_vector. Are speaker embeddings computed on the fly during training for each wav file, and is the d_vector computed once and for all prior to training? What are the pros and cons of each method? Thanks
Replies: 1 comment 2 replies
Speaker embeddings are computed using a speaker embedding layer.
d_vectors are computed externally by a speaker encoder model.
A model with a speaker embedding layer is harder to extend to more speakers once trained, since each new speaker needs to be added to the speaker embedding layer.
d_vectors do not have this issue, but you need a high-quality pre-trained speaker encoder to make this work well.
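To make the trade-off concrete, here is a rough, library-free sketch of the two ideas. It is only an illustration: `SpeakerEmbeddingTable` and `toy_speaker_encoder` are hypothetical names, not part of any real TTS API, and the "encoder" is a trivial placeholder rather than a trained network. The point is that an embedding layer is a lookup table keyed by the speakers seen at training time, whereas a d_vector is computed from audio and so can be produced for a speaker the model has never seen.

```python
import random

EMBED_DIM = 4  # illustrative size only, not taken from any real model


class SpeakerEmbeddingTable:
    """Toy stand-in for a speaker embedding layer: one vector per known speaker."""

    def __init__(self, speaker_names, dim=EMBED_DIM, seed=0):
        rng = random.Random(seed)
        self.speaker_to_id = {name: i for i, name in enumerate(speaker_names)}
        # In a real model these rows would be trainable weights.
        self.rows = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in speaker_names]

    def lookup(self, name):
        # Raises KeyError for any speaker that has no row in the table.
        return self.rows[self.speaker_to_id[name]]


def toy_speaker_encoder(waveform, dim=EMBED_DIM):
    """Hypothetical placeholder for a pre-trained speaker encoder.

    Maps a waveform (list of floats) to a fixed-size, unit-length vector by
    averaging absolute sample values over `dim` chunks. A real encoder is a
    neural network; only the audio-in, vector-out interface matters here.
    """
    chunk = max(1, len(waveform) // dim)
    vec = [
        sum(abs(s) for s in waveform[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(dim)
    ]
    norm = sum(v * v for v in vec) ** 0.5 or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


# Speakers known at training time can be looked up by name.
table = SpeakerEmbeddingTable(["alice", "bob"])
alice_embedding = table.lookup("alice")

# A new speaker has no row, so the table itself would have to be extended.
try:
    table.lookup("carol")
    carol_in_table = True
except KeyError:
    carol_in_table = False

# With the d_vector approach, a reference clip is enough for a new speaker.
carol_clip = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
carol_d_vector = toy_speaker_encoder(carol_clip)
```

In this sketch the failed lookup for `"carol"` mirrors the limitation described above, while `toy_speaker_encoder` works on any clip. The quality of the resulting vector depends entirely on the encoder, which is why a good pre-trained one is needed.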