Question: How were the official RVC pretrained checkpoints f0G40k.pth and f0D40k.pth created? #2793

@thanhNgan13

Description

Hi RVC maintainers and community,

I am currently researching RVC for an academic project on Singing Voice Conversion / AI Song Cover.

I understand from the official README that the pretrained base model was trained on nearly 50 hours of high-quality audio from the open-source VCTK dataset. I also understand that, when training with F0 (pitch guidance), RVC initializes from pretrained checkpoints such as:

f0G40k.pth
f0D40k.pth

where f0G40k.pth is the pretrained Generator and f0D40k.pth is the pretrained Discriminator.
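For context on how I currently understand this pairing: the generator and discriminator are trained jointly (GAN-style, as in VITS) and saved as separate files, so fine-tuning can resume adversarial training instead of restarting the discriminator from scratch. Here is a minimal sketch of that paired save/resume pattern using plain pickled dicts; the key names and file layout are my assumptions for illustration, not RVC's actual torch checkpoint format:

```python
import pickle
import tempfile
from pathlib import Path

def save_checkpoint(path, state_dict, step):
    # Each network (G or D) gets its own file, tagged with the training step.
    with open(path, "wb") as f:
        pickle.dump({"model": state_dict, "step": step}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

ckpt_dir = Path(tempfile.mkdtemp())
# Dummy "weights" standing in for the real generator/discriminator tensors.
save_checkpoint(ckpt_dir / "f0G40k.pth", {"gen.weight": [0.1, 0.2]}, step=400000)
save_checkpoint(ckpt_dir / "f0D40k.pth", {"disc.weight": [0.3]}, step=400000)

g = load_checkpoint(ckpt_dir / "f0G40k.pth")
d = load_checkpoint(ckpt_dir / "f0D40k.pth")
assert g["step"] == d["step"]  # G and D should come from the same training point
```

If the maintainers could confirm what the actual checkpoint dicts contain (optimizer state, step count, learning rate, etc.), that alone would help reproduction attempts.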

I would like to ask whether there is any official information or reproducible recipe for how these checkpoints were originally created.

Specifically:

  1. Which exact subset of VCTK was used?
  2. Was mic1 or mic2 used?
  3. Which speakers were included or excluded?
  4. What preprocessing pipeline was used before pretraining?
  5. Which HuBERT/ContentVec feature setting was used?
  6. Which F0 extraction method was used?
  7. What were the training hyperparameters, such as batch size, learning rate, number of epochs/steps, GPU setup, and checkpoint selection criteria?
  8. Is there any script or command to reproduce the original f0G40k.pth and f0D40k.pth checkpoints?
  9. Has anyone successfully pretrained a new RVC base model from scratch, especially on a singing voice dataset instead of VCTK?
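On item 6, to make the question concrete: in my own experiments I have been comparing F0 extractors, and even a toy estimator shows how sensitive downstream training can be to this choice. The sketch below is a purely illustrative single-frame autocorrelation pitch estimator, not any of the extractors RVC actually ships (e.g. harvest or crepe); I would like to know which one was used for the official pretraining run:

```python
import math

def estimate_f0(frame, sr, f0_min=50.0, f0_max=1100.0):
    """Toy single-frame F0 estimate: pick the autocorrelation peak
    whose lag falls inside the plausible pitch-period range."""
    n = len(frame)
    mean = sum(frame) / n
    frame = [s - mean for s in frame]  # remove DC offset
    lag_min = int(sr / f0_max)
    lag_max = min(int(sr / f0_min), n - 1)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max):
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sr / best_lag  # period in samples -> frequency in Hz

sr = 16000
# 1024-sample frame of a pure 220 Hz sine as a sanity check.
frame = [math.sin(2 * math.pi * 220.0 * i / sr) for i in range(1024)]
f0 = estimate_f0(frame, sr)  # close to 220 Hz
```

Knowing the exact extractor (and its hop size and F0 floor/ceiling) used for f0G40k.pth would let reproduction attempts match the original feature statistics.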

My goal is to understand the scientific and engineering background of the pretrained RVC base model for academic documentation. Any official notes, reproduction attempts, scripts, or community experience would be very helpful.

Thank you very much.
