Full pipeline here: https://github.com/NVIDIA-NeMo/Curator/pull/1640
Perhaps the tutorial could be used for this: https://github.com/NVIDIA-NeMo/Curator/pull/1664