feat: Granary v2 pipeline — vLLM 0.19.1, throughput metadata export, standardized audio stages#1881
Conversation
* Add shard-level checkpointing with .done markers and corpus_id support
* Minor fixes and ruff fixes
* Make the output filepath match the input filepath structure
* Add case-insensitive corpus name matching in manifest path extraction
Converts spoken-form text to written form (e.g. "fourteen dollars" → "$14") using batched vLLM inference with Qwen3.5-35B-A3B-FP8. Runs after PnC restoration, validates outputs against 3 hallucination checks (word insertion, novel content, excessive deletion), and falls back to input on failure. Includes bundled default prompt with 18 ITN conversion rule categories. Enabled via --enable_itn in run_pipeline.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
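The hallucination checks themselves are not shown in this excerpt; below is a rough sketch of what that validation and fallback could look like. The function names, thresholds, and tokenization are illustrative assumptions, not the PR's actual implementation.

```python
def passes_itn_checks(spoken: str, written: str,
                      max_growth_ratio: float = 0.2,
                      max_deletion_ratio: float = 0.4) -> bool:
    """Illustrative versions of the three checks named above; thresholds are assumed."""
    src_words = spoken.lower().split()
    out_words = written.lower().split()

    # 1. Word insertion: the written form should not grow far beyond the input.
    if len(out_words) > len(src_words) * (1 + max_growth_ratio):
        return False

    # 2. Novel content: purely alphabetic words that never appeared in the input
    #    (digits/symbols like "$14" are expected ITN output and excluded here).
    novel = [w for w in out_words if w.isalpha() and w not in set(src_words)]
    if len(novel) > max(1, int(len(src_words) * max_growth_ratio)):
        return False

    # 3. Excessive deletion: too much of the input disappeared.
    if len(out_words) < len(src_words) * (1 - max_deletion_ratio):
        return False

    return True


def apply_itn_result(spoken: str, written: str) -> str:
    # Fall back to the spoken-form input when any check fails.
    return written if passes_itn_checks(spoken, written) else spoken


if __name__ == "__main__":
    print(apply_itn_result("it costs fourteen dollars", "it costs $14"))  # -> "it costs $14"
```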
- Remove transformers<5.0 cap from constraint-dependencies
- Remove huggingface-hub<1.0 cap from override-dependencies
- Override nemo-toolkit's transformers<4.58 pin
- Override video_cuda12's torch<=2.9.1 pin
- Point transformers/vllm/huggingface-hub to PyPI (NVIDIA mirror behind)
- Remove torch<=2.9.1 from the build dependency-group

Resulting versions:
- vLLM: 0.16.0 -> 0.19.1
- torch: 2.9.1+cu128 -> 2.11.0+cu128
- transformers: 4.57.6 -> 5.7.0
- huggingface-hub: 0.36.2 -> 1.13.0
vLLM 0.19.1 wheels are compiled against torch 2.10.0; torch 2.11.0 has an incompatible C++ ABI, causing: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib
Hi @mohammadaaftabv, since the PR is very large, are you planning to split it up by stage/module?
    def test_followup_prompt_stores_disfluency() -> None:
Mock return value unpacks to wrong arity — tests will fail
QwenOmni.generate() returns a 3-tuple (pred_texts, disfluency_texts, skipped_indices), but every test mock here supplies only a 2-tuple. process_batch unpacks with pred_texts, disfluency_texts, skipped_indices = self._model.generate(...), so all affected tests will raise ValueError: not enough values to unpack (expected 3, got 2).
Affected mocks (at minimum lines 62, 74, 81, 99, 112) should each be updated to include the third element:
    stage._model.generate.return_value = (["hello world"], [""], set())
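To make the failure mode concrete, here is a minimal, self-contained illustration of the unpack; the values are the same placeholders used in the suggestion above, not code from the PR.

```python
# A 2-tuple mock cannot satisfy the 3-way unpack performed in process_batch.
two_tuple = (["hello world"], [""])
three_tuple = (["hello world"], [""], set())

try:
    pred_texts, disfluency_texts, skipped_indices = two_tuple
except ValueError as err:
    print(err)  # not enough values to unpack (expected 3, got 2)

# The fixed mock unpacks cleanly, with no skipped utterances.
pred_texts, disfluency_texts, skipped_indices = three_tuple
assert skipped_indices == set()
```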
    if cfg.get("enable_itn", False):
        itn_model = cfg.get("itn_model_id", "Qwen/Qwen3.5-35B-A3B-FP8")
        if itn_model != cfg.get("pnc_model_id", "Qwen/Qwen3.5-35B-A3B-FP8"):
            tasks.append((f"snapshot:{itn_model}", lambda m=itn_model: snapshot_download(m)))
ITN model silently skipped when skip_pnc=True and models share default ID
_prefetch_models avoids a double-download by comparing itn_model against pnc_model_id regardless of whether PnC itself is being fetched. When skip_pnc=True and itn_model_id equals pnc_model_id (both default to "Qwen/Qwen3.5-35B-A3B-FP8"), the PnC download is skipped, the comparison itn_model != cfg.get("pnc_model_id", ...) evaluates to False, and the ITN model is never queued. The parallel pre-fetch is silently a no-op for the ITN model. The pipeline falls back to setup_on_node() downloading it sequentially, erasing the 10-15 min advantage the pre-fetch was designed to provide.
| if cfg.get("enable_itn", False): | |
| itn_model = cfg.get("itn_model_id", "Qwen/Qwen3.5-35B-A3B-FP8") | |
| if itn_model != cfg.get("pnc_model_id", "Qwen/Qwen3.5-35B-A3B-FP8"): | |
| tasks.append((f"snapshot:{itn_model}", lambda m=itn_model: snapshot_download(m))) | |
| if cfg.get("enable_itn", False): | |
| itn_model = cfg.get("itn_model_id", "Qwen/Qwen3.5-35B-A3B-FP8") | |
| already_queued = any(name == f"snapshot:{itn_model}" for name, _ in tasks) | |
| if not already_queued: | |
| tasks.append((f"snapshot:{itn_model}", lambda m=itn_model: snapshot_download(m))) |
    def num_workers(self) -> int | None:
        return 1

    def xenna_stage_spec(self) -> dict[str, Any]:
        return {"num_workers": 1}
ManifestWriterStage missing IS_ACTOR_STAGE spec for Ray
ManifestWriterStage defines num_workers() = 1 and an Xenna-spec {"num_workers": 1}, but has no ray_stage_spec() override. In the Ray Data backend, without IS_ACTOR_STAGE: True, the executor may instantiate multiple concurrent workers for this stage. Each worker appends to the same output path, producing interleaved JSONL lines that corrupt the output file. ShardedManifestWriterStage (the parallel writer in this PR) correctly pairs num_workers() = 1 with ray_stage_spec() = {RayStageSpecKeys.IS_ACTOR_STAGE: True}; ManifestWriterStage needs the same.
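A minimal sketch of the missing override, assuming the base class and other methods of ManifestWriterStage stay as they are and that RayStageSpecKeys is imported from the same module ShardedManifestWriterStage already uses (the import is deliberately omitted below because its path is not shown in this excerpt):

```python
from typing import Any

class ManifestWriterStage:  # base class and existing methods elided in this sketch
    def num_workers(self) -> int | None:
        return 1

    def xenna_stage_spec(self) -> dict[str, Any]:
        return {"num_workers": 1}

    def ray_stage_spec(self) -> dict[str, Any]:
        # Mirror ShardedManifestWriterStage: pin the stage to a single Ray actor
        # so only one worker ever appends to the output manifest.
        # RayStageSpecKeys: import exactly as the sharded writer does.
        return {RayStageSpecKeys.IS_ACTOR_STAGE: True}
```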
    def outputs(self) -> tuple[list[str], list[str]]:
        keys = [self.pred_text_key, self.skip_me_key]
        if self.followup_prompt:
            keys.append(self.disfluency_text_key)
        return [], keys
followup_prompt_file is resolved inside _create_model() but outputs() and process_batch() both gate on self.followup_prompt (the raw dataclass field, which stays None). If a user sets only followup_prompt_file, QwenOmni will run full two-turn inference consuming GPU time and tokens, but disfluency_text_key is never written to any task and is not declared as an output — downstream stages relying on it receive tasks with the field entirely absent.
Current:

    def outputs(self) -> tuple[list[str], list[str]]:
        keys = [self.pred_text_key, self.skip_me_key]
        if self.followup_prompt:
            keys.append(self.disfluency_text_key)
        return [], keys

Suggested:

    def outputs(self) -> tuple[list[str], list[str]]:
        keys = [self.pred_text_key, self.skip_me_key]
        if self.followup_prompt or self.followup_prompt_file:
            keys.append(self.disfluency_text_key)
        return [], keys
Summary
_log_metrics() in InferenceQwenOmniStage and PnCRestorationStage, with perf_summary.json aggregate output in ShardedManifestWriterStage

Changes
- nemo_curator/stages/audio/inference/qwen_omni.py: add inference timing + utterance count metrics
- nemo_curator/stages/audio/text_filtering/pnc_restoration.py: add inference timing + restoration count metrics
- nemo_curator/stages/audio/io/sharded_manifest_writer.py: add perf_summary.json + per-shard _perf.jsonl export
- tutorials/audio/qwen_omni_inprocess/main.py: log perf summary after pipeline completion
- pyproject.toml / uv.lock: dependency pins

Test plan
- perf_summary.json output in next Kratos run
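For reference, a tiny sketch of how the exported summary might be inspected after a run; the output path is an assumption and no particular field names are relied on.

```python
import json
from pathlib import Path

# Illustrative location: perf_summary.json is written by ShardedManifestWriterStage
# alongside the output shards; adjust the path to the actual output directory.
summary_path = Path("output/perf_summary.json")

perf = json.loads(summary_path.read_text())
print(json.dumps(perf, indent=2))  # dump whatever the writer exported
```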