
feat: Granary v2 pipeline — vLLM 0.19.1, throughput metadata export, standardized audio stages#1881

Open
mohammadaaftabv wants to merge 91 commits into NVIDIA-NeMo:main from mohammadaaftabv:aaftabv/standardize-audio-stages

Conversation

@mohammadaaftabv (Contributor) commented Apr 28, 2026

Summary

  • Upgrade vLLM to 0.19.1 with torch==2.10.0 for Qwen3.5 MoE architecture support
  • Add throughput metadata export: per-stage inference timing via _log_metrics() in InferenceQwenOmniStage and PnCRestorationStage, with perf_summary.json aggregate output in ShardedManifestWriterStage (see the sketch after this list)
  • Pin dependencies (vllm==0.19.1, torch==2.10.0+cu128, transformers==5.7.0) for stable ABI compatibility
  • Fix config paths (/src/Curator -> /opt/Curator) for Docker image alignment
  • Add pipeline architecture documentation
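
For context on the throughput-metadata bullet above, a minimal sketch of the per-stage timing pattern, assuming the task carries a dict-like metadata store. The helper name _log_metrics matches the PR; the "_perf" key, field names, and _timed_generate are illustrative, not the actual implementation:

import time

class TimedInferenceSketch:
    """Illustrative only; the real _log_metrics() lives in
    InferenceQwenOmniStage and PnCRestorationStage."""

    def _log_metrics(self, metadata: dict, elapsed_s: float, num_utterances: int) -> None:
        # Record per-stage timing so a downstream writer stage can
        # aggregate it into perf_summary.json.
        metadata.setdefault("_perf", []).append({
            "stage": type(self).__name__,
            "inference_time_s": round(elapsed_s, 3),
            "num_utterances": num_utterances,
        })

    def _timed_generate(self, metadata: dict, prompts: list[str]):
        # self._model is assumed to be the stage's vLLM wrapper.
        start = time.perf_counter()
        outputs = self._model.generate(prompts)
        self._log_metrics(metadata, time.perf_counter() - start, len(prompts))
        return outputs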

Changes

  • nemo_curator/stages/audio/inference/qwen_omni.py — add inference timing + utterance count metrics
  • nemo_curator/stages/audio/text_filtering/pnc_restoration.py — add inference timing + restoration count metrics
  • nemo_curator/stages/audio/io/sharded_manifest_writer.py — add perf_summary.json + per-shard _perf.jsonl export
  • tutorials/audio/qwen_omni_inprocess/main.py — log perf summary after pipeline completion
  • pyproject.toml / uv.lock — dependency pins

Test plan

  • Verified pipeline completes successfully on Kratos with 17,197 utterances (25.5 min)
  • Docker image built and pushed with correct dependency versions
  • Verify perf_summary.json output in the next Kratos run

melllinia and others added 30 commits April 16, 2026 09:10
* Add shard-level checkpointing with .done markers and corpus_id support
* Minor fixes and ruff fixes
* Make the output filepath match the input filepath structure
* Add case-insensitive corpus name matching in manifest path extraction
Add ITN (inverse text normalization) stage: converts spoken-form text to written
form (e.g. "fourteen dollars" → "$14") using batched vLLM inference with
Qwen3.5-35B-A3B-FP8. Runs after PnC restoration, validates outputs against three
hallucination checks (word insertion, novel content, excessive deletion), and
falls back to the input on failure.

Includes bundled default prompt with 18 ITN conversion rule categories.
Enabled via --enable_itn in run_pipeline.py.
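
As a rough illustration of those three checks and the fallback (the function names, thresholds, and word-level heuristics here are assumptions; the actual checks live in the ITN stage):

def validate_itn_output(source: str, converted: str) -> bool:
    """Illustrative hallucination checks; any failure triggers fallback."""
    src_words = source.lower().split()
    out_words = converted.lower().split()

    # Check 1 - word insertion: purely alphabetic words absent from the
    # source (new numerals/symbols like "$14" are the expected output).
    inserted = [w for w in out_words if w.isalpha() and w not in src_words]
    if len(inserted) > 2:  # placeholder threshold
        return False

    # Check 2 - novel content: output substantially longer than the source.
    if len(out_words) > int(len(src_words) * 1.5) + 1:
        return False

    # Check 3 - excessive deletion: output dropped most of the source.
    if len(out_words) < len(src_words) // 2:
        return False

    return True

def apply_itn(source: str, converted: str) -> str:
    # Fall back to the unmodified input when validation fails.
    return converted if validate_itn_output(source, converted) else source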

- Remove transformers<5.0 cap from constraint-dependencies
- Remove huggingface-hub<1.0 cap from override-dependencies
- Override nemo-toolkit's transformers<4.58 pin
- Override video_cuda12's torch<=2.9.1 pin
- Point transformers/vllm/huggingface-hub to PyPI (NVIDIA mirror behind)
- Remove torch<=2.9.1 from build dependency-group

Resulting versions:
  vLLM: 0.16.0 -> 0.19.1
  torch: 2.9.1+cu128 -> 2.11.0+cu128
  transformers: 4.57.6 -> 5.7.0
  huggingface-hub: 0.36.2 -> 1.13.0
The torch upgrade was subsequently pinned back to 2.10.0: vLLM 0.19.1 wheels are
compiled against torch 2.10.0, while torch 2.11.0 has an incompatible C++ ABI,
causing:
  undefined symbol: _ZN3c1013MessageLoggerC1EPKciib
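
Spelled out, the end-state pins would look roughly like this in pyproject.toml (layout illustrative; the actual file may express some of these as constraint or override tables rather than plain dependencies):

[project]
dependencies = [
    "vllm==0.19.1",          # wheels compiled against torch 2.10.0
    "torch==2.10.0+cu128",   # 2.11.0 breaks the C++ ABI (symbol error above)
    "transformers==5.7.0",
    "huggingface-hub==1.13.0",
]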
@mohammadaaftabv mohammadaaftabv changed the title Aaftabv/standardize audio stages feat: Granary v2 pipeline — vLLM 0.19.1, throughput metadata export, standardized audio stages May 1, 2026
@sarahyurick (Contributor) commented:

Hi @mohammadaaftabv, since the PR is very large, are you planning to split it up by stage/module?

@mohammadaaftabv mohammadaaftabv marked this pull request as ready for review May 4, 2026 17:57
@mohammadaaftabv mohammadaaftabv requested a review from a team as a code owner May 4, 2026 17:57
Comment on lines +62 to +64


def test_followup_prompt_stores_disfluency() -> None:

P1 Mock return value unpacks to wrong arity — tests will fail

QwenOmni.generate() returns a 3-tuple (pred_texts, disfluency_texts, skipped_indices), but every test mock here supplies only a 2-tuple. process_batch unpacks with pred_texts, disfluency_texts, skipped_indices = self._model.generate(...), so all affected tests will raise ValueError: not enough values to unpack (expected 3, got 2).

Affected mocks (at minimum lines 62, 74, 81, 99, 112) should each be updated to include the third element:

stage._model.generate.return_value = (["hello world"], [""], set())
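
To make the arity point concrete, a self-contained sketch of the corrected mock shape (MagicMock is from the standard library; the variable names mirror the comment above rather than the actual test file):

from unittest.mock import MagicMock

# generate() must return a 3-tuple:
# (pred_texts, disfluency_texts, skipped_indices).
model = MagicMock()
model.generate.return_value = (["hello world"], [""], set())

pred_texts, disfluency_texts, skipped_indices = model.generate(["prompt"])
assert skipped_indices == set()  # a 2-tuple here would raise ValueError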

Comment on lines +318 to +321
if cfg.get("enable_itn", False):
    itn_model = cfg.get("itn_model_id", "Qwen/Qwen3.5-35B-A3B-FP8")
    if itn_model != cfg.get("pnc_model_id", "Qwen/Qwen3.5-35B-A3B-FP8"):
        tasks.append((f"snapshot:{itn_model}", lambda m=itn_model: snapshot_download(m)))

P1 ITN model silently skipped when skip_pnc=True and models share default ID

_prefetch_models avoids a double-download by comparing itn_model against pnc_model_id regardless of whether PnC itself is being fetched. When skip_pnc=True and itn_model_id equals pnc_model_id (both default to "Qwen/Qwen3.5-35B-A3B-FP8"), the PnC download is skipped, the comparison itn_model != cfg.get("pnc_model_id", ...) evaluates to False, and the ITN model is never queued. The parallel pre-fetch is silently a no-op for the ITN model. The pipeline falls back to setup_on_node() downloading it sequentially, erasing the 10-15 min advantage the pre-fetch was designed to provide.

Suggested change

Before:
    if cfg.get("enable_itn", False):
        itn_model = cfg.get("itn_model_id", "Qwen/Qwen3.5-35B-A3B-FP8")
        if itn_model != cfg.get("pnc_model_id", "Qwen/Qwen3.5-35B-A3B-FP8"):
            tasks.append((f"snapshot:{itn_model}", lambda m=itn_model: snapshot_download(m)))

After:
    if cfg.get("enable_itn", False):
        itn_model = cfg.get("itn_model_id", "Qwen/Qwen3.5-35B-A3B-FP8")
        already_queued = any(name == f"snapshot:{itn_model}" for name, _ in tasks)
        if not already_queued:
            tasks.append((f"snapshot:{itn_model}", lambda m=itn_model: snapshot_download(m)))

Comment on lines +317 to +321
def num_workers(self) -> int | None:
    return 1

def xenna_stage_spec(self) -> dict[str, Any]:
    return {"num_workers": 1}

P1 ManifestWriterStage missing IS_ACTOR_STAGE spec for Ray

ManifestWriterStage defines num_workers() = 1 and an Xenna-spec {"num_workers": 1}, but has no ray_stage_spec() override. In the Ray Data backend, without IS_ACTOR_STAGE: True, the executor may instantiate multiple concurrent workers for this stage. Each worker appends to the same output path, producing interleaved JSONL lines that corrupt the output file. ShardedManifestWriterStage (the parallel writer in this PR) correctly pairs num_workers() = 1 with ray_stage_spec() = {RayStageSpecKeys.IS_ACTOR_STAGE: True}; ManifestWriterStage needs the same.
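
Under that reading, the fix is a one-method override mirroring the sharded writer (a sketch; it assumes RayStageSpecKeys is imported from the same module ShardedManifestWriterStage uses):

def ray_stage_spec(self) -> dict[str, Any]:
    # Pin this stage to a single Ray actor so only one worker ever
    # appends to the output file, matching ShardedManifestWriterStage.
    return {RayStageSpecKeys.IS_ACTOR_STAGE: True}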

Comment on lines +194 to +198
def outputs(self) -> tuple[list[str], list[str]]:
    keys = [self.pred_text_key, self.skip_me_key]
    if self.followup_prompt:
        keys.append(self.disfluency_text_key)
    return [], keys

P1 followup_prompt_file is ignored by outputs() and process_batch()

followup_prompt_file is resolved inside _create_model(), but outputs() and process_batch() both gate on self.followup_prompt (the raw dataclass field, which stays None). If a user sets only followup_prompt_file, QwenOmni will run the full two-turn inference, consuming GPU time and tokens, yet disfluency_text_key is never written to any task and is not declared as an output, so downstream stages relying on it receive tasks with the field entirely absent.

Suggested change

Before:
    def outputs(self) -> tuple[list[str], list[str]]:
        keys = [self.pred_text_key, self.skip_me_key]
        if self.followup_prompt:
            keys.append(self.disfluency_text_key)
        return [], keys

After:
    def outputs(self) -> tuple[list[str], list[str]]:
        keys = [self.pred_text_key, self.skip_me_key]
        if self.followup_prompt or self.followup_prompt_file:
            keys.append(self.disfluency_text_key)
        return [], keys
