Skip to content

fix(vlm): fail loudly in PP chunker when pixel_values cannot be aligned#2181

Open
khazic wants to merge 2 commits intoNVIDIA-NeMo:mainfrom
khazic:khazic/fix/vlm-pp-chunk-fallback-raise
Open

fix(vlm): fail loudly in PP chunker when pixel_values cannot be aligned#2181
khazic wants to merge 2 commits intoNVIDIA-NeMo:mainfrom
khazic:khazic/fix/vlm-pp-chunk-fallback-raise

Conversation

@khazic
Copy link
Copy Markdown
Contributor

@khazic khazic commented May 7, 2026

Summary

`_chunk_vlm_media`'s legacy fallback (`n_images != batch_size` AND `n_images_per_sample is None` AND pixel_values is not shaped `[batch_size, ...]`) used to dump every image into mb0, emit empty pixel_values for mb1..N, and log a warning. Subsequent microbatches whose text still carried media tokens would then scatter into empty tensors — silent corruption rather than a crash.

This patch replaces the warn-and-continue with `raise ValueError`, forcing the calling collate to align with the chunker's contract instead of training on broken data.

Changelog

  • fix(vlm): `_chunk_vlm_media` now raises `ValueError` when no layout matches, with a message listing the actual shapes / counts and pointing at the two valid options (`pixel_values [batch_size, ...]` or `n_images_per_sample` set).
  • test(vlm): rewrite `test_fallback_mismatched_images` and `test_pp_vlm_chunking_mismatched_images_and_batch` to assert the new `pytest.raises` contract instead of the old "warn and continue" behavior.

Behavioral note

This is a breaking change for any caller relying on the previous warn-and-continue behavior. Such callers were already producing incorrect training data — empty pixel_values scattered into media tokens cannot yield meaningful gradients — so the breakage is intentional. All in-tree collates already set `n_images_per_sample`, so no in-tree recipe is affected.

Test plan

  • `uv run pytest tests/unit_tests/recipes/test_finetune_vlm_helpers.py` — 65 passed, 3 skipped (skips are pre-existing `fused_linear_ce` GPU cases)
  • `uv run ruff check` on `recipes/vlm/finetune.py` — clean

…ed to batch

The legacy fallback in _chunk_vlm_media (when n_images != batch_size and
no n_images_per_sample is provided) used to log a warning, dump every image
into mb0, and emit empty pixel_values for mb1..N. Subsequent microbatches
whose text still contained media tokens would then scatter into empty
tensors — silent corruption rather than a crash.

Replace the warn-and-continue with raise ValueError. Callers must either pass
pixel_values shaped [batch_size, ...] (one media tensor per sample) or include
n_images_per_sample so the chunker can map images to samples. Update the two
existing tests that codified the warn-and-continue behavior to assert the new
contract via pytest.raises.

Signed-off-by: khazic <khazzz1c@gmail.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@HuiyingLi
Copy link
Copy Markdown
Contributor

/ok to test 0e2d985

@HuiyingLi
Copy link
Copy Markdown
Contributor

/claude review

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HuiyingLi HuiyingLi enabled auto-merge (squash) May 8, 2026 00:54
@HuiyingLi
Copy link
Copy Markdown
Contributor

/ok to test c7b7b13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants