Skip to content

fix(config): validate subcategory parent sampler type#628

Open
johnnygreco wants to merge 1 commit intomainfrom
johnny/fix-subcategory-parent-sampler-type
Open

fix(config): validate subcategory parent sampler type#628
johnnygreco wants to merge 1 commit intomainfrom
johnny/fix-subcategory-parent-sampler-type

Conversation

@johnnygreco
Copy link
Copy Markdown
Contributor

📋 Summary

This PR tightens DataDesignerConfig validation so subcategory sampler parents must be category samplers, not just any sampler column. This matches the existing error message and prevents invalid configs from pairing subcategory sampling with non-category parent values.

🔗 Related Issue

N/A — split out from plugin catalog PR review cleanup.

🔄 Changes

  • Update the subcategory parent validator to reject sampler parents whose sampler_type is not category.
  • Preserve the existing rejection for non-sampler parent columns while making the error message describe the actual parent type.
  • Add a regression test for a subcategory column whose parent is a uniform sampler.

🧪 Testing

  • uv run --package data-designer-config pytest packages/data-designer-config/tests/config/test_data_designer_config.py -q
  • uv run ruff format --check packages/data-designer-config/src/data_designer/config/data_designer_config.py packages/data-designer-config/tests/config/test_data_designer_config.py
  • uv run ruff check --output-format=full packages/data-designer-config/src/data_designer/config/data_designer_config.py packages/data-designer-config/tests/config/test_data_designer_config.py

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (N/A — validation-only bug fix)

(cherry picked from commit ec9eeea)
Signed-off-by: Johnny Greco <jogreco@nvidia.com>
@johnnygreco johnnygreco requested a review from a team as a code owner May 9, 2026 01:28
@johnnygreco johnnygreco deployed to agentic-ci May 9, 2026 01:28 — with GitHub Actions Active
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

Tightens the _validate_subcategory_parents model validator in DataDesignerConfig so that a subcategory column's parent must be a sampler column whose sampler_type is specifically category, not just any sampler.

  • The error message now differentiates between a parent that isn't a sampler at all vs. a sampler parent with the wrong sampler_type, giving callers a precise description of what was configured incorrectly.
  • A new regression test covers the uniform-parent case and validates the updated error message text end-to-end.

Confidence Score: 5/5

Safe to merge — the change adds a stricter guard inside a model validator, rejects invalid configs that were previously accepted silently, and is fully covered by a new regression test.

The validator logic is straightforward: one new condition branches on sampler type, error messages are accurate, and the getattr pattern safely extracts the string value from the str Enum regardless of Python version. The new test exercises the exact failure path introduced.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-config/src/data_designer/config/data_designer_config.py Extends subcategory-parent validator to reject non-category sampler parents with a precise error message; logic is correct and handles both non-sampler and wrong-sampler-type parents distinctly.
packages/data-designer-config/tests/config/test_data_designer_config.py Adds a regression test for a subcategory column whose parent is a uniform sampler; regex matches the actual error message emitted by the updated validator.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_validate_subcategory_parents] --> B{column_type == 'sampler'\nAND sampler_type == SUBCATEGORY?}
    B -- No --> C[skip]
    B -- Yes --> D[lookup parent by col.params.category]
    D --> E{parent found?}
    E -- No --> F[defer to schema validator]
    E -- Yes --> G{parent.column_type == 'sampler'\nAND parent.sampler_type == CATEGORY?}
    G -- Yes --> H[valid ✓]
    G -- No --> I{parent.column_type == 'sampler'?}
    I -- Yes --> J[error: sampler column with wrong sampler_type]
    I -- No --> K[error: non-sampler column type]
Loading

Reviews (1): Last reviewed commit: "validate subcategory parent sampler type" | Re-trigger Greptile

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

PR #628 Review — fix(config): validate subcategory parent sampler type

Summary

This PR tightens DataDesignerConfig._validate_subcategory_parents so that a
subcategory sampler's parent must be a category sampler — previously the
validator only rejected non-sampler parents, letting e.g. a uniform sampler
silently slip through even though the error message already promised
sampler_type='category'. The error message is updated to describe the actual
offending parent type, and a regression test is added for the
uniform-parent case.

Scope is small (+33/-3, two files), contained to data-designer-config, and
consistent with the module's existing validation style.

Findings

Correctness

  • Logic is right. The guard condition
    parent.column_type != "sampler" or parent.sampler_type != SamplerType.CATEGORY
    correctly admits only category samplers, matching the error message's
    contract. The two-branch message split keeps the non-sampler diagnostic
    intact.
  • getattr(parent.sampler_type, "value", parent.sampler_type) fallback is
    unneeded but harmless.
    Every concrete sampler params class in
    sampler_params.py declares sampler_type: Literal[SamplerType.X] = SamplerType.X,
    so on a validated model parent.sampler_type is always a SamplerType
    enum. You could simplify to parent.sampler_type.value. Not worth blocking
    on — the defensive form is cheap insurance if a future sampler class ever
    stores a raw string.
  • Missing-parent path untouched. The parent is not None short-circuit
    still defers to the schema validator for unknown parent names
    (confirmed by test_subcategory_parent_missing_defers_to_schema_validator).
    Good.

Test coverage

  • The new test_subcategory_parent_must_be_a_category_sampler exercises the
    newly-tightened path with a uniform parent and asserts the message
    mentions both the actual parent sampler type and the expected one via a
    regex — this is the right shape.
  • The existing test_subcategory_parent_as_category_sampler_is_valid and
    the non-sampler-parent case (earlier in the file) together cover the three
    meaningful branches: valid category parent, non-sampler parent, wrong-type
    sampler parent. Coverage is complete for this validator.
  • Minor nit: no assertion that the error message still identifies the parent
    column by name in the new case. The existing non-sampler test likely
    covers message shape; if not, consider adding "'package_type'" to the
    regex for symmetry.

Style & conventions

  • Follows project conventions: from __future__ import annotations already
    present, absolute imports, modern typing, Pydantic @model_validator
    pattern matching the rest of the file.
  • Line 57's combined condition is getting slightly long/dense. A short
    local would read more cleanly, e.g.:
    is_category_sampler = parent.column_type == "sampler" and parent.sampler_type == SamplerType.CATEGORY
    if parent is not None and not is_category_sampler:
        ...
    Purely stylistic — current form is fine.

Performance / security / backward compatibility

  • No perf impact (single-pass validation over columns, unchanged shape).
  • No security implications.
  • Behavior change worth flagging in release notes. Configs that were
    previously accepted — a subcategory column pointing at a non-category
    sampler parent — will now raise at model_validate time. This is the
    point of the fix, but users with such configs (if any) will see a new
    ValueError on upgrade. Since the downstream generation almost certainly
    failed anyway (subcategory expects categorical parent values), the
    practical blast radius is small.
  • Architecture docs are correctly marked N/A — this is a validation-only
    tightening inside a single module, no import-direction or structural
    changes.

Verdict

Looks good. Small, focused, correct, tested. Non-blocking suggestions:

  1. Drop the getattr(..., "value", ...) fallback in favor of
    parent.sampler_type.value (the field is always a SamplerType enum).
  2. Consider a one-line release note mentioning the now-rejected configs
    since this changes validation outcomes.
  3. Optional: extract the category-check into a named local for readability.

None of these are required for merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant