feat(engine): let column configs declare all model aliases for the startup health check#626
feat(engine): let column configs declare all model aliases for the startup health check#626
Conversation
…artup health check Plugin column configs that depend on more than one model alias (generator + judge, critic, etc.) previously could not opt their secondary aliases into the standard startup health check, and configs without a `model_alias` field crashed the collection loop with AttributeError. Add `SingleColumnConfig.get_model_aliases()` as the single override hook the builder uses to enumerate aliases. The default returns the column's primary `model_alias` (if any), so built-in LLM, embedding, and image columns work unchanged. `CustomColumnConfig` overrides it to surface decorator-declared aliases, replacing the special-case `isinstance` branch in the builder. Plugin configs with multiple model fields override it to opt every endpoint into the health check. Fixes #606 Signed-off-by: Nabin Mulepati <nmulepati@nvidia.com>
|
Docs preview: https://a39ab724.dd-docs-preview.pages.dev
|
Greptile SummaryThis PR introduces
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/base.py | Adds get_model_aliases() to SingleColumnConfig; uses getattr defensively so configs without model_alias return [] instead of raising. |
| packages/data-designer-config/src/data_designer/config/column_configs.py | Adds CustomColumnConfig.get_model_aliases() override that delegates to the existing model_aliases property, routing decorator-declared aliases through the new hook. |
| packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py | Collapses the two-branch health-check collection loop to a single config.get_model_aliases() call; removes the now-unused CustomColumnConfig import. |
| packages/data-designer-config/tests/config/test_columns.py | New parametrized tests cover: empty-list default for configs without model_alias, primary-alias path for built-in configs, and the multi-alias override pattern. |
| packages/data-designer-engine/tests/engine/dataset_builders/test_dataset_builder.py | Two new builder-level tests: regression for #606 (multi-alias collection) and the no-op skip path when no config returns any alias. |
| packages/data-designer-engine/tests/engine/column_generators/generators/test_custom.py | Existing integration test extended to assert that get_model_aliases() on a CustomColumnConfig returns the decorator-declared alias list. |
| docs/plugins/models.md | Docs updated to reflect the accessor-driven health-check model; PairwiseJudgeColumnConfig example gains get_model_aliases() override and loses the redundant _validate() workaround. |
Sequence Diagram
sequenceDiagram
participant Builder as DatasetBuilder
participant Config as SingleColumnConfig<br/>(base)
participant CustomCfg as CustomColumnConfig<br/>(override)
participant Registry as ModelRegistry
Builder->>Builder: _run_model_health_check_if_needed()
loop for each config in single_column_configs
alt default (LLM / embedding / image configs)
Builder->>Config: get_model_aliases()
Config-->>Builder: [model_alias] or []
else CustomColumnConfig
Builder->>CustomCfg: get_model_aliases()
CustomCfg-->>Builder: [alias_a, alias_b, ...]
else plugin override (e.g. PairwiseJudge)
Builder->>Config: get_model_aliases()
Config-->>Builder: [model_alias, judge_model_alias]
end
Builder->>Builder: model_aliases.update(...)
end
alt model_aliases not empty
Builder->>Registry: run_health_check(list(model_aliases))
Registry-->>Builder: ok / raises
end
Reviews (1): Last reviewed commit: "feat(engine): let column configs declare..." | Re-trigger Greptile
Review: PR #626 —
|
📋 Summary
Plugin column configs that depend on more than one model alias (generator + judge, critic, etc.) could not opt their secondary aliases into the standard startup model health check, and configs without a
model_aliasfield crashed the collection loop withAttributeError. This PR adds a single overridable accessor —SingleColumnConfig.get_model_aliases()— so every column type, built-in or plugin, can declare exactly which model endpoints the startup health check should ping.🔗 Related Issue
Fixes #606
🔄 Changes
SingleColumnConfig.get_model_aliases()(base.py) readsmodel_aliasviagetattrso configs without that field return[]instead of raising. Built-in LLM, embedding, and image configs work unchanged via this default.CustomColumnConfig.get_model_aliases()override (column_configs.py) surfaces decorator-declared aliases through the same hook, replacing theisinstance(config, CustomColumnConfig)branch in the builder._run_model_health_check_if_needed(dataset_builder.py) collapses to a single loop overconfig.get_model_aliases(). TheCustomColumnConfigimport in the engine is removed (no longer needed).PairwiseJudgeColumnConfigexample to overrideget_model_aliases()instead of the previous workaround that registration-validated extra aliases in_validate(). The "Health checks and scheduling" section is updated to reflect accessor-driven collection.test_columns.py,test_dataset_builder.py, andtest_custom.pycover: the default behavior on configs withoutmodel_alias, the primary-alias path on built-in LLM/embedding/image configs (parametrized), the override pattern from the docs example, the multi-alias collection in the builder (regression test for feat(engine): let plugin column configs declare all model aliases for the startup health check #606), and the no-op skip path for sampler-only configs.🧪 Testing
pytest packages/data-designer-config/tests packages/data-designer-engine/tests packages/data-designer/testspasses (3262 tests)ruff format --checkandruff checkclean across the workspace✅ Checklist
feat(engine):prefix matches recent history)AGENTS.mdinvariants unchanged; this is a hook on an existing class, not a layering change)🔍 Attention Areas
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py— this PR drops thecolumn_type_is_model_generated(config.column_type)gate from the health-check loop, which the issue's proposed snippet kept. The reason:CustomColumnGeneratorextendsColumnGenerator, notColumnGeneratorWithModelRegistry, so that gate excludescustomcolumns and the existingisinstancebranch would not actually be subsumed by the new accessor. Callingget_model_aliases()on every config relies on the default returning[]for non-model configs (samplers, expressions, validators, seed datasets), which is verified by the newtest_run_model_health_check_skips_when_no_model_aliasestest. The newtest_run_model_health_check_collects_aliases_from_get_model_aliasestest asserts that built-in LLM andCustomColumnConfigaliases are both collected through the single loop.docs/plugins/models.md— thePairwiseJudgeColumnGeneratorexample loses its_validate()method. That method previously calledget_model_config(...)for the secondary alias to fail-fast on registration errors. Now thatget_model_aliases()opts that alias intorun_health_check, the registration check happens for free and the endpoint is actually exercised at startup, so the manual workaround is redundant.