Skip to content

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625

Open
nabinchha wants to merge 1 commit intomainfrom
nmulepati/fix-590-remove-implicit-default-provider
Open

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider#625
nabinchha wants to merge 1 commit intomainfrom
nmulepati/fix-590-remove-implicit-default-provider

Conversation

@nabinchha
Copy link
Copy Markdown
Contributor

Do not merge until #594 ships in a release.

#594 introduced the DeprecationWarning for ModelConfig.provider=None and the YAML default: key (issue #589). It merged on 2026-05-05, but the latest release is v0.5.9 (2026-04-28) — so users have not yet had a chance to see those warnings on an installed version. Per issue #590, the deprecation cycle must have been visible in at least one release before this PR lands.

📋 Summary

Final cleanup after the #589 deprecation cycle. Providers become a named list referenced by name end-to-end — no hidden registry default, no YAML default: leaking into DataDesigner.__init__ (#588), no ambiguity about which provider wins. Net change is +109 / -831 across 31 files (large deletion = collapsing a deprecated surface).

🔗 Related Issue

Closes #590. Depends on #588 (merged as #591) and #589 (merged as #594).

🔄 Changes

🗑️ Removed

🔧 Changed

  • ModelConfig.provider: str | None = Nonestr (required) (models.py).
  • ModelProviderRegistry.get_provider(name) now requires a str; None handling removed.
  • resolve_model_provider_registry(providers) drops the default_provider_name kwarg.
  • DataDesigner.__init__ no longer reads get_default_provider_name() or threads a YAML default through registry construction (data_designer.py).
  • ModelFormBuilder.build_config raises ValueError instead of constructing ModelConfig(provider=None) when no providers are available (model_builder.py).
  • Docs under docs/concepts/models/ updated to drop deprecation warnings and the "Change default provider" workflow.
  • CLI README.md and scripts/benchmarks/benchmark_engine_v2.py updated to match the new signatures.

🧪 Tests

  • Replaced deprecation-warning tests in test_models.py, test_default_model_settings.py, test_model_provider.py, test_provider_repository.py, test_provider_service.py, test_provider_controller.py, and test_data_designer.py with explicit-required-field tests.
  • Added test_registry_ignores_legacy_default_key and test_load_silently_ignores_legacy_default_key to lock in the migration path: pydantic v2's extra="ignore" default silently drops legacy default: keys from existing YAMLs.
  • Added test_init_no_user_providers_loads_from_yaml to cover the now-clean YAML fallback path in DataDesigner.__init__.

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • packages/data-designer-config/src/data_designer/config/models.py — Public API change: ModelConfig.provider is now required. Anyone constructing ModelConfig(alias=..., model=...) without provider= gets a ValidationError. Migration is one-line per call site.
  • packages/data-designer-engine/src/data_designer/engine/model_provider.py — Public API change: get_provider(name: str) no longer accepts None; resolve_model_provider_registry() no longer accepts default_provider_name.
  • Migration path for existing YAMLs with default: — relies on pydantic.BaseModel's default extra="ignore". Both ModelProviderRegistry classes inherit from BaseModel directly (not ConfigBase, which overrides to extra="forbid"), so old configs load without error. Locked in by test_registry_ignores_legacy_default_key and test_load_silently_ignores_legacy_default_key.

🧪 Testing

  • make test passes (3229 tests across config + engine + interface)
  • make check-all-fix passes (ruff lint + format)
  • Unit tests added/updated
  • E2E tests added/updated — N/A (no runtime behavior change beyond stricter validation already covered by unit tests)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A (no structural changes to package layering)

…g.provider (#590)

Final cleanup after the #589 deprecation cycle. Providers are now a
named list referenced by name end-to-end — no hidden registry default,
no YAML default leaking into DataDesigner construction (#588), and no
ambiguity about which provider wins.

ModelConfig.provider becomes a required str. ModelProviderRegistry.default
and the associated validators, get_default_provider_name() helper, CLI
"Change default provider" workflow, and the "Default" column in
`data-designer config list` are removed. resolve_model_provider_registry()
drops its default_provider_name kwarg, and DataDesigner.__init__ no
longer threads a YAML default through registry construction.

Pydantic v2's extra="ignore" default lets existing on-disk YAMLs with
default: keys load cleanly, so users on the deprecation cycle migrate
without breakage.
@nabinchha nabinchha requested a review from a team as a code owner May 9, 2026 00:19
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Docs preview: https://9e548b73.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Review: PR #625refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

Summary

Final cleanup of the #589 deprecation cycle: drops the implicit "default provider" concept end-to-end. ModelConfig.provider goes from str | None = None to required str; ModelProviderRegistry.default (and both its validators + get_default_provider_name()) is removed; the CLI's set_default/get_default/"Change default provider" workflow is gone; agent introspection drops default_provider/effective_provider. Net −722 lines across 31 files, test suite updated in lockstep (deprecation-warning tests → strict-validation tests). Migration for on-disk YAMLs carrying a legacy default: relies on pydantic v2's extra="ignore" default, locked in by two new tests.

The PR is well-scoped, well-tested, and architecturally clean. The dependency direction (interface → engine → config) is preserved and the two ModelProviderRegistry classes (engine + CLI repo) are updated consistently. CI is green on the Python 3.10–3.12 matrix; 3.13 runs are still in progress at review time.

The primary risk is a hard public-API break, gated on the #594 deprecation cycle having shipped in a release — which the PR author has correctly flagged as a blocker and the release history confirms has not yet happened (last release v0.5.9, 2026-04-28; #594 merged 2026-05-05).

Findings

⛔ Blocker (already called out by author)

⚠️ Breaking changes worth an explicit user-facing note

  • ModelConfig.provider becomes required. Anyone constructing ModelConfig(alias=..., model=...) without provider= now gets ValidationError. Likewise any on-disk ~/.data-designer/model_configs.yaml entry that omitted provider: or serialised it as null will fail to load after upgrade. The PR body notes "Migration is one-line per call site" for code, but on-disk configs from pre-feat(models): deprecate implicit default provider routing #594 users have no migration helper — they will hit ValidationError on load. Consider either (a) adding a CHANGELOG/release-notes entry calling this out, or (b) catching this specific ValidationError in ProviderRepository.load / ModelRepository.load with a targeted error message pointing at the fix. I lean toward (a) since users on stale model configs are a shrinking set and a clear CHANGELOG entry is cheaper than a bespoke migration path.

  • Agent introspection shape change. get_model_aliases_state() drops default_provider from the top level and collapses configured_provider + effective_provider into a single provider field (see packages/data-designer/src/data_designer/cli/utils/agent_introspection.py). This is observable output that agents consuming data-designer agent context parse. The PR description lists this under "Removed" but doesn't flag it as a schema change. If any downstream agent scaffolding keys off those three field names, it will silently produce missing-field errors or empty strings. Worth an explicit callout in release notes alongside the ModelConfig change.

✅ Migration path for legacy YAML default: — correct and well-tested

  • Both ModelProviderRegistry classes (engine model_provider.py and CLI provider_repository.py) inherit from BaseModel directly, not ConfigBase (which overrides extra="forbid"). The PR correctly locks this in:
    • test_registry_ignores_legacy_default_key (engine)
    • test_load_silently_ignores_legacy_default_key (CLI repo)
  • Subtle-but-important invariant: if someone later rebases these classes onto ConfigBase, the migration silently breaks (old YAMLs would start raising ValidationError on the stale default: key). A short comment on the class itself — "inherits BaseModel directly so extra='ignore' lets us drop legacy default: keys" — would protect against this. The docstring on ProviderRepository.load already says this; the class itself doesn't.

🟡 Minor / nits

  • provider_controller.py menu order change. The pre-refactor code appended "exit" after the conditional "change_default" so it always rendered last. The new code inlines "exit" in the initial dict literal, which keeps it last only because there is now nothing else to append. Behaviour is identical today; flagging only because a future "add menu option" patch could accidentally push exit out of its conventional last-position slot. Not worth changing.

  • Unused import cleanup is thoroughwarn_at_caller removed from models.py, default_model_settings.py, model_provider.py, provider_controller.py, provider_repository.py. Good. No orphaned warnings import.

  • scripts/benchmarks/benchmark_engine_v2.py correctly updated to drop default_provider_name=. Nice catch — benchmark scripts are easy to miss.

  • Docs parity: Four docs/concepts/models/*.md files had their deprecation warnings stripped. Consistent. Didn't spot any orphaned #589 references in the docs tree, but a quick grep -r "589" docs/ before merging wouldn't hurt.

  • CLI README.md drops the "default: nvidia" line from the example YAML. Good — keeps user-visible examples aligned with the new schema.

Test coverage observations

  • Previously-added warning-emission tests (test_model_config_provider_none_emits_deprecation_warning, test_explicit_default_emits_deprecation_warning, test_handle_change_default_emits_deprecation_warning, etc.) have been correctly replaced by their post-cycle counterparts (ValidationError-based). No test appears to have been silently dropped.
  • test_init_no_user_providers_loads_from_yaml is the right shape for the remaining DataDesigner.__init__ YAML-fallback path, replacing the older three-test cluster (test_init_user_supplied_providers_ignore_unrelated_yaml_default, test_init_user_supplied_providers_preserve_first_wins_over_yaml_default, test_init_yaml_default_emits_single_deprecation_warning). The first two were bug: default model provider leaks from YAML into DataDesigner constructor #588 regressions — those bugs can't reoccur once the default field is gone, so removal is correct.
  • No new e2e tests, but the PR description's "N/A (no runtime behavior change beyond stricter validation)" is accurate: this is a pure API-surface tightening.

Security / performance

  • No security implications. No credentials handling changed.
  • No performance implications. One fewer model_validator on ModelConfig and three fewer on ModelProviderRegistry — negligible but nominally a micro-win.

Verdict

Approve pending release gate. The code change is clean, the migration strategy for existing YAMLs is sound and tested, and the CI signal is strong. The two items worth addressing before merge are non-blocking:

  1. (Recommended) Add a CHANGELOG / release-notes entry explicitly calling out (a) ModelConfig.provider becoming required, and (b) the agent-introspection schema change. Users upgrading from v0.5.9 → next release with stale on-disk configs or downstream agent tooling will otherwise hit an undocumented break.
  2. (Nit) Drop a one-line comment on both ModelProviderRegistry classes pinning the BaseModel-not-ConfigBase invariant that makes the legacy-default: migration work.

Both are worth doing but neither should block merge once the release gate lifts.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 9, 2026

Greptile Summary

Final cleanup completing the #589 deprecation cycle: ModelConfig.provider becomes a required str, ModelProviderRegistry.default is removed entirely, and all routing that previously fell back to an implicit registry-level default is replaced with explicit per-ModelConfig provider references. Net result is ~800 lines deleted across 31 files with no new runtime behaviour — stricter validation replaces the old silent fallback.

  • ModelConfig.provider: str (required) — any on-disk YAML or direct construction that omits provider= now raises ValidationError; callers must migrate to explicit names before upgrading.
  • ModelProviderRegistry.default removed — legacy default: keys in existing ~/.data-designer/model_providers.yaml files are silently dropped by pydantic v2's extra="ignore" default, preserving backward load compatibility.
  • CLI surface cleaned up — "Change default provider" menu option, set_default/get_default service methods, the "Default" column in config list, and effective_provider/default_provider agent-introspection fields are all gone.

Confidence Score: 4/5

Safe to merge once the release dependency in the PR header is satisfied — #594 must have shipped in a tagged release before this lands.

The change is a well-scoped removal of a deprecated surface with thorough test coverage. The one rough edge is the error message in model_builder.py that conflates the zero-provider and multi-provider-unselected cases, which could misdirect users debugging a config issue.

model_builder.py — the ValueError message in build_config conflates the zero-provider and multi-provider-unselected cases into a single message that may misdirect users.

Important Files Changed

Filename Overview
packages/data-designer-config/src/data_designer/config/models.py Makes ModelConfig.provider a required str field; removes the _warn_on_implicit_provider validator and warn_at_caller import. Clean breaking-change completion of the #589 deprecation cycle.
packages/data-designer-engine/src/data_designer/engine/model_provider.py Removes ModelProviderRegistry.default field, all three related validators, get_default_provider_name(), and the None-handling branch from get_provider(). resolve_model_provider_registry drops default_provider_name kwarg and simplifies to a single-line construction.
packages/data-designer/src/data_designer/cli/forms/model_builder.py Replaces the provider=None fallback with a ValueError in build_config; error message is slightly misleading when multiple providers are configured but none selected in form_data.
packages/data-designer/src/data_designer/cli/repositories/provider_repository.py Drops the default field from the CLI ModelProviderRegistry model and the deprecation-warning emission in load(); relies on pydantic v2 extra="ignore" to silently discard legacy default: keys from on-disk YAMLs.
packages/data-designer/src/data_designer/cli/services/provider_service.py Removes set_default/get_default methods and the default-tracking logic from update/delete. The empty-registry sentinel in add() drops default=None cleanly.
packages/data-designer/src/data_designer/interface/data_designer.py Removes get_default_provider_name import and call; collapses the two-branch model_providers initialization into a single call; drops the warnings.catch_warnings block used to suppress duplicate deprecation warnings.
packages/data-designer/src/data_designer/cli/utils/agent_introspection.py Replaces the effective_provider/configured_provider/default_provider fields with a single provider field; simplifies provider usability check now that mc.provider is always a non-None str.
packages/data-designer/src/data_designer/cli/utils/agent_text_formatter.py Removes default_provider header line from format_model_aliases_text and _format_model_aliases_context; simplifies _format_table by dropping the column_labels parameter.
packages/data-designer/src/data_designer/cli/controllers/provider_controller.py Removes _handle_change_default handler and the conditional insertion of "change_default" in the menu; "exit" is now a fixed entry in the main options dict rather than appended last.

Sequence Diagram

sequenceDiagram
    participant User
    participant DataDesigner
    participant resolve_model_provider_registry
    participant ModelProviderRegistry
    participant ModelConfig

    User->>DataDesigner: "__init__(model_providers=[...])"
    DataDesigner->>resolve_model_provider_registry: (model_providers)
    resolve_model_provider_registry->>ModelProviderRegistry: "ModelProviderRegistry(providers=model_providers)"
    note over ModelProviderRegistry: No default field.<br/>Duplicate-name check only.
    ModelProviderRegistry-->>DataDesigner: registry

    User->>ModelConfig: "ModelConfig(alias=..., model=..., provider="nvidia")"
    note over ModelConfig: provider is required str.<br/>ValidationError if missing.
    ModelConfig-->>User: config

    User->>DataDesigner: run(config)
    DataDesigner->>ModelProviderRegistry: get_provider(config.provider)
    note over ModelProviderRegistry: KeyError to UnknownProviderError<br/>if name not registered
    ModelProviderRegistry-->>DataDesigner: ModelProvider
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/data-designer/src/data_designer/cli/forms/model_builder.py:294-299
The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in `form_data`. When `len(self.available_providers) > 1` and no `"provider"` key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

```suggestion
        else:
            if self.available_providers:
                raise ValueError(
                    "Cannot build ModelConfig: multiple providers are configured but none "
                    "was specified in form data. Specify provider= explicitly on each ModelConfig."
                )
            raise ValueError(
                "Cannot build ModelConfig: no provider specified in form data and no "
                "available providers configured. Add a provider first via "
                "`data-designer config providers`."
            )
```

Reviews (1): Last reviewed commit: "refactor: remove ModelProviderRegistry.d..." | Re-trigger Greptile

Comment on lines 294 to +299
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in form_data. When len(self.available_providers) > 1 and no "provider" key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

Suggested change
else:
provider = None
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
else:
if self.available_providers:
raise ValueError(
"Cannot build ModelConfig: multiple providers are configured but none "
"was specified in form data. Specify provider= explicitly on each ModelConfig."
)
raise ValueError(
"Cannot build ModelConfig: no provider specified in form data and no "
"available providers configured. Add a provider first via "
"`data-designer config providers`."
)
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer/src/data_designer/cli/forms/model_builder.py
Line: 294-299

Comment:
The error message incorrectly states "no available providers configured" in a branch that also fires when multiple providers ARE configured but none was included in `form_data`. When `len(self.available_providers) > 1` and no `"provider"` key is present, the user sees a misleading diagnostic telling them to add a provider when providers already exist.

```suggestion
        else:
            if self.available_providers:
                raise ValueError(
                    "Cannot build ModelConfig: multiple providers are configured but none "
                    "was specified in form data. Specify provider= explicitly on each ModelConfig."
                )
            raise ValueError(
                "Cannot build ModelConfig: no provider specified in form data and no "
                "available providers configured. Add a provider first via "
                "`data-designer config providers`."
            )
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

refactor: remove ModelProviderRegistry.default and require ModelConfig.provider

1 participant