Skip to content

feat(constraints): add per-specifier provenance tracking#1189

Open
smoparth wants to merge 1 commit into
python-wheel-build:mainfrom
smoparth:feat/constraint-provenance-tracking
Open

feat(constraints): add per-specifier provenance tracking#1189
smoparth wants to merge 1 commit into
python-wheel-build:mainfrom
smoparth:feat/constraint-provenance-tracking

Conversation

@smoparth

@smoparth smoparth commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Track which constraint files contributed each package constraint so that
engineers can quickly identify the source of a conflicting specifier when
a build fails.

  • Store provenance alongside constraints in a single _data dict as
    dict[NormalizedName, tuple[Requirement, set[str]]]
  • Add optional provenance parameter to add_constraint()
  • load_constraints_file() passes the file path as provenance
  • dump_constraints() writes source files as comments above each line
  • Add get_constraint_with_provenance() and format_provenance() methods
  • Enrich InvalidConstraintError and resolver error messages with
    source file info

Closes: #1186

Pull Request Description

What

Why

@smoparth smoparth requested a review from a team as a code owner June 5, 2026 17:43
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The Constraints class gains per-source provenance tracking by changing its internal _data mapping to store (Requirement, set[str]) tuples instead of plain Requirement objects. add_constraint() gains an optional keyword-only provenance parameter; load_constraints_file() derives a provenance string from the input path/URL and passes it through. dump_constraints() now emits sorted # <source> comment lines above each constraint. Two new public methods — get_constraint_with_provenance() and format_provenance() — expose the stored provenance. InvalidConstraintError messages and resolver debug logs are augmented with (from ...) attribution. Tests in test_constraints.py and test_context.py are updated accordingly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding per-specifier provenance tracking to the Constraints class to track constraint file sources.
Description check ✅ Passed The description clearly relates to the changeset, explaining the motivation (tracking constraint file sources), key implementation details, and linking to issue #1186.
Linked Issues check ✅ Passed The PR successfully implements all acceptance criteria from #1186: provenance tracking in Constraints class, inline comments in merged output, enriched error messages with source information, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes are directly aligned with #1186 objectives: provenance storage/retrieval, constraint loading/dumping with source tracking, error message enrichment, and test updates for new functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify Bot added the ci label Jun 5, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/constraints.py`:
- Around line 168-177: get_provenance currently returns a shallow copy of
self._provenance so callers can mutate nested lists and change internal state;
update get_provenance to return a deep copy of the provenance mapping (e.g., use
copy.deepcopy on self._provenance[canonicalize_name(name)] or construct a new
dict copying each list) so callers cannot modify internal data structures
accessed via get_provenance; ensure you still canonicalize the name with
canonicalize_name(name) and return an empty dict when missing.

In `@tests/test_constraints.py`:
- Around line 306-307: Update the test that asserts InvalidConstraintError from
c.add_constraint to ensure the exception message contains both conflicting
sources: change the pytest.raises regex (the match= argument) to require both
"base.txt" and "override.txt" (for example using a positive lookahead pattern
like (?=.*base\.txt)(?=.*override\.txt)) so the test fails if either source is
missing from the error message; target the assertion surrounding the
c.add_constraint call that raises InvalidConstraintError.
- Around line 275-281: The test test_provenance_returns_copy currently only
verifies the outer dict is copied; update it to also check for nested mutation
leakage by mutating a list value returned by Constraints.get_provenance and
asserting that the internal provenance inside the Constraints instance is not
changed. Specifically, after using Constraints.add_constraint("foo>=1.0",
source="a.txt") call c.get_provenance("foo"), append/modify the returned list
(from the dict value) and then call c.get_provenance("foo") again to assert the
original list value remains unchanged, ensuring get_provenance returns
deep-copied (or otherwise protected) list values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 46b1bdaf-7ff4-4858-b8d0-c2bc42261980

📥 Commits

Reviewing files that changed from the base of the PR and between b6c8fc7 and 424ccfb.

📒 Files selected for processing (5)
  • src/fromager/constraints.py
  • src/fromager/resolver.py
  • tests/test_constraints.py
  • tests/test_context.py
  • tests/test_resolver.py

Comment thread src/fromager/constraints.py Outdated
Comment thread tests/test_constraints.py Outdated
Comment thread tests/test_constraints.py Outdated
@smoparth smoparth force-pushed the feat/constraint-provenance-tracking branch from 424ccfb to 071002e Compare June 8, 2026 17:12
Comment thread src/fromager/constraints.py Outdated
Comment thread src/fromager/constraints.py Outdated
Comment thread src/fromager/constraints.py Outdated
Comment thread src/fromager/resolver.py Outdated
@smoparth smoparth force-pushed the feat/constraint-provenance-tracking branch from 071002e to 4c18d3a Compare June 16, 2026 17:48

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/constraints.py`:
- Around line 64-71: The provenance parameter in the add_constraint method is
stored and rendered directly in error messages (lines 107-112) and in
merged-constraints.txt output (lines 156-159) without sanitization, which could
expose credentials in URLs or allow CR/LF injection to corrupt the output file.
Sanitize the provenance string when it is received in the add_constraint method
by removing or escaping any credentials, query parameters, and CR/LF characters,
then use this sanitized version consistently everywhere provenance is rendered
in error messages and dumped comments throughout the add_constraint and any
related output methods.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 06334e11-e44a-412d-b78d-2b6b0a1371a5

📥 Commits

Reviewing files that changed from the base of the PR and between 071002e and 4c18d3a.

📒 Files selected for processing (5)
  • src/fromager/constraints.py
  • src/fromager/resolver.py
  • tests/test_constraints.py
  • tests/test_context.py
  • tests/test_resolver.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_resolver.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/fromager/resolver.py

Comment thread src/fromager/constraints.py
@iangelak

Copy link
Copy Markdown
Contributor

A minor thing. You should add some additional tests that are missing:

No test for provenance when source/provenance is None (relevant once the parameter becomes optional)
No test for provenance with marker-filtered constraints (constraint that doesn't match the current environment should not be tracked)
No test for provenance in the resolver error path (find_all_matching_from_provider with provenance in the exception)
No test for the is_satisfied_by debug logging path in resolver.py

@smoparth

Copy link
Copy Markdown
Contributor Author

A minor thing. You should add some additional tests that are missing:

No test for provenance when source/provenance is None (relevant once the parameter becomes optional) No test for provenance with marker-filtered constraints (constraint that doesn't match the current environment should not be tracked) No test for provenance in the resolver error path (find_all_matching_from_provider with provenance in the exception) No test for the is_satisfied_by debug logging path in resolver.py

The earlier design required provenance as a mandatory parameter. With the latest changes making it optional (None default), I've added test_add_constraint_without_provenance and test_conflict_error_without_provenance to cover that path.

For the resolver error and debug logging paths, format_provenance() is already unit-tested directly in test_format_provenance. Testing consumers of that method would be integration-level scope rather than provenance unit tests.

@smoparth smoparth requested a review from tiran June 16, 2026 18:03
Comment thread src/fromager/constraints.py Outdated
Comment thread tests/test_resolver.py Outdated
@smoparth smoparth force-pushed the feat/constraint-provenance-tracking branch 2 times, most recently from ce18d79 to d25829a Compare June 18, 2026 02:47
Track which constraint files contributed each package constraint so that
engineers can quickly identify the source of a conflicting specifier when
a build fails.

- Store provenance alongside constraints in a single `_data` dict as
  `dict[NormalizedName, tuple[Requirement, set[str]]]`
- Add optional `provenance` parameter to `add_constraint()`
- `load_constraints_file()` passes the file path as provenance
- `dump_constraints()` writes source files as comments above each line
- Add `get_constraint_with_provenance()` and `format_provenance()` methods
- Enrich `InvalidConstraintError` and resolver error messages with
  source file info

Co-Authored-By: Claude <claude@anthropic.com>
Closes: python-wheel-build#1186
Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
@smoparth smoparth force-pushed the feat/constraint-provenance-tracking branch from d25829a to 477f850 Compare June 18, 2026 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Track per-constraint file provenance in merged output

3 participants