Skip to content

feat(constraints): add per-specifier provenance tracking#1189

Open
smoparth wants to merge 1 commit into
python-wheel-build:mainfrom
smoparth:feat/constraint-provenance-tracking
Open

feat(constraints): add per-specifier provenance tracking#1189
smoparth wants to merge 1 commit into
python-wheel-build:mainfrom
smoparth:feat/constraint-provenance-tracking

Conversation

@smoparth
Copy link
Copy Markdown
Contributor

@smoparth smoparth commented Jun 5, 2026

When multiple constraint files are merged, engineers had no way to determine which input file contributed each specifier. This adds provenance tracking so every constraint records its source file.

  • Add _provenance dict to Constraints class, grouped by source file
  • Make source a required keyword-only parameter on add_constraint()
  • Add get_provenance() public method returning {source: [lines]}
  • Include inline provenance comments in merged-constraints.txt output
  • Enrich InvalidConstraintError messages with source file info
  • Add provenance to resolver rejection logs and exception messages
  • Add _format_provenance() helper for human-readable formatting

Closes: #1186

Pull Request Description

What

Why

When multiple constraint files are merged, engineers had no way to
determine which input file contributed each specifier. This adds
provenance tracking so every constraint records its source file.

- Add `_provenance` dict to `Constraints` class, grouped by source file
- Make `source` a required keyword-only parameter on `add_constraint()`
- Add `get_provenance()` public method returning `{source: [lines]}`
- Include inline provenance comments in `merged-constraints.txt` output
- Enrich `InvalidConstraintError` messages with source file info
- Add provenance to resolver rejection logs and exception messages
- Add `_format_provenance()` helper for human-readable formatting

Co-Authored-By: Claude <claude@anthropic.com>
Closes: python-wheel-build#1186
Signed-off-by: Shanmukh Pawan <smoparth@redhat.com>
@smoparth smoparth requested a review from a team as a code owner June 5, 2026 17:43
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements per-constraint-file provenance tracking in the Constraints class. The core change adds a _provenance dictionary that maps package names to source-to-constraint-lines mappings. When add_constraint() merges specifiers from multiple files, provenance is accumulated. The merged output from dump_constraints() now includes inline comments showing which source files contributed each constraint line. Error messages for constraint conflicts and resolution failures are enriched with provenance information to help engineers quickly identify which input files caused issues.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly summarizes the main change: adding per-specifier provenance tracking to constraints.
Description check ✅ Passed Description explains the provenance tracking feature, implementation approach, and references issue #1186.
Linked Issues check ✅ Passed PR implements all acceptance criteria: provenance tracking in Constraints, inline comments in merged output, enriched error messages, and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes directly support provenance tracking in constraints files—no out-of-scope modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify Bot added the ci label Jun 5, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/constraints.py`:
- Around line 168-177: get_provenance currently returns a shallow copy of
self._provenance so callers can mutate nested lists and change internal state;
update get_provenance to return a deep copy of the provenance mapping (e.g., use
copy.deepcopy on self._provenance[canonicalize_name(name)] or construct a new
dict copying each list) so callers cannot modify internal data structures
accessed via get_provenance; ensure you still canonicalize the name with
canonicalize_name(name) and return an empty dict when missing.

In `@tests/test_constraints.py`:
- Around line 306-307: Update the test that asserts InvalidConstraintError from
c.add_constraint to ensure the exception message contains both conflicting
sources: change the pytest.raises regex (the match= argument) to require both
"base.txt" and "override.txt" (for example using a positive lookahead pattern
like (?=.*base\.txt)(?=.*override\.txt)) so the test fails if either source is
missing from the error message; target the assertion surrounding the
c.add_constraint call that raises InvalidConstraintError.
- Around line 275-281: The test test_provenance_returns_copy currently only
verifies the outer dict is copied; update it to also check for nested mutation
leakage by mutating a list value returned by Constraints.get_provenance and
asserting that the internal provenance inside the Constraints instance is not
changed. Specifically, after using Constraints.add_constraint("foo>=1.0",
source="a.txt") call c.get_provenance("foo"), append/modify the returned list
(from the dict value) and then call c.get_provenance("foo") again to assert the
original list value remains unchanged, ensuring get_provenance returns
deep-copied (or otherwise protected) list values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 46b1bdaf-7ff4-4858-b8d0-c2bc42261980

📥 Commits

Reviewing files that changed from the base of the PR and between b6c8fc7 and 424ccfb.

📒 Files selected for processing (5)
  • src/fromager/constraints.py
  • src/fromager/resolver.py
  • tests/test_constraints.py
  • tests/test_context.py
  • tests/test_resolver.py

Comment on lines +168 to +177
def get_provenance(self, name: str) -> dict[str, list[str]]:
"""Return provenance info for *name*.

Returns:
Mapping of ``{source_file: [original_constraint_lines]}``,
or an empty dict if the package has no constraints.

.. versionadded:: 0.84.0
"""
return dict(self._provenance.get(canonicalize_name(name), {}))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return a deep copy from get_provenance() to prevent internal state mutation.

The current return value is a shallow copy; callers can mutate nested lists and silently alter self._provenance, which then changes dump_constraints() output and provenance shown in exceptions/logs.

Suggested fix
     def get_provenance(self, name: str) -> dict[str, list[str]]:
@@
-        return dict(self._provenance.get(canonicalize_name(name), {}))
+        prov = self._provenance.get(canonicalize_name(name), {})
+        return {source: list(lines) for source, lines in prov.items()}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def get_provenance(self, name: str) -> dict[str, list[str]]:
"""Return provenance info for *name*.
Returns:
Mapping of ``{source_file: [original_constraint_lines]}``,
or an empty dict if the package has no constraints.
.. versionadded:: 0.84.0
"""
return dict(self._provenance.get(canonicalize_name(name), {}))
def get_provenance(self, name: str) -> dict[str, list[str]]:
"""Return provenance info for *name*.
Returns:
Mapping of ``{source_file: [original_constraint_lines]}``,
or an empty dict if the package has no constraints.
.. versionadded:: 0.84.0
"""
prov = self._provenance.get(canonicalize_name(name), {})
return {source: list(lines) for source, lines in prov.items()}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fromager/constraints.py` around lines 168 - 177, get_provenance currently
returns a shallow copy of self._provenance so callers can mutate nested lists
and change internal state; update get_provenance to return a deep copy of the
provenance mapping (e.g., use copy.deepcopy on
self._provenance[canonicalize_name(name)] or construct a new dict copying each
list) so callers cannot modify internal data structures accessed via
get_provenance; ensure you still canonicalize the name with
canonicalize_name(name) and return an empty dict when missing.

Comment thread tests/test_constraints.py
Comment on lines +275 to +281
def test_provenance_returns_copy() -> None:
"""get_provenance returns a copy, not the internal dict."""
c = Constraints()
c.add_constraint("foo>=1.0", source="a.txt")
prov = c.get_provenance("foo")
prov["injected.txt"] = ["foo<99"]
assert "injected.txt" not in c.get_provenance("foo")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

test_provenance_returns_copy misses nested mutation leakage

This only proves the outer dict is copied; list values can still be mutated through the returned object, which can silently corrupt internal provenance state.

Suggested test strengthening
 def test_provenance_returns_copy() -> None:
     """get_provenance returns a copy, not the internal dict."""
     c = Constraints()
     c.add_constraint("foo>=1.0", source="a.txt")
     prov = c.get_provenance("foo")
     prov["injected.txt"] = ["foo<99"]
     assert "injected.txt" not in c.get_provenance("foo")
+    prov["a.txt"].append("foo<99")
+    assert c.get_provenance("foo")["a.txt"] == ["foo>=1.0"]

As per coding guidelines: "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_constraints.py` around lines 275 - 281, The test
test_provenance_returns_copy currently only verifies the outer dict is copied;
update it to also check for nested mutation leakage by mutating a list value
returned by Constraints.get_provenance and asserting that the internal
provenance inside the Constraints instance is not changed. Specifically, after
using Constraints.add_constraint("foo>=1.0", source="a.txt") call
c.get_provenance("foo"), append/modify the returned list (from the dict value)
and then call c.get_provenance("foo") again to assert the original list value
remains unchanged, ensuring get_provenance returns deep-copied (or otherwise
protected) list values.

Comment thread tests/test_constraints.py
Comment on lines +306 to +307
with pytest.raises(InvalidConstraintError, match=r"base\.txt"):
c.add_constraint("foo<1.0", source="/constraints/override.txt")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert both conflicting sources in the error message

The current regex only checks base.txt; it won’t fail if the new source (override.txt) is missing from the exception text.

Suggested assertion update
-    with pytest.raises(InvalidConstraintError, match=r"base\.txt"):
+    with pytest.raises(
+        InvalidConstraintError,
+        match=r"(base\.txt.*override\.txt|override\.txt.*base\.txt)",
+    ):
         c.add_constraint("foo<1.0", source="/constraints/override.txt")

As per coding guidelines: "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with pytest.raises(InvalidConstraintError, match=r"base\.txt"):
c.add_constraint("foo<1.0", source="/constraints/override.txt")
with pytest.raises(
InvalidConstraintError,
match=r"(base\.txt.*override\.txt|override\.txt.*base\.txt)",
):
c.add_constraint("foo<1.0", source="/constraints/override.txt")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_constraints.py` around lines 306 - 307, Update the test that
asserts InvalidConstraintError from c.add_constraint to ensure the exception
message contains both conflicting sources: change the pytest.raises regex (the
match= argument) to require both "base.txt" and "override.txt" (for example
using a positive lookahead pattern like (?=.*base\.txt)(?=.*override\.txt)) so
the test fails if either source is missing from the error message; target the
assertion surrounding the c.add_constraint call that raises
InvalidConstraintError.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Track per-constraint file provenance in merged output

1 participant