
test(cli): integration tests for all cf proof commands (#455) #462

Merged
frankbria merged 2 commits into main from feat/proof-cli-integration-tests
Mar 20, 2026

Conversation

@frankbria
Owner

@frankbria frankbria commented Mar 20, 2026

Summary

Closes #455. Adds tests/cli/test_proof_commands.py — 17 integration tests covering the complete cf proof CLI surface area through the Typer CliRunner against real SQLite workspaces.

Acceptance criteria

  • CLI integration test: cf proof capture creates a REQ and persists it
  • CLI integration test: cf proof run evaluates workspace against open REQs
  • CLI integration test: cf proof waive marks a REQ waived with expiry
  • CLI integration test: cf proof status shows correct summary
  • Closed loop: capture → run (fail) → run (pass) → evidence recorded

What's in the test file

Class           Tests  Coverage
TestCapture     4      Creates REQ, increments IDs, rejects invalid severity/source
TestRun         4      PASS/FAIL exits, no-obligations, invalid gate — _run_gate patched
TestWaive       4      With/without expiry, missing REQ exit 1, bad date format
TestStatus      4      Empty workspace, open count, waived count, expired waiver reverts
TestClosedLoop  1      Full capture → fail run → pass run → evidence recorded

Design decisions

  • _run_gate is patched at codeframe.core.proof.runner._run_gate — same pattern as TestRunner in tests/core/test_proof9.py
  • Persistence verified directly via ledger.get_requirement / ledger.list_evidence — output checks alone are insufficient
  • Each test class gets an isolated tmp_path workspace via create_or_load_workspace
  • No production code changes required — all commands were already correctly implemented

Test plan

  • 17/17 tests pass locally
  • uv run ruff check clean
  • 59/59 (proof CLI + proof unit tests) pass together

Summary by CodeRabbit

  • Tests
    • Added comprehensive integration tests for the proof CLI covering capture, run, waive, and status flows. Verifies requirement ID generation and display, run outcomes (pass/fail/none found), waiver persistence and expiry handling, error messages for invalid input, and end-to-end status/evidence transitions across failing → passing runs.

17 tests across 5 test classes cover every acceptance criterion:

  AC1 — TestCapture: creates REQ-0001, persists to ledger, increments IDs,
         rejects invalid severity/source with exit code 1
  AC2 — TestRun: PASS/FAIL exit codes, no-obligations empty path,
         invalid gate error, _run_gate patched via unittest.mock
  AC3 — TestWaive: marks waived with expiry and persists to ledger,
         works without expiry, rejects missing REQ (exit 1),
         rejects bad date format with descriptive error
  AC4 — TestStatus: empty workspace message, open count, waived count,
         expired waiver reverts to open and prints Expired notice
  AC5 — TestClosedLoop: capture → fail run (exit 1, FAIL in output) →
         status still open → pass run (exit 0, PASS) → evidence recorded

All 17 tests pass. No production code changes required.
@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

Walkthrough

Adds a new end-to-end CLI test module that exercises cf proof capture, run, waive, and status commands against a temporary SQLite workspace, validating requirement creation, persistence, gate outcomes, waiver handling (including expiry), and a closed-loop capture→run→status flow.

Changes

Cohort: Proof CLI Integration Tests
File(s): tests/cli/test_proof_commands.py
Summary: New ~323-line integration test file exercising codeframe.cli.app via CliRunner. Covers proof capture (REQ ID incrementing, persistence), proof run (pass/fail, empty workspace, unknown gate), proof waive (waiver state, expiry parsing, error cases), and proof status (summaries, expiry handling, closed-loop evidence checks).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

🐰 I hopped through tests with a twitch of my nose,
REQ IDs grew like carrots row by row,
Runs that pass and runs that fail, I scurried to see,
Waivers that bloom and expiries set free,
A happy little ledger — proof complete for me.

🚥 Pre-merge checks | ✅ 5 passed

  • Description Check: Passed (check skipped; CodeRabbit’s high-level summary is enabled)
  • Title Check: Passed. The title accurately describes the main change: adding integration tests for all cf proof commands, which aligns perfectly with the changeset.
  • Linked Issues Check: Passed. The PR implements all acceptance criteria from issue #455: capture test, run test, waive test, status test, and closed-loop test all present and functional.
  • Out of Scope Changes Check: Passed. The PR contains only integration tests in a new test file with no production code changes or modifications outside the test scope.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/cli/test_proof_commands.py (1)

51-55: Assert the seed capture in ws_with_req.

If the setup command fails here, the dependent tests fail later with misleading assertions. Failing fast in the fixture makes the regression obvious.

Example tweak
 @pytest.fixture()
 def ws_with_req(ws):
     """Workspace that already has one captured requirement (REQ-0001)."""
     workspace, workspace_path = ws
-    runner.invoke(app, ["proof", "capture", "-w", str(workspace_path)] + _CAPTURE_ARGS)
+    result = runner.invoke(
+        app, ["proof", "capture", "-w", str(workspace_path)] + _CAPTURE_ARGS
+    )
+    assert result.exit_code == 0, result.output
     return workspace, workspace_path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/cli/test_proof_commands.py` around lines 51 - 55, The ws_with_req
fixture runs the seed capture but doesn't verify it succeeded; update the
ws_with_req function to assert the capture command's result is successful by
capturing the return value of runner.invoke(app, ["proof", "capture", "-w",
str(workspace_path)] + _CAPTURE_ARGS) and asserting result.exit_code == 0
(and/or that result.output contains the expected success message or created
requirement id like "REQ-0001") so the fixture fails fast when the setup command
fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/cli/test_proof_commands.py`:
- Around line 205-214: Update the test_waive_nonexistent_req_exits_nonzero test
to assert the CLI error message contains a stable "not found" indicator in
addition to exit code; after calling runner.invoke(app, [...]) check
result.exit_code == 1 and add an assertion like "'not found' in result.output"
(or result.stdout/result.stderr depending on runner) to ensure the failure is
the missing-REQ case rather than an unrelated usage/CLI error.
- Around line 305-315: After verifying evidence via ledger.list_evidence for
"REQ-0001", also assert the canonical satisfied state: call
ledger.get_requirement(workspace, "REQ-0001") and assert its .status (or .state)
equals the satisfied value your model uses (e.g., "satisfied" or
RequirementStatus.SATISFIED), or alternatively invoke the CLI with
runner.invoke(app, ["proof", "status", "-w", str(workspace_path), "REQ-0001"])
and assert the output shows it as satisfied; attach this assertion after the
existing evidence checks to ensure the requirement is truly recorded as closed.
- Around line 241-273: The tests use loose substring checks like assert "1" in
result.output which can match unrelated digits; tighten assertions by checking
the exact summary lines and verifying the waive CLI succeeded: in
test_status_shows_open_count assert the status output contains a line like
"Open: 1" (or use a regex anchored to the summary label) instead of just "1"; in
test_status_shows_waived_count first capture the result of runner.invoke(app,
["proof", "waive", ...]) and assert its exit_code == 0, then assert the status
contains an exact "Waived: 1" line; in
test_status_expired_waiver_reverts_to_open after injecting the past Waiver via
ledger.waive_requirement confirm the status contains both the "Expired" notice
and that the summary shows "Open: 1" (not just that "Expired" appears). Use the
test function names (test_status_shows_open_count,
test_status_shows_waived_count, test_status_expired_waiver_reverts_to_open),
runner.invoke, ledger.waive_requirement and the Waiver model to locate the
changes.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 85023c51-de0a-49d1-be45-8327a263963b

📥 Commits

Reviewing files that changed from the base of the PR and between 936d588 and 0fa6b88.

📒 Files selected for processing (1)
  • tests/cli/test_proof_commands.py

@claude

claude bot commented Mar 20, 2026

Review: Integration Tests for cf proof Commands

Good addition — 17 tests covering the full CLI surface with isolated workspaces and direct ledger verification. The structure is clean and the AC mapping is clear. A few things worth addressing:

Issues

1. Fragile string assertions (TestStatus)

test_status_shows_open_count and test_status_shows_waived_count both check that "1" appears in output. This can false-positive on "REQ-0001" or any other digit occurrence. Use a more specific match:

assert "Open: 1" in result.output
assert "Waived: 1" in result.output

Check the exact output format from cf proof status and match it precisely.

2. Late import inside test method

test_status_expired_waiver_reverts_to_open imports Waiver inside the function body. Move it to the top-level imports alongside ReqStatus.

3. ws_with_req fixture does not validate the setup invoke

If cf proof capture fails silently in the fixture, downstream tests will get confusing assertion errors. Add an assertion guard after the invoke.

4. test_waive_nonexistent_req_exits_nonzero skips output check

All other error tests verify the error message. This one only checks exit code. Add an output assertion too.

Minor Notes

  • Prefer == 1 over != 0 for exit code assertions for precision.
  • test_run_no_requirements_exits_zero runs without patching _run_gate. Worth a brief inline comment.

What's Good

  • Direct ledger verification (not just output) is the right call — the comment about it is a good explanation.
  • _run_gate patched at the correct module path, matching the pattern in test_proof9.py.
  • Fixture composition (ws_with_req building on ws) is clean.
  • The closed-loop test (AC5) is the most valuable test here — captures the full flow in one place.
  • pytestmark = pytest.mark.v2 follows repo convention.

Suggested fixes:

  • ws_with_req fixture: assert capture exit_code == 0 for fast failure on setup errors
  • test_waive_nonexistent_req: add "not found" message assertion alongside exit code check
  • test_status_shows_open_count/waived_count: tighten to "Open:"/"Waived:" substrings and verify waive exit_code == 0 before status check
  • test_status_expired_waiver_reverts_to_open: assert "Open:" present after expiry in addition to "Expired"
  • test_capture_run_enforced_then_satisfied: assert req.status == ReqStatus.SATISFIED after passing run
@claude
Copy link

claude bot commented Mar 20, 2026

Follow-up Review - status of issues after fix commit a23ef3a

Resolved items from prior round:

  1. ws_with_req fixture guard - now correctly asserts exit_code == 0 with a descriptive failure message. Fixed.

  2. test_waive_nonexistent_req_exits_nonzero - now checks both exit code and output message. Fixed.

  3. Closed-loop test final assertion - now calls ledger.get_requirement and asserts ReqStatus.SATISFIED. Fixed.

Still open:

Issue 1 - Important - Fragile count assertions in TestStatus (lines 243, 248, 262, 266). The two-assertion pattern still exists: assert 'Open:' in result.output followed by assert '1' in result.output. The digit '1' appears in REQ-0001 and in tmp path names, so this can pass even when the count label is absent. The fix is to combine into a single specific string, for example: assert 'Open: 1' in result.output. Check the actual output format from cf proof status to confirm the exact separator before applying.

Issue 2 - Minor - Late import inside test body. The import 'from codeframe.core.proof.models import Waiver' remains inside the test_status_expired_waiver_reverts_to_open function body. Move it to the module-level imports at the top of the file alongside the existing ReqStatus import for consistency.

Issue 3 - Minor - Inconsistent exit code assertions. test_capture_invalid_severity_exits_nonzero and test_capture_invalid_source_exits_nonzero use assert result.exit_code != 0. All other error tests in this file use == 1. Prefer == 1 for precision unless the command can return multiple distinct non-zero values on error.

Summary: The three critical issues from the first review round are all resolved. Issue 1 above is the highest priority remaining item since it creates a false-passing test condition. Issues 2 and 3 are cleanup items. The overall test design is solid - isolated workspaces, direct ledger verification, correct patch target for _run_gate, and good fixture composition.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/cli/test_proof_commands.py (1)

243-278: ⚠️ Potential issue | 🟠 Major

Tighten the status summary assertions.

These checks can still pass on unrelated digits like REQ-0001, and the expired-waiver case still doesn't prove the requirement is counted as open again. Please assert the summary lines directly (Open: 1, Waived: 1, and Open: 1 after expiry), or use an anchored regex against those labels.

Suggested tightening
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
-        assert "Open:" in result.output
-        assert "1" in result.output
+        assert "Open: 1" in result.output
@@
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
-        assert "Waived:" in result.output
-        assert "1" in result.output
+        assert "Waived: 1" in result.output
@@
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
         assert "Expired" in result.output
-        assert "Open:" in result.output
+        assert "Open: 1" in result.output
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/cli/test_proof_commands.py` around lines 243 - 278, The tests use loose
substring checks (e.g., assert "1" in result.output) which can match unrelated
digits; update the three tests (test_status_shows_open_count,
test_status_shows_waived_count, test_status_expired_waiver_reverts_to_open) to
assert the exact summary text instead of generic digits — for example assert
"Open: 1" in result.output and assert "Waived: 1" in result.output (or use an
anchored regex like r"^Open:\s*1" / r"^Waived:\s*1" against the multiline
output) so the status lines are validated precisely and the expired-waiver test
verifies the requirement is counted as Open again.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/cli/test_proof_commands.py`:
- Around line 305-308: The test currently checks only that "Open" appears in the
proof status output; replace that weak assertion with a precise check that the
failing run left REQ-0001 open by asserting the count (e.g., assert "Open: 1" in
result.output) or by querying the ledger after runner.invoke (use the same
ledger client/helper used elsewhere in tests) to fetch the requirement status
for REQ-0001 and assert it remains "Open". Update the assertion near the proof
status invocation (runner.invoke(app, ["proof", "status", "-w",
str(workspace_path)])) to verify the exact open count or ledger state so the
"fail doesn't auto-satisfy" path is actually validated.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ccb268b7-3683-4260-b68a-9368cb1af409

📥 Commits

Reviewing files that changed from the base of the PR and between 0fa6b88 and a23ef3a.

📒 Files selected for processing (1)
  • tests/cli/test_proof_commands.py

Comment on lines +305 to +308
# Step 3 — status: still open (run failure doesn't auto-satisfy)
result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
assert result.exit_code == 0, result.output
assert "Open" in result.output

⚠️ Potential issue | 🟠 Major

Assert the failed run leaves REQ-0001 open.

"Open" in the status output doesn't prove the failing run preserved the requirement state. Query the ledger here, or at least assert Open: 1, so the “fail doesn't auto-satisfy” path is actually covered.

Suggested assertion
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
-        assert "Open" in result.output
+        assert "Open: 1" in result.output
+        req = ledger.get_requirement(workspace, "REQ-0001")
+        assert req is not None
+        assert req.status == ReqStatus.OPEN

@frankbria frankbria merged commit f5a33e8 into main Mar 20, 2026
13 checks passed
@frankbria frankbria deleted the feat/proof-cli-integration-tests branch March 20, 2026 20:15


Development

Closes: Proof commands need end-to-end integration tests (#455)

1 participant