
test(cli): integration tests for all cf proof commands (#455) #462

Merged
frankbria merged 2 commits into main from feat/proof-cli-integration-tests
Mar 20, 2026

Conversation

@frankbria
Owner

@frankbria frankbria commented Mar 20, 2026

Summary

Closes #455. Adds tests/cli/test_proof_commands.py — 17 integration tests covering the complete cf proof CLI surface area through the Typer CliRunner against real SQLite workspaces.

Acceptance criteria

  • CLI integration test: cf proof capture creates a REQ and persists it
  • CLI integration test: cf proof run evaluates workspace against open REQs
  • CLI integration test: cf proof waive marks a REQ waived with expiry
  • CLI integration test: cf proof status shows correct summary
  • Closed loop: capture → run (fail) → run (pass) → evidence recorded

What's in the test file

Class           Tests  Coverage
TestCapture     4      Creates REQ, increments IDs, rejects invalid severity/source
TestRun         4      PASS/FAIL exits, no-obligations, invalid gate — _run_gate patched
TestWaive       4      With/without expiry, missing REQ exit 1, bad date format
TestStatus      4      Empty workspace, open count, waived count, expired waiver reverts
TestClosedLoop  1      Full capture → fail run → pass run → evidence recorded

Design decisions

  • _run_gate is patched at codeframe.core.proof.runner._run_gate — same pattern as TestRunner in tests/core/test_proof9.py
  • Persistence verified directly via ledger.get_requirement / ledger.list_evidence — output checks alone are insufficient
  • Each test class gets an isolated tmp_path workspace via create_or_load_workspace
  • No production code changes required — all commands were already correctly implemented

Test plan

  • 17/17 tests pass locally
  • uv run ruff check clean
  • 59/59 (proof CLI + proof unit tests) pass together

Summary by CodeRabbit

  • Tests
    • Added comprehensive integration tests for the proof CLI covering capture, run, waive, and status flows. Verifies requirement ID generation and display, run outcomes (pass/fail/none found), waiver persistence and expiry handling, error messages for invalid input, and end-to-end status/evidence transitions across failing → passing runs.

17 tests across 5 test classes cover every acceptance criterion:

  AC1 — TestCapture: creates REQ-0001, persists to ledger, increments IDs,
         rejects invalid severity/source with exit code 1
  AC2 — TestRun: PASS/FAIL exit codes, no-obligations empty path,
         invalid gate error, _run_gate patched via unittest.mock
  AC3 — TestWaive: marks waived with expiry and persists to ledger,
         works without expiry, rejects missing REQ (exit 1),
         rejects bad date format with descriptive error
  AC4 — TestStatus: empty workspace message, open count, waived count,
         expired waiver reverts to open and prints Expired notice
  AC5 — TestClosedLoop: capture → fail run (exit 1, FAIL in output) →
         status still open → pass run (exit 0, PASS) → evidence recorded

All 17 tests pass. No production code changes required.
@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

Walkthrough

Adds a new end-to-end CLI test module that exercises cf proof capture, run, waive, and status commands against a temporary SQLite workspace, validating requirement creation, persistence, gate outcomes, waiver handling (including expiry), and a closed-loop capture→run→status flow.

Changes

Cohort: Proof CLI Integration Tests
File(s): tests/cli/test_proof_commands.py
Summary: New ~323-line integration test file exercising codeframe.cli.app via CliRunner. Covers proof capture (REQ ID incrementing, persistence), proof run (pass/fail, empty workspace, unknown gate), proof waive (waiver state, expiry parsing, error cases), and proof status (summaries, expiry handling, closed-loop evidence checks).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

🐰 I hopped through tests with a twitch of my nose,
REQ IDs grew like carrots row by row,
Runs that pass and runs that fail, I scurried to see,
Waivers that bloom and expiries set free,
A happy little ledger — proof complete for me.

🚥 Pre-merge checks | ✅ 5 passed

  • Description Check: Passed (check skipped; CodeRabbit’s high-level summary is enabled)
  • Title Check: Passed. The title accurately describes the main change: adding integration tests for all cf proof commands, which aligns perfectly with the changeset.
  • Linked Issues Check: Passed. The PR implements all acceptance criteria from issue #455: capture test, run test, waive test, status test, and closed-loop test all present and functional.
  • Out of Scope Changes Check: Passed. The PR contains only integration tests in a new test file with no production code changes or modifications outside the test scope.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/cli/test_proof_commands.py (1)

51-55: Assert the seed capture in ws_with_req.

If the setup command fails here, the dependent tests fail later with misleading assertions. Failing fast in the fixture makes the regression obvious.

Example tweak
 @pytest.fixture()
 def ws_with_req(ws):
     """Workspace that already has one captured requirement (REQ-0001)."""
     workspace, workspace_path = ws
-    runner.invoke(app, ["proof", "capture", "-w", str(workspace_path)] + _CAPTURE_ARGS)
+    result = runner.invoke(
+        app, ["proof", "capture", "-w", str(workspace_path)] + _CAPTURE_ARGS
+    )
+    assert result.exit_code == 0, result.output
     return workspace, workspace_path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/cli/test_proof_commands.py` around lines 51 - 55, The ws_with_req
fixture runs the seed capture but doesn't verify it succeeded; update the
ws_with_req function to assert the capture command's result is successful by
capturing the return value of runner.invoke(app, ["proof", "capture", "-w",
str(workspace_path)] + _CAPTURE_ARGS) and asserting result.exit_code == 0
(and/or that result.output contains the expected success message or created
requirement id like "REQ-0001") so the fixture fails fast when the setup command
fails.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/cli/test_proof_commands.py`:
- Around line 205-214: Update the test_waive_nonexistent_req_exits_nonzero test
to assert the CLI error message contains a stable "not found" indicator in
addition to exit code; after calling runner.invoke(app, [...]) check
result.exit_code == 1 and add an assertion like "'not found' in result.output"
(or result.stdout/result.stderr depending on runner) to ensure the failure is
the missing-REQ case rather than an unrelated usage/CLI error.
- Around line 305-315: After verifying evidence via ledger.list_evidence for
"REQ-0001", also assert the canonical satisfied state: call
ledger.get_requirement(workspace, "REQ-0001") and assert its .status (or .state)
equals the satisfied value your model uses (e.g., "satisfied" or
RequirementStatus.SATISFIED), or alternatively invoke the CLI with
runner.invoke(app, ["proof", "status", "-w", str(workspace_path), "REQ-0001"])
and assert the output shows it as satisfied; attach this assertion after the
existing evidence checks to ensure the requirement is truly recorded as closed.
- Around line 241-273: The tests use loose substring checks like assert "1" in
result.output which can match unrelated digits; tighten assertions by checking
the exact summary lines and verifying the waive CLI succeeded: in
test_status_shows_open_count assert the status output contains a line like
"Open: 1" (or use a regex anchored to the summary label) instead of just "1"; in
test_status_shows_waived_count first capture the result of runner.invoke(app,
["proof", "waive", ...]) and assert its exit_code == 0, then assert the status
contains an exact "Waived: 1" line; in
test_status_expired_waiver_reverts_to_open after injecting the past Waiver via
ledger.waive_requirement confirm the status contains both the "Expired" notice
and that the summary shows "Open: 1" (not just that "Expired" appears). Use the
test function names (test_status_shows_open_count,
test_status_shows_waived_count, test_status_expired_waiver_reverts_to_open),
runner.invoke, ledger.waive_requirement and the Waiver model to locate the
changes.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 85023c51-de0a-49d1-be45-8327a263963b

📥 Commits

Reviewing files that changed from the base of the PR and between 936d588 and 0fa6b88.

📒 Files selected for processing (1)
  • tests/cli/test_proof_commands.py

@claude

claude bot commented Mar 20, 2026

Review: Integration Tests for cf proof Commands

Good addition — 17 tests covering the full CLI surface with isolated workspaces and direct ledger verification. The structure is clean and the AC mapping is clear. A few things worth addressing:

Issues

1. Fragile string assertions (TestStatus)

test_status_shows_open_count and test_status_shows_waived_count both check that "1" appears in output. This can false-positive on "REQ-0001" or any other digit occurrence. Use a more specific match:

assert "Open: 1" in result.output
assert "Waived: 1" in result.output

Check the exact output format from cf proof status and match it precisely.

2. Late import inside test method

test_status_expired_waiver_reverts_to_open imports Waiver inside the function body. Move it to the top-level imports alongside ReqStatus.

3. ws_with_req fixture does not validate the setup invoke

If cf proof capture fails silently in the fixture, downstream tests will get confusing assertion errors. Add an assertion guard after the invoke.

4. test_waive_nonexistent_req_exits_nonzero skips output check

All other error tests verify the error message. This one only checks exit code. Add an output assertion too.

Minor Notes

  • Prefer == 1 over != 0 for exit code assertions for precision.
  • test_run_no_requirements_exits_zero runs without patching _run_gate. Worth a brief inline comment.

What's Good

  • Direct ledger verification (not just output) is the right call — the comment about it is a good explanation.
  • _run_gate patched at the correct module path, matching the pattern in test_proof9.py.
  • Fixture composition (ws_with_req building on ws) is clean.
  • The closed-loop test (AC5) is the most valuable test here — captures the full flow in one place.
  • pytestmark = pytest.mark.v2 follows repo convention.

Suggested fixes:

  • ws_with_req fixture: assert capture exit_code == 0 for fast failure on setup errors
  • test_waive_nonexistent_req: add "not found" message assertion alongside exit code check
  • test_status_shows_open_count/waived_count: tighten to "Open:"/"Waived:" substrings and verify waive exit_code == 0 before status check
  • test_status_expired_waiver_reverts_to_open: assert "Open:" present after expiry in addition to "Expired"
  • test_capture_run_enforced_then_satisfied: assert req.status == ReqStatus.SATISFIED after passing run
@claude
Copy link

claude bot commented Mar 20, 2026

Follow-up Review - status of issues after fix commit a23ef3a

Resolved items from prior round:

  1. ws_with_req fixture guard - now correctly asserts exit_code == 0 with a descriptive failure message. Fixed.

  2. test_waive_nonexistent_req_exits_nonzero - now checks both exit code and output message. Fixed.

  3. Closed-loop test final assertion - now calls ledger.get_requirement and asserts ReqStatus.SATISFIED. Fixed.

Still open:

Issue 1 - Important - Fragile count assertions in TestStatus (lines 243, 248, 262, 266). The two-assertion pattern still exists: assert 'Open:' in result.output followed by assert '1' in result.output. The digit '1' appears in REQ-0001 and in tmp path names, so this can pass even when the count label is absent. The fix is to combine into a single specific string, for example: assert 'Open: 1' in result.output. Check the actual output format from cf proof status to confirm the exact separator before applying.

Issue 2 - Minor - Late import inside test body. The import 'from codeframe.core.proof.models import Waiver' remains inside the test_status_expired_waiver_reverts_to_open function body. Move it to the module-level imports at the top of the file alongside the existing ReqStatus import for consistency.

Issue 3 - Minor - Inconsistent exit code assertions. test_capture_invalid_severity_exits_nonzero and test_capture_invalid_source_exits_nonzero use assert result.exit_code != 0. All other error tests in this file use == 1. Prefer == 1 for precision unless the command can return multiple distinct non-zero values on error.

Summary: The three critical issues from the first review round are all resolved. Issue 1 above is the highest priority remaining item since it creates a false-passing test condition. Issues 2 and 3 are cleanup items. The overall test design is solid - isolated workspaces, direct ledger verification, correct patch target for _run_gate, and good fixture composition.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/cli/test_proof_commands.py (1)

243-278: ⚠️ Potential issue | 🟠 Major

Tighten the status summary assertions.

These checks can still pass on unrelated digits like REQ-0001, and the expired-waiver case still doesn't prove the requirement is counted as open again. Please assert the summary lines directly (Open: 1, Waived: 1, and Open: 1 after expiry), or use an anchored regex against those labels.

Suggested tightening
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
-        assert "Open:" in result.output
-        assert "1" in result.output
+        assert "Open: 1" in result.output
@@
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
-        assert "Waived:" in result.output
-        assert "1" in result.output
+        assert "Waived: 1" in result.output
@@
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
         assert "Expired" in result.output
-        assert "Open:" in result.output
+        assert "Open: 1" in result.output
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/cli/test_proof_commands.py` around lines 243 - 278, The tests use loose
substring checks (e.g., assert "1" in result.output) which can match unrelated
digits; update the three tests (test_status_shows_open_count,
test_status_shows_waived_count, test_status_expired_waiver_reverts_to_open) to
assert the exact summary text instead of generic digits — for example assert
"Open: 1" in result.output and assert "Waived: 1" in result.output (or use an
anchored regex like r"^Open:\s*1" / r"^Waived:\s*1" against the multiline
output) so the status lines are validated precisely and the expired-waiver test
verifies the requirement is counted as Open again.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/cli/test_proof_commands.py`:
- Around line 305-308: The test currently checks only that "Open" appears in the
proof status output; replace that weak assertion with a precise check that the
failing run left REQ-0001 open by asserting the count (e.g., assert "Open: 1" in
result.output) or by querying the ledger after runner.invoke (use the same
ledger client/helper used elsewhere in tests) to fetch the requirement status
for REQ-0001 and assert it remains "Open". Update the assertion near the proof
status invocation (runner.invoke(app, ["proof", "status", "-w",
str(workspace_path)])) to verify the exact open count or ledger state so the
"fail doesn't auto-satisfy" path is actually validated.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ccb268b7-3683-4260-b68a-9368cb1af409

📥 Commits

Reviewing files that changed from the base of the PR and between 0fa6b88 and a23ef3a.

📒 Files selected for processing (1)
  • tests/cli/test_proof_commands.py

Comment on lines +305 to +308
# Step 3 — status: still open (run failure doesn't auto-satisfy)
result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
assert result.exit_code == 0, result.output
assert "Open" in result.output

⚠️ Potential issue | 🟠 Major

Assert the failed run leaves REQ-0001 open.

"Open" in the status output doesn't prove the failing run preserved the requirement state. Query the ledger here, or at least assert Open: 1, so the “fail doesn't auto-satisfy” path is actually covered.

Suggested assertion
         result = runner.invoke(app, ["proof", "status", "-w", str(workspace_path)])
         assert result.exit_code == 0, result.output
-        assert "Open" in result.output
+        assert "Open: 1" in result.output
+        req = ledger.get_requirement(workspace, "REQ-0001")
+        assert req is not None
+        assert req.status == ReqStatus.OPEN

@frankbria frankbria merged commit f5a33e8 into main Mar 20, 2026
13 checks passed
@frankbria frankbria deleted the feat/proof-cli-integration-tests branch March 20, 2026 20:15


Development

Closes: Proof commands need end-to-end integration tests (#455)

1 participant