feat: FillRecord source-of-truth enrichment with provenance and per-turn detail#161
feat: FillRecord source-of-truth enrichment with provenance and per-turn detail#161
Conversation
Plan covers three complementary changes: - A: Drop fully-empty table rows on normalization - B: Warn on mostly-empty rows during validation - D: Strengthen minRows/maxRows to count only substantive rows Includes TDD testing strategy and phased implementation plan. https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
- Phase 1 is now documentation updates (spec and reference docs) - Phase 2 (empty row dropping) has exact line numbers and code diffs for parseTable.ts (lines 298-314, 330-347) and apply.ts (lines 635-670) - Phase 3 (mostly-empty warnings) has exact diff for validate.ts (lines 962-976) - Phase 4 is verification against example forms - References section expanded with precise line numbers for all source locations https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
- Design helper signature now matches implementation (no columns param) - Threshold consistently says "strictly more than half" (not ">= 50%") - Fix test case: 2/4 filled is even split → no warning (was incorrectly → warning) - Add test case: 1/3 filled → warning (odd column count boundary) - Fix Background apply.ts line range to 635-682 (was 653-681) - Remove coreTypes.ts from Components table (no changes needed there) https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
Epic mf-ds98 with 5 implementation beads: - mf-osok: Phase 1 — Documentation updates - mf-2h7f: Phase 2a — isRowFullyEmpty() + parseTable filtering - mf-s6wk: Phase 2b — apply.ts patch handler filtering - mf-iyg4: Phase 3 — Mostly-empty row warnings - mf-xg2p: Phase 4 — Verification and regression testing Dependencies: 1 → 2a,2b → 3 → 4 https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 12dbe3ea43
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| !cell || cell.state === 'skipped' || cell.state === 'aborted' | ||
| || cell.value === undefined || cell.value === null || cell.value === '', |
There was a problem hiding this comment.
Preserve aborted table rows when filtering empty rows
This helper treats cell.state === 'aborted' as "empty," so a row containing only %ABORT%/aborted cells would be dropped during normalization. If implemented as written, that silently discards explicit abort signals (and any reasons) that are currently representable in table cells, and can change the field to effectively unanswered even though the agent explicitly aborted those cells.
Useful? React with 👍 / 👎.
Coverage Report for packages/markform
File Coverage
|
||||||||||||||||||||||||||||||||||||||||||||||||||
- Add isRowFullyEmpty() helper to detect rows where all cells are skipped/empty/aborted - Filter empty rows during parse (parseMarkdownTable, parseInlineTable) and patch apply (set_table, append_table) - Add mostly-empty row warning when >50% of cells in a non-empty row are empty - Move minRows check before isEmpty early return so minRows>0 fails on empty tables - Update markform-spec.md and markform-reference.md with empty row handling and sparseness warning docs - Full TDD coverage: 19 new tests across parseTable, apply, and validate test suites Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
- Restructure isRowFullyEmpty() to separate non-answered vs answered-but-empty checks for clarity - Consolidate redundant isRowFullyEmpty unit tests into table-driven format (7 → 6 tests, same coverage) - Remove redundant 'does not warn when 3 of 4 cells filled' test (boundary already tested at 2 of 4) - Remove comments that restate obvious test behavior Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
Co-Authored-By: Cursor <noreply@cursor.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Review fixes for PR #161: - isRowFullyEmpty no longer treats aborted cells as empty (aborted carries intentional signal from the agent) - Add round-trip serialization test for empty row dropping - Add progress computation tests for tables with empty rows - Simplify mostly-empty row warning message (remove prescriptive advice) - Fix spec numbering gap (A, B, D -> A, B, C) - Update spec to reflect aborted-row fix and status Co-Authored-By: Cursor <noreply@cursor.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Implements the FillRecord source-of-truth enrichment spec (plan-2026-02-21-fill-record-comprehensive-source-of-truth.md) to make FillRecord self-contained and sufficient for post-hoc debugging without re-running fills. Adds provenance tracking (markform version, input hash, schema version), effective config snapshot, and per-turn enrichment (form progress, rejection details, issue refs, opt-in raw patches).
This transforms FillRecord from a summarization layer (counts, timing, aggregates) to a comprehensive source of record that captures everything needed to understand and reproduce a fill operation.
Changes
Provenance Tracking (FR-6):
markformVersion(from build-time VERSION constant)inputFormSha256(SHA-256 of input form before filling)fillRecordSchemaVersion: 1(schema version for forward compatibility)src/version.tsto avoid import cyclesEffective Config Snapshot (FR-1):
config: FillConfigSnapshotfield to capture resolved FillOptionsFillConfigSnapshottype usingOmit<FillOptions, ...>exclude-list patternFillConfigSchemawith.passthrough()for forward compatibilityprefillFieldIdswhen inputContext providedPer-Turn Enrichment (FR-2, FR-3, FR-5):
rejectedPatches: PatchRejection[]to timeline entries (full detail vs count-only)formProgressper-turn snapshots (answeredFields, skippedFields, requiredRemaining, optionalRemaining)issueRefscompact issue references (ref, scope, severity, reason)TurnProgresswithformProgressSnapshotfieldfillRecordCollectorandprogrammaticFillOpt-In Raw Patches (FR-4):
recordPatches: booleanto FillOptions (defaults to false due to size/PII concerns)--record-patchesCLI flagPatch[]in timeline entries when enabledImplementation Details:
inputFormSha256before inputContext is applied (stable across different prefills)patchesAppliedcount in parallel path (use actual applied count, not submitted count)Tests:
Dependencies:
Closed Beads:
Test Plan
Automated Tests:
Manual Testing Scenarios:
markform fillwith--record-fillon a complex form, verify.fill.jsoncontains:markformVersion(non-empty string)inputFormSha256(64-char hex string)fillRecordSchemaVersion: 1configsection with resolved defaults (e.g.,maxTurnsTotal: 100, not undefined)config.prefillFieldIdswhen inputContext usedmarkform fillwith--record-patches, verify timeline entries containpatchesarraysmarkform fillwithout--record-patches, verify timeline entries omitpatchesformProgresson every entry (4 numeric fields)issueRefson entries with issues (compact ref/scope/severity/reason)rejectedPatcheson entries with rejections (full detail, not just counts)Edge Cases:
Backward Compatibility:
patchesRejectedandissuesAddressedcounts for backward compatPerformance & Size Impact:
Documentation:
Related Beads
Implements spec:
docs/project/specs/active/plan-2026-02-21-fill-record-comprehensive-source-of-truth.mdCloses beads from previous work:
🤖 Generated with Claude Code