feat: FillRecord source-of-truth enrichment with provenance and per-turn detail by jlevy · Pull Request #161 · jlevy/markform

jlevy · 2026-02-20T17:04:39Z

Summary

Implements the FillRecord source-of-truth enrichment spec (plan-2026-02-21-fill-record-comprehensive-source-of-truth.md) to make FillRecord self-contained and sufficient for post-hoc debugging without re-running fills. Adds provenance tracking (markform version, input hash, schema version), effective config snapshot, and per-turn enrichment (form progress, rejection details, issue refs, opt-in raw patches).

This transforms FillRecord from a summarization layer (counts, timing, aggregates) to a comprehensive source of record that captures everything needed to understand and reproduce a fill operation.

Changes

Provenance Tracking (FR-6):

Add markformVersion (from build-time VERSION constant)
Add inputFormSha256 (SHA-256 of input form before filling)
Add fillRecordSchemaVersion: 1 (schema version for forward compatibility)
Extract VERSION to src/version.ts to avoid import cycles
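A minimal sketch of how the three provenance fields could be populated. It assumes Node's built-in `node:crypto` and a SHA-256 over the raw form Markdown encoded as UTF-8. `computeProvenance` and `FillRecordProvenance` are hypothetical names, and `VERSION` is a placeholder for the constant exported from `src/version.ts`:

```typescript
import { createHash } from "node:crypto";

// Placeholder for the build-time constant exported from src/version.ts.
const VERSION = "0.0.0-dev";

// Hypothetical shape of the provenance fields on FillRecord.
interface FillRecordProvenance {
  markformVersion: string;
  inputFormSha256: string; // 64-char lowercase hex digest
  fillRecordSchemaVersion: 1;
}

// Hash the input form text as given, before any filling or prefill is applied.
function computeProvenance(inputFormMarkdown: string): FillRecordProvenance {
  const inputFormSha256 = createHash("sha256")
    .update(inputFormMarkdown, "utf8")
    .digest("hex");
  return { markformVersion: VERSION, inputFormSha256, fillRecordSchemaVersion: 1 };
}
```

The digest depends only on the form text, so repeated fills of the same template share an `inputFormSha256`. Records can then be grouped by input form.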

Effective Config Snapshot (FR-1):

Add config: FillConfigSnapshot field to capture resolved FillOptions
Define FillConfigSnapshot type using Omit<FillOptions, ...> exclude-list pattern
New options automatically captured unless explicitly excluded
Add FillConfigSchema with .passthrough() for forward compatibility
Compute effective config after default resolution (shows what actually ran, not undefined)
Include prefillFieldIds when inputContext provided
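The exclude-list pattern can be illustrated with a trimmed, hypothetical `FillOptions`. Only the excluded names, `inputContext`, `maxTurnsTotal` (default 100) and `recordPatches` (default `false`) come from this PR. `toConfigSnapshot` is an illustrative helper. Excluding `inputContext` itself and recording only its keys as `prefillFieldIds` is an assumption of this sketch:

```typescript
// Trimmed, hypothetical stand-in for markform's FillOptions.
interface FillOptions {
  form: unknown; // parsed form (not serializable)
  model: unknown; // model instance (not serializable)
  signal?: AbortSignal;
  callbacks?: Record<string, (...args: unknown[]) => void>;
  providers?: Record<string, unknown>;
  inputContext?: Record<string, unknown>;
  maxTurnsTotal?: number;
  recordPatches?: boolean;
}

// Exclude-list: every option NOT named here flows into the snapshot automatically.
const EXCLUDED = ["form", "model", "signal", "callbacks", "providers", "inputContext"] as const;
type ExcludedKey = (typeof EXCLUDED)[number];

type FillConfigSnapshot = Omit<FillOptions, ExcludedKey> & { prefillFieldIds?: string[] };

// Snapshot options *after* defaults are resolved, so the record shows what actually ran.
function toConfigSnapshot(options: FillOptions): FillConfigSnapshot {
  const resolved: FillOptions = {
    ...options,
    maxTurnsTotal: options.maxTurnsTotal ?? 100,
    recordPatches: options.recordPatches ?? false,
  };
  const snapshot: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(resolved)) {
    if (!(EXCLUDED as readonly string[]).includes(key) && value !== undefined) {
      snapshot[key] = value;
    }
  }
  if (options.inputContext) {
    snapshot["prefillFieldIds"] = Object.keys(options.inputContext);
  }
  return snapshot as FillConfigSnapshot;
}
```

The type uses `Omit<...>` rather than `Pick<...>`. A new field on `FillOptions` therefore appears in the snapshot type, and via the runtime loop in the JSON, without touching this code. Only non-serializable additions need to join the exclude list.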

Per-Turn Enrichment (FR-2, FR-3, FR-5):

Add rejectedPatches: PatchRejection[] to timeline entries (full detail vs count-only)
Add formProgress per-turn snapshots (answeredFields, skippedFields, requiredRemaining, optionalRemaining)
Add issueRefs compact issue references (ref, scope, severity, reason)
Extend TurnProgress with formProgressSnapshot field
Wire enrichment through fillRecordCollector and programmaticFill
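A sketch of deriving the per-turn `formProgress` snapshot. The `ProgressCounts` shape below is hypothetical, and markform's real counters may be structured differently. It also assumes that skipped fields reduce only the optional remainder. Clamping at zero is one way to keep the derivation robust to edge cases:

```typescript
// Hypothetical shape of the engine's progress counters.
interface ProgressCounts {
  requiredTotal: number;
  optionalTotal: number;
  requiredAnswered: number;
  optionalAnswered: number;
  skipped: number; // assumed to count skipped optional fields only
}

// The four numeric fields recorded on each timeline entry.
interface FormProgress {
  answeredFields: number;
  skippedFields: number;
  requiredRemaining: number;
  optionalRemaining: number;
}

// Clamp at zero so inconsistent counters never produce negative "remaining" values.
function deriveFormProgress(c: ProgressCounts): FormProgress {
  return {
    answeredFields: c.requiredAnswered + c.optionalAnswered,
    skippedFields: c.skipped,
    requiredRemaining: Math.max(0, c.requiredTotal - c.requiredAnswered),
    optionalRemaining: Math.max(0, c.optionalTotal - c.optionalAnswered - c.skipped),
  };
}
```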

Opt-In Raw Patches (FR-4):

Add recordPatches: boolean to FillOptions (defaults to false due to size/PII concerns)
Add --record-patches CLI flag
Store raw Patch[] in timeline entries when enabled
Document PII implications in code comments
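A sketch of assembling a timeline entry with the enrichment fields. Empty arrays are omitted, and raw patches are attached only when `recordPatches` is enabled. All types are minimal hypothetical stand-ins. Storing the submitted (not applied) patches and computing `patchesApplied` as submitted minus rejected are assumptions of this sketch:

```typescript
// Minimal hypothetical stand-ins for markform's richer types.
interface Patch { op: string; fieldId: string; value?: unknown }
interface PatchRejection { patchIndex: number; message: string }
interface IssueRef { ref: string; scope: string; severity: string; reason: string }

interface TimelineEntry {
  turn: number;
  patchesApplied: number;
  rejectedPatches?: PatchRejection[];
  issueRefs?: IssueRef[];
  // Raw patches can contain user-provided values (PII), so they are opt-in.
  patches?: Patch[];
}

interface TurnResult {
  turn: number;
  submitted: Patch[];
  rejections: PatchRejection[];
  issues: IssueRef[];
}

function buildTimelineEntry(result: TurnResult, recordPatches: boolean): TimelineEntry {
  const entry: TimelineEntry = {
    turn: result.turn,
    // Applied = submitted minus rejected (assumes one rejection per rejected patch).
    patchesApplied: result.submitted.length - result.rejections.length,
  };
  // Attach arrays only when non-empty, keeping the JSON compact.
  if (result.rejections.length > 0) entry.rejectedPatches = result.rejections;
  if (result.issues.length > 0) entry.issueRefs = result.issues;
  if (recordPatches && result.submitted.length > 0) entry.patches = result.submitted;
  return entry;
}
```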

Implementation Details:

Compute inputFormSha256 before inputContext is applied (stable across different prefills)
Fix patchesApplied count in parallel path (use actual applied count, not submitted count)
Capture pre-apply issues in CLI (what LLM saw before applying patches)
Change sessionId schema from UUID to string (support sess-ULID format)
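The ordering that keeps `inputFormSha256` stable across prefills can be sketched as hash-then-prefill. `prepareFill` and `applyInputContext` are hypothetical. The `{{fieldId}}` substitution only stands in for markform's actual `inputContext` handling, which this PR does not show:

```typescript
import { createHash } from "node:crypto";

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical prefill stand-in: replaces {{fieldId}} placeholders with values.
function applyInputContext(template: string, inputContext: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, id: string) => inputContext[id] ?? match);
}

function prepareFill(template: string, inputContext?: Record<string, string>) {
  // 1. Hash the untouched template, so the digest identifies the form, not the prefill values.
  const inputFormSha256 = sha256Hex(template);
  // 2. Apply prefill only after hashing.
  const formText = inputContext ? applyInputContext(template, inputContext) : template;
  return { inputFormSha256, formText };
}
```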

Tests:

Add unit tests for provenance fields (markformVersion, inputFormSha256, fillRecordSchemaVersion)
Add unit tests for config snapshot with resolved defaults
Add unit tests for per-turn enrichment wiring
Add unit tests for recordPatches opt-in/out behavior
Add integration tests for FillRecord schema validation
Add integration tests for formProgress snapshots
All 2292 tests pass with full coverage of new features

Dependencies:

Upgrade tryscript from 0.1.6 to 0.1.7

Closed Beads:

Cleaned up 7 closed issue files from previous work (table validation epic)

Test Plan

Automated Tests:

All unit tests pass (2292 passed, 1 skipped)
Typecheck passes with no errors
All new schemas validate with Zod
FillConfigSchema.passthrough() preserves unknown keys
FillRecordSchema validates real fill records
formProgress snapshots computed correctly from ProgressCounts
recordPatches opt-in/out behavior works
inputFormSha256 stable across different inputContext values

Manual Testing Scenarios:

Run markform fill with --record-fill on a complex form, verify .fill.json contains:
- markformVersion (non-empty string)
- inputFormSha256 (64-char hex string)
- fillRecordSchemaVersion: 1
- config section with resolved defaults (e.g., maxTurnsTotal: 100, not undefined)
- config.prefillFieldIds when inputContext used
Run markform fill with --record-patches, verify timeline entries contain patches arrays
Run markform fill without --record-patches, verify timeline entries omit patches
Inspect timeline entries for:
- formProgress on every entry (4 numeric fields)
- issueRefs on entries with issues (compact ref/scope/severity/reason)
- rejectedPatches on entries with rejections (full detail, not just counts)
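Part of the manual checks above could be scripted. This sketch inspects an already-parsed `.fill.json` object. The top-level `timeline` key and the exact checks are assumptions based on the field names in this description:

```typescript
const FORM_PROGRESS_KEYS = ["answeredFields", "skippedFields", "requiredRemaining", "optionalRemaining"];

// Returns the problems found; an empty array means every check passed.
function checkFillRecord(record: any, recordPatchesEnabled: boolean): string[] {
  const problems: string[] = [];
  if (typeof record.markformVersion !== "string" || record.markformVersion === "") {
    problems.push("markformVersion missing or empty");
  }
  if (typeof record.inputFormSha256 !== "string" || !/^[0-9a-f]{64}$/.test(record.inputFormSha256)) {
    problems.push("inputFormSha256 is not a 64-char hex string");
  }
  if (record.fillRecordSchemaVersion !== 1) problems.push("fillRecordSchemaVersion is not 1");
  if (typeof record.config?.maxTurnsTotal !== "number") {
    problems.push("config.maxTurnsTotal is not a resolved number");
  }
  const timeline: any[] = Array.isArray(record.timeline) ? record.timeline : [];
  timeline.forEach((entry, i) => {
    const fp = entry.formProgress;
    if (!fp || !FORM_PROGRESS_KEYS.every((k) => typeof fp[k] === "number")) {
      problems.push(`timeline[${i}].formProgress is missing a numeric field`);
    }
  });
  // Entries without patches omit the array, so check across the whole timeline.
  const withPatches = timeline.filter((e) => Array.isArray(e.patches)).length;
  if (recordPatchesEnabled && withPatches === 0) problems.push("no timeline entry has patches");
  if (!recordPatchesEnabled && withPatches > 0) problems.push("patches recorded without --record-patches");
  return problems;
}
```

After a `--record-patches` run, a caller might pass `JSON.parse(readFileSync(path, "utf8"))` (with `readFileSync` from `node:fs`) and `true`, then expect an empty array.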

Edge Cases:

Config snapshot excludes non-serializable fields (form, model, signal, callbacks, providers)
Config snapshot includes all serializable FillOptions fields
inputFormSha256 computed before inputContext applied (template hash, not filled form)
formProgress derivation handles all ProgressCounts edge cases
Empty arrays omitted from timeline entries (compact JSON)
sessionId accepts sess-ULID format (not just UUID)

Backward Compatibility:

All new FillRecord fields are optional (existing records parse unchanged)
Retained patchesRejected and issuesAddressed counts for backward compat
FillConfigSchema.passthrough() allows future FillOptions fields

Performance & Size Impact:

Config snapshot: ~200-500 bytes (always on, acceptable)
Per-turn enrichment: ~5-15KB for 30-turn fill (~3-10% increase, acceptable)
Raw patches (opt-in): significant size impact only when enabled
Per-turn progress computation negligible overhead (4 numeric counters)

Documentation:

Spec marked as "Implemented"
Code comments document PII implications of --record-patches
FillConfigSnapshot type clearly documents excluded fields

Related Beads

Implements spec: docs/project/specs/active/plan-2026-02-21-fill-record-comprehensive-source-of-truth.md

Closes beads from previous work:

Table row validation epic (all phases complete)
FillRecord implementation beads (all phases complete)

🤖 Generated with Claude Code

Plan covers three complementary changes:
- A: Drop fully-empty table rows on normalization
- B: Warn on mostly-empty rows during validation
- D: Strengthen minRows/maxRows to count only substantive rows

Includes TDD testing strategy and phased implementation plan. https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

- Phase 1 is now documentation updates (spec and reference docs)
- Phase 2 (empty row dropping) has exact line numbers and code diffs for parseTable.ts (lines 298-314, 330-347) and apply.ts (lines 635-670)
- Phase 3 (mostly-empty warnings) has exact diff for validate.ts (lines 962-976)
- Phase 4 is verification against example forms
- References section expanded with precise line numbers for all source locations

https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

- Design helper signature now matches implementation (no columns param)
- Threshold consistently says "strictly more than half" (not ">= 50%")
- Fix test case: 2/4 filled is an even split → no warning (was incorrectly → warning)
- Add test case: 1/3 filled → warning (odd column count boundary)
- Fix Background apply.ts line range to 635-682 (was 653-681)
- Remove coreTypes.ts from Components table (no changes needed there)

https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
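The "strictly more than half" rule reduces to integer arithmetic, which makes the two boundary cases above easy to check. `isRowMostlyEmpty` is a hypothetical helper over cell counts, not the plan's actual signature:

```typescript
// Hypothetical helper: `filled` non-empty cells in a row of `total` cells.
function isRowMostlyEmpty(filled: number, total: number): boolean {
  const empty = total - filled;
  // Fully-empty rows (filled === 0) are dropped during normalization, so they never warn here.
  // `empty * 2 > total` means strictly more than half, so an even split does not warn.
  return filled > 0 && empty * 2 > total;
}

isRowMostlyEmpty(2, 4); // false: 2 of 4 empty is an even split
isRowMostlyEmpty(1, 3); // true: 2 of 3 empty is more than half
```

Comparing `empty * 2 > total` avoids floating-point percentages entirely, so odd column counts behave exactly as specified.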

Epic mf-ds98 with 5 implementation beads:
- mf-osok (Phase 1): Documentation updates
- mf-2h7f (Phase 2a): isRowFullyEmpty() + parseTable filtering
- mf-s6wk (Phase 2b): apply.ts patch handler filtering
- mf-iyg4 (Phase 3): Mostly-empty row warnings
- mf-xg2p (Phase 4): Verification and regression testing

Dependencies: 1 → 2a,2b → 3 → 4

https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12dbe3ea43


chatgpt-codex-connector · 2026-02-20T17:08:00Z

docs/project/specs/active/plan-2026-02-20-validate-form-rows.md

+    !cell || cell.state === 'skipped' || cell.state === 'aborted'
+    || cell.value === undefined || cell.value === null || cell.value === '',


Preserve aborted table rows when filtering empty rows

This helper treats cell.state === 'aborted' as "empty," so a row containing only %ABORT%/aborted cells would be dropped during normalization. If implemented as written, that silently discards explicit abort signals (and any reasons) that are currently representable in table cells, and can change the field to effectively unanswered even though the agent explicitly aborted those cells.


github-actions · 2026-02-20T17:08:15Z

Coverage Report for packages/markform

Status	Category	Percentage	Covered / Total
🔵	Lines	66.2% (🎯 64%)	6116 / 9238
🔵	Statements	65.99% (🎯 64%)	6334 / 9598
🔵	Functions	65.58% (🎯 64%)	747 / 1139
🔵	Branches	62.63% (🎯 60%)	4419 / 7055

File Coverage

File	Stmts	Branches	Functions	Lines	Uncovered Lines
Changed Files
packages/markform/src/engine/apply.ts	99.64%	93.06%	100%	99.62%	234
packages/markform/src/engine/validate.ts	91.92%	89.44%	100%	91.78%	127-132, 248, 253-258, 263-268, 317-322, 379-384, 609, 626, 758-763, 770-776, 781, 788-793, 805-810, 825-830, 842-847, 870-875, 880-887, 891-898, 936-941, 1086-1087, 1106-1107, 1161-1167
packages/markform/src/engine/table/parseTable.ts	87.01%	82.07%	93.54%	86.31%	168-212, 251, 266, 313, 496

Generated in workflow #1002 for commit 1c9fdc1 by the Vitest Coverage Report Action

- Add isRowFullyEmpty() helper to detect rows where all cells are skipped/empty/aborted
- Filter empty rows during parse (parseMarkdownTable, parseInlineTable) and patch apply (set_table, append_table)
- Add mostly-empty row warning when >50% of cells in a non-empty row are empty
- Move minRows check before isEmpty early return so minRows>0 fails on empty tables
- Update markform-spec.md and markform-reference.md with empty row handling and sparseness warning docs
- Full TDD coverage: 19 new tests across parseTable, apply, and validate test suites

Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw
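The ordering of the minRows check is what lets `minRows > 0` fail on an empty table. The sketch below shows this with `checkRowCount`, a hypothetical validator that takes a count of substantive (non-empty) rows:

```typescript
interface RowLimits {
  minRows?: number;
  maxRows?: number;
}

// Hypothetical validator over the number of substantive (non-empty) rows.
function checkRowCount(substantiveRows: number, limits: RowLimits): string[] {
  const issues: string[] = [];
  // minRows runs BEFORE the empty-table early return, so minRows > 0 fails on an empty table.
  if (limits.minRows !== undefined && substantiveRows < limits.minRows) {
    issues.push(`expected at least ${limits.minRows} rows, found ${substantiveRows}`);
  }
  if (substantiveRows === 0) return issues; // nothing further to check on an empty table
  if (limits.maxRows !== undefined && substantiveRows > limits.maxRows) {
    issues.push(`expected at most ${limits.maxRows} rows, found ${substantiveRows}`);
  }
  return issues;
}
```

With the two checks swapped, an empty table would return before `minRows` is consulted and pass silently.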

Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

- Restructure isRowFullyEmpty() to separate non-answered vs answered-but-empty checks for clarity
- Consolidate redundant isRowFullyEmpty unit tests into table-driven format (7 → 6 tests, same coverage)
- Remove redundant 'does not warn when 3 of 4 cells filled' test (boundary already tested at 2 of 4)
- Remove comments that restate obvious test behavior

Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

Co-Authored-By: Cursor <noreply@cursor.com> Co-authored-by: Cursor <cursoragent@cursor.com>

Review fixes for PR #161:
- isRowFullyEmpty no longer treats aborted cells as empty (aborted carries intentional signal from the agent)
- Add round-trip serialization test for empty row dropping
- Add progress computation tests for tables with empty rows
- Simplify mostly-empty row warning message (remove prescriptive advice)
- Fix spec numbering gap (A, B, D -> A, B, C)
- Update spec to reflect aborted-row fix and status

Co-Authored-By: Cursor <noreply@cursor.com> Co-authored-by: Cursor <cursoragent@cursor.com>
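The corrected helper can be sketched with the non-answered and answered-but-empty checks separated. The key point is that an aborted cell keeps its row alive. The `CellValue` shape is modeled on the reviewed snippet and is an assumption, as is `dropEmptyRows`:

```typescript
// Hypothetical cell shape, modeled on the reviewed snippet.
interface CellValue {
  state: "answered" | "skipped" | "aborted";
  value?: string | number | null;
}
type Row = ReadonlyArray<CellValue | undefined>;

function isCellEmpty(cell: CellValue | undefined): boolean {
  // Non-answered: missing or skipped cells are empty, but an aborted cell is an
  // explicit agent signal (%ABORT% plus any reason) and must not be discarded.
  if (!cell || cell.state === "skipped") return true;
  if (cell.state === "aborted") return false;
  // Answered-but-empty: no usable value was provided.
  return cell.value === undefined || cell.value === null || cell.value === "";
}

function isRowFullyEmpty(row: Row): boolean {
  return row.every(isCellEmpty);
}

// Drop fully-empty rows (as during parse and set_table/append_table); aborted rows survive.
function dropEmptyRows(rows: Row[]): Row[] {
  return rows.filter((row) => !isRowFullyEmpty(row));
}
```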

claude added 4 commits February 20, 2026 10:02

chatgpt-codex-connector bot reviewed Feb 20, 2026


claude added 4 commits February 20, 2026 17:30

chore: save tbd outbox with closed beads

9208fe0

Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

chore: save tbd outbox with review fixes bead

6ace4b1

Co-Authored-By: Claude <noreply@anthropic.com> https://claude.ai/code/session_01QysuApEBfeCULNi5aRTzDw

jlevy changed the title from "docs: add plan spec for table row validation" to "feat: drop empty table rows, warn on sparse rows, strengthen minRows/maxRows" Feb 21, 2026

jlevy and others added 2 commits February 21, 2026 15:02

Merge branch 'main' into claude/validate-form-rows-3SgLa

dfc86e6

Co-Authored-By: Cursor <noreply@cursor.com> Co-authored-by: Cursor <cursoragent@cursor.com>

jlevy merged commit 1387135 into main Feb 22, 2026
1 check passed

jlevy deleted the claude/validate-form-rows-3SgLa branch February 22, 2026 00:06

jlevy changed the title from "feat: drop empty table rows, warn on sparse rows, strengthen minRows/maxRows" to "feat: FillRecord source-of-truth enrichment with provenance and per-turn detail" Feb 22, 2026

This was referenced Feb 22, 2026:

- isValueEmpty treats tables with template rows as non-empty, preventing LLM from filling them #160 (Closed)
- fix: table row validation improvements and defensive coding #163 (Merged)
- chore: release markform v0.1.28 #164 (Merged)

jlevy merged 10 commits into main from claude/validate-form-rows-3SgLa


2 participants
