Skip to content

DOJ-4839: harden workbook output — knob split-bar, lede/quote/code-highlight, output validator#30

Merged
dbejarano820 merged 2 commits into
mainfrom
daniel/doj-workbook-output-hardening
Jun 2, 2026
Merged

DOJ-4839: harden workbook output — knob split-bar, lede/quote/code-highlight, output validator#30
dbejarano820 merged 2 commits into
mainfrom
daniel/doj-workbook-output-hardening

Conversation

@dbejarano820

Copy link
Copy Markdown
Contributor

Summary

Output-quality hardening surfaced by reviewing a real generated workbook (intro-to-quantum). Three cohesive items; no invariant change (the validator enforces the existing rules, it doesn't alter them), no architecture change.

Changes

  • Knob is-split variant (workbook-base.html) — a generated knob captioned "probability of measuring 1 (coral) vs 0 (lilac)" out-promised the single-color --knob preview. Added a full-width two-color A-vs-B bar (value portion = --wb-accent, remainder = --wb-accent-2) driven by the same --knob var (no JS change) + a 2-item legend. Standard knob unchanged.
  • Promote bespoke patterns into the kitwb-lede (lead paragraph) and wb-quote (pull-quote + cite), plus a dependency-free code-highlight convention (kw/str/num/fn/cmt token spans, fixed legible hues on the code surface). These were being re-invented per generation; now tested + themed in the base.
  • scripts/validate_workbook.py (NEW, stdlib only) — an author tool that validates a generated workbook against its invariants: standalone (no external <script src>/CDN), in-memory-only (no localStorage/sessionStorage/indexedDB/postMessage), globally-unique element ids (errors), + a11y-landmark / system-ui-fallback (warnings). Wired into the command as Phase 6.
  • Skill + command updated: new components in the catalog + mapping; the validate step documented.

Verification

  • Ran the validator on the real intro-to-quantum workbook → 0 errors, 0 warnings.
  • New components render; the split bar's gradient stop tracks --knob (verified in preview); standard knob stays single-color; zero console errors.
  • CI lint green (frontmatter / agent-references / json-schemas).
  • No :root/invariant change; brand stays token-driven.

Out of scope

Density / companion-vs-duplicate guidance and the empirical variety test (separate follow-ups).

Closes DOJ-4839

Created by Claude Code on behalf of @dbejarano820

🤖 Generated with Claude Code

…ghlight, validator

Improvements surfaced reviewing the real intro-to-quantum workbook.

- Template: knob `is-split` variant (two-color A-vs-B bar driven by the existing
  --knob var + a 2-item legend; standard knob unchanged); promote `wb-lede` and
  `wb-quote` into the kit; dependency-free code-highlight token classes
  (kw/str/num/fn/cmt) with fixed legible hues on the code surface.
- Skill: add lede / statement / quote / split-knob to the catalog; document the
  highlight convention.
- Command: map the new components; add Phase 6 — run the output validator.
- scripts/validate_workbook.py (NEW, stdlib only): checks a generated workbook
  is standalone (no external script/CDN), in-memory-only (no localStorage/
  postMessage/etc.), has globally-unique ids (critical post fan-out), and warns
  on missing a11y landmarks / system-ui fallback.

Validated the real intro-to-quantum workbook: 0 errors, 0 warnings. New template
components render + the split bar tracks --knob; zero console errors. CI lint
green. No invariant change.

Created by Claude Code on behalf of @daniel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dbejarano820

Copy link
Copy Markdown
Contributor Author

@dojo-code-reviewer review

@dojo-code-reviewer dojo-code-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approved

Approved — 0 blockers, 5 P2, 1 P3. Confidence: 2/5.

🟡 P2 — Major

  • assets/templates/workbook/workbook-base.html:175 — 🟡 P2 (major) — The pull-quote citation text color uses var(--wb-muted), leading to an inaccessible 2.42:1 contrast ratio against the light page backgrounds. Switching to var(--wb-sub) meets the 4.5:1 WCAG AA threshold.
.wb-quote cite { display: block; margin-top: var(--wb-sp-2); font-style: normal; font-size: .9rem; color: var(--wb-sub); }
  • assets/templates/workbook/workbook-base.html:292 — 🟡 P2 (major) — Using var(--wb-muted) (#8a8a93) on a white page background results in a low 2.42:1 contrast ratio, failing WCAG AA (minimum 4.5:1) for outcome labels. We should switch the legend text color to var(--wb-sub) (#57575f, 5.4:1 contrast ratio) to ensure high readability.
.wb-knob__legend { display: flex; gap: var(--wb-sp-4); margin-top: var(--wb-sp-2); font-size: .8rem; color: var(--wb-sub); }
  • scripts/validate_workbook.py:28 — 🟡 P2 (major) — Scanning the entire document for CDN URLs flags safe outbound hyperlinks (e.g., anchors or references in text) rather than actual runtime dependencies. We should target only links and resource imports from tags like , , <iframe>, or <script> to prevent false validation errors on educational resources.
CDN_RE = re.compile(r"<(?:link|script|img|iframe)\b[^>]*(?:href|src)\s*=\s*[\"'](https?://[^\1'\s]*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.))", re.I)
  • scripts/validate_workbook.py:54 — 🟡 P2 (major) — Checking state-persistence APIs on the raw HTML content will raise errors and fail the build if terms like 'localStorage' are mentioned in instructional paragraphs, code syntax blocks, or annotations. We should parse out and validate only the JavaScript contents within <script> tags to allow safe conceptual discussions.
    scripts_content = "\n".join(m.group(1) for m in re.finditer(r"<script\b[^>]*>(.*?)</script>", html, re.DOTALL | re.IGNORECASE))
    state_hits = sorted({m.group(1) for m in STATE_RE.finditer(scripts_content)})
  • scripts/validate_workbook.py:63 — 🟡 P2 (major) — Detecting duplicate IDs globally across the raw HTML will trigger false errors if instructional HTML code blocks contain sample code with placeholder IDs. We should strip
     and  elements prior to identifying document-level IDs.
    html_no_code = re.sub(r"<(pre|code)\b[^>]*>.*?</\1>", "", html, flags=re.DOTALL | re.IGNORECASE)
    for m in ID_RE.finditer(html_no_code):

🔵 P3 — Minor

  • scripts/validate_workbook.py:32 — 🔵 P3 (minor) — Strict double-quote matches in accessibility landmark regexes will cause false warnings on perfectly valid HTML using single-quoted or unquoted role attributes. We should allow alternative quotes in role matches.
ROLE_REGION_RE = re.compile(r'role\s*=\s*["\']?region["\']?', re.I)
ROLE_PROGRESS_RE = re.compile(r'role\s*=\s*["\']?progressbar["\']?', re.I)

Total findings: 5 compliance, 1 business context (6 total)


ℹ️ Inline anchoring failed for the comments below; they are listed here instead.

  • assets/templates/workbook/workbook-base.html:175 — 🟡 P2 (major) — The pull-quote citation text color uses var(--wb-muted), leading to an inaccessible 2.42:1 contrast ratio against the light page backgrounds. Switching to var(--wb-sub) meets the 4.5:1 WCAG AA threshold.
.wb-quote cite { display: block; margin-top: var(--wb-sp-2); font-style: normal; font-size: .9rem; color: var(--wb-sub); }
  • assets/templates/workbook/workbook-base.html:292 — 🟡 P2 (major) — Using var(--wb-muted) (#8a8a93) on a white page background results in a low 2.42:1 contrast ratio, failing WCAG AA (minimum 4.5:1) for outcome labels. We should switch the legend text color to var(--wb-sub) (#57575f, 5.4:1 contrast ratio) to ensure high readability.
.wb-knob__legend { display: flex; gap: var(--wb-sp-4); margin-top: var(--wb-sp-2); font-size: .8rem; color: var(--wb-sub); }
  • scripts/validate_workbook.py:28 — 🟡 P2 (major) — Scanning the entire document for CDN URLs flags safe outbound hyperlinks (e.g., anchors or references in text) rather than actual runtime dependencies. We should target only links and resource imports from tags like , , <iframe>, or <script> to prevent false validation errors on educational resources.
CDN_RE = re.compile(r"<(?:link|script|img|iframe)\b[^>]*(?:href|src)\s*=\s*[\"'](https?://[^\1'\s]*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.))", re.I)
  • scripts/validate_workbook.py:32 — 🔵 P3 (minor) — Strict double-quote matches in accessibility landmark regexes will cause false warnings on perfectly valid HTML using single-quoted or unquoted role attributes. We should allow alternative quotes in role matches.
ROLE_REGION_RE = re.compile(r'role\s*=\s*["\']?region["\']?', re.I)
ROLE_PROGRESS_RE = re.compile(r'role\s*=\s*["\']?progressbar["\']?', re.I)
  • scripts/validate_workbook.py:54 — 🟡 P2 (major) — Checking state-persistence APIs on the raw HTML content will raise errors and fail the build if terms like 'localStorage' are mentioned in instructional paragraphs, code syntax blocks, or annotations. We should parse out and validate only the JavaScript contents within <script> tags to allow safe conceptual discussions.
    scripts_content = "\n".join(m.group(1) for m in re.finditer(r"<script\b[^>]*>(.*?)</script>", html, re.DOTALL | re.IGNORECASE))
    state_hits = sorted({m.group(1) for m in STATE_RE.finditer(scripts_content)})
  • scripts/validate_workbook.py:63 — 🟡 P2 (major) — Detecting duplicate IDs globally across the raw HTML will trigger false errors if instructional HTML code blocks contain sample code with placeholder IDs. We should strip
     and  elements prior to identifying document-level IDs.
    html_no_code = re.sub(r"<(pre|code)\b[^>]*>.*?</\1>", "", html, flags=re.DOTALL | re.IGNORECASE)
    for m in ID_RE.finditer(html_no_code):

…ardening

- quote `cite` + knob legend: var(--wb-muted) -> var(--wb-sub) (WCAG AA, P2 x2).
- validator (P2 x3 + P3): CDN check scoped to resource-tag href/src (not <a>
  links); state-API check scans <script> contents only (not prose/code that
  merely mentions localStorage); id-uniqueness excludes <pre>/<code> sample
  code; role regexes accept single/unquoted attributes.

Verified: quantum workbook still 0/0; synthetic false-positive file (localStorage
in prose, <a> cdn link, sample-code id) -> 0 errors; true-positive file (cdn
<script>, localStorage in script, dup id) -> 4 errors. CI lint green.

Created by Claude Code on behalf of @daniel

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@dbejarano820 dbejarano820 merged commit 1d51b8b into main Jun 2, 2026
1 check passed
@dbejarano820 dbejarano820 deleted the daniel/doj-workbook-output-hardening branch June 2, 2026 22:03

@dojo-code-reviewer dojo-code-reviewer Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Approved

Approved — 0 blockers, 3 P3. Confidence: 4/5.

🔵 P3 — Minor

  • scripts/validate_workbook.py:29 — 🔵 P3 (minor) — Protocol-relative URLs (e.g., '//unpkg.com/...') are not captured by the current regex because it expects 'https?://'. This can cause some external CDN references to go undetected. Updating the pattern to support optional protocols makes the validation more robust.
    r"""<(?:link|script|img|iframe)\b[^>]*?\b(?:href|src)\s*=\s*["']((?:https?:)?//[^"']*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.)[^"']*)""",
  • scripts/validate_workbook.py:37 — 🔵 P3 (minor) — Single-quoted element IDs (e.g., id='my-id') will bypass this duplicate-check regex since it is strictly anchored to double-quotes. Supporting both single and double quotes ensures any valid HTML markup pattern is accurately checked.
ID_RE = re.compile(r'\sid\s*=\s*["\']([^"\']+)["\']', re.I)
  • scripts/validate_workbook.py:72 — 🔵 P3 (minor) — Script blocks containing variable assignments, string constants, or template literals with the string 'id="..."' can trigger false positives during duplicate-ID scanning because script content is not stripped. Removing script blocks before scanning prevents JS logic from being mistakenly classified as HTML element IDs.
    html_no_code = SCRIPT_BLOCK_RE.sub("", CODE_BLOCK_RE.sub("", html))

Total findings: 3 business context (3 total)


ℹ️ Inline anchoring failed for the comments below; they are listed here instead.

  • scripts/validate_workbook.py:29 — 🔵 P3 (minor) — Protocol-relative URLs (e.g., '//unpkg.com/...') are not captured by the current regex because it expects 'https?://'. This can cause some external CDN references to go undetected. Updating the pattern to support optional protocols makes the validation more robust.
    r"""<(?:link|script|img|iframe)\b[^>]*?\b(?:href|src)\s*=\s*["']((?:https?:)?//[^"']*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.)[^"']*)""",
  • scripts/validate_workbook.py:37 — 🔵 P3 (minor) — Single-quoted element IDs (e.g., id='my-id') will bypass this duplicate-check regex since it is strictly anchored to double-quotes. Supporting both single and double quotes ensures any valid HTML markup pattern is accurately checked.
ID_RE = re.compile(r'\sid\s*=\s*["\']([^"\']+)["\']', re.I)
  • scripts/validate_workbook.py:72 — 🔵 P3 (minor) — Script blocks containing variable assignments, string constants, or template literals with the string 'id="..."' can trigger false positives during duplicate-ID scanning because script content is not stripped. Removing script blocks before scanning prevents JS logic from being mistakenly classified as HTML element IDs.
    html_no_code = SCRIPT_BLOCK_RE.sub("", CODE_BLOCK_RE.sub("", html))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant