DOJ-4839: harden workbook output — knob split-bar, lede/quote/code-highlight, output validator#30
Merged
Conversation
…ghlight, validator Improvements surfaced reviewing the real intro-to-quantum workbook. - Template: knob `is-split` variant (two-color A-vs-B bar driven by the existing --knob var + a 2-item legend; standard knob unchanged); promote `wb-lede` and `wb-quote` into the kit; dependency-free code-highlight token classes (kw/str/num/fn/cmt) with fixed legible hues on the code surface. - Skill: add lede / statement / quote / split-knob to the catalog; document the highlight convention. - Command: map the new components; add Phase 6 — run the output validator. - scripts/validate_workbook.py (NEW, stdlib only): checks a generated workbook is standalone (no external script/CDN), in-memory-only (no localStorage/ postMessage/etc.), has globally-unique ids (critical post fan-out), and warns on missing a11y landmarks / system-ui fallback. Validated the real intro-to-quantum workbook: 0 errors, 0 warnings. New template components render + the split bar tracks --knob; zero console errors. CI lint green. No invariant change. Created by Claude Code on behalf of @daniel Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Author
|
@dojo-code-reviewer review |
There was a problem hiding this comment.
✅ Approved
Approved — 0 blockers, 5 P2, 1 P3. Confidence: 2/5.
🟡 P2 — Major
assets/templates/workbook/workbook-base.html:175— 🟡 P2 (major) — The pull-quote citation text color uses var(--wb-muted), leading to an inaccessible 2.42:1 contrast ratio against the light page backgrounds. Switching to var(--wb-sub) meets the 4.5:1 WCAG AA threshold.
.wb-quote cite { display: block; margin-top: var(--wb-sp-2); font-style: normal; font-size: .9rem; color: var(--wb-sub); }
assets/templates/workbook/workbook-base.html:292— 🟡 P2 (major) — Using var(--wb-muted) (#8a8a93) on a white page background results in a low 2.42:1 contrast ratio, failing WCAG AA (minimum 4.5:1) for outcome labels. We should switch the legend text color to var(--wb-sub) (#57575f, 5.4:1 contrast ratio) to ensure high readability.
.wb-knob__legend { display: flex; gap: var(--wb-sp-4); margin-top: var(--wb-sp-2); font-size: .8rem; color: var(--wb-sub); }
scripts/validate_workbook.py:28— 🟡 P2 (major) — Scanning the entire document for CDN URLs flags safe outbound hyperlinks (e.g., anchors or references in text) rather than actual runtime dependencies. We should target only links and resource imports from tags like ,, <iframe>, or <script> to prevent false validation errors on educational resources.
CDN_RE = re.compile(r"<(?:link|script|img|iframe)\b[^>]*(?:href|src)\s*=\s*[\"'](https?://[^\1'\s]*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.))", re.I)
scripts/validate_workbook.py:54— 🟡 P2 (major) — Checking state-persistence APIs on the raw HTML content will raise errors and fail the build if terms like 'localStorage' are mentioned in instructional paragraphs, code syntax blocks, or annotations. We should parse out and validate only the JavaScript contents within <script> tags to allow safe conceptual discussions.
scripts_content = "\n".join(m.group(1) for m in re.finditer(r"<script\b[^>]*>(.*?)</script>", html, re.DOTALL | re.IGNORECASE))
state_hits = sorted({m.group(1) for m in STATE_RE.finditer(scripts_content)})
scripts/validate_workbook.py:63— 🟡 P2 (major) — Detecting duplicate IDs globally across the raw HTML will trigger false errors if instructional HTML code blocks contain sample code with placeholder IDs. We should stripand
elements prior to identifying document-level IDs.
html_no_code = re.sub(r"<(pre|code)\b[^>]*>.*?</\1>", "", html, flags=re.DOTALL | re.IGNORECASE)
for m in ID_RE.finditer(html_no_code):
🔵 P3 — Minor
scripts/validate_workbook.py:32 — 🔵 P3 (minor) — Strict double-quote matches in accessibility landmark regexes will cause false warnings on perfectly valid HTML using single-quoted or unquoted role attributes. We should allow alternative quotes in role matches.
ROLE_REGION_RE = re.compile(r'role\s*=\s*["\']?region["\']?', re.I)
ROLE_PROGRESS_RE = re.compile(r'role\s*=\s*["\']?progressbar["\']?', re.I)
Total findings: 5 compliance, 1 business context (6 total)
ℹ️ Inline anchoring failed for the comments below; they are listed here instead.
assets/templates/workbook/workbook-base.html:175 — 🟡 P2 (major) — The pull-quote citation text color uses var(--wb-muted), leading to an inaccessible 2.42:1 contrast ratio against the light page backgrounds. Switching to var(--wb-sub) meets the 4.5:1 WCAG AA threshold.
.wb-quote cite { display: block; margin-top: var(--wb-sp-2); font-style: normal; font-size: .9rem; color: var(--wb-sub); }
assets/templates/workbook/workbook-base.html:292 — 🟡 P2 (major) — Using var(--wb-muted) (#8a8a93) on a white page background results in a low 2.42:1 contrast ratio, failing WCAG AA (minimum 4.5:1) for outcome labels. We should switch the legend text color to var(--wb-sub) (#57575f, 5.4:1 contrast ratio) to ensure high readability.
.wb-knob__legend { display: flex; gap: var(--wb-sp-4); margin-top: var(--wb-sp-2); font-size: .8rem; color: var(--wb-sub); }
scripts/validate_workbook.py:28 — 🟡 P2 (major) — Scanning the entire document for CDN URLs flags safe outbound hyperlinks (e.g., anchors or references in text) rather than actual runtime dependencies. We should target only links and resource imports from tags like ,
, <iframe>, or <script> to prevent false validation errors on educational resources.
CDN_RE = re.compile(r"<(?:link|script|img|iframe)\b[^>]*(?:href|src)\s*=\s*[\"'](https?://[^\1'\s]*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.))", re.I)
scripts/validate_workbook.py:32 — 🔵 P3 (minor) — Strict double-quote matches in accessibility landmark regexes will cause false warnings on perfectly valid HTML using single-quoted or unquoted role attributes. We should allow alternative quotes in role matches.
ROLE_REGION_RE = re.compile(r'role\s*=\s*["\']?region["\']?', re.I)
ROLE_PROGRESS_RE = re.compile(r'role\s*=\s*["\']?progressbar["\']?', re.I)
scripts/validate_workbook.py:54 — 🟡 P2 (major) — Checking state-persistence APIs on the raw HTML content will raise errors and fail the build if terms like 'localStorage' are mentioned in instructional paragraphs, code syntax blocks, or annotations. We should parse out and validate only the JavaScript contents within <script> tags to allow safe conceptual discussions.
scripts_content = "\n".join(m.group(1) for m in re.finditer(r"<script\b[^>]*>(.*?)</script>", html, re.DOTALL | re.IGNORECASE))
state_hits = sorted({m.group(1) for m in STATE_RE.finditer(scripts_content)})
scripts/validate_workbook.py:63 — 🟡 P2 (major) — Detecting duplicate IDs globally across the raw HTML will trigger false errors if instructional HTML code blocks contain sample code with placeholder IDs. We should strip and elements prior to identifying document-level IDs.
html_no_code = re.sub(r"<(pre|code)\b[^>]*>.*?</\1>", "", html, flags=re.DOTALL | re.IGNORECASE)
for m in ID_RE.finditer(html_no_code):
…ardening - quote `cite` + knob legend: var(--wb-muted) -> var(--wb-sub) (WCAG AA, P2 x2). - validator (P2 x3 + P3): CDN check scoped to resource-tag href/src (not <a> links); state-API check scans <script> contents only (not prose/code that merely mentions localStorage); id-uniqueness excludes <pre>/<code> sample code; role regexes accept single/unquoted attributes. Verified: quantum workbook still 0/0; synthetic false-positive file (localStorage in prose, <a> cdn link, sample-code id) -> 0 errors; true-positive file (cdn <script>, localStorage in script, dup id) -> 4 errors. CI lint green. Created by Claude Code on behalf of @daniel Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
✅ Approved
Approved — 0 blockers, 3 P3. Confidence: 4/5.
🔵 P3 — Minor
scripts/validate_workbook.py:29— 🔵 P3 (minor) — Protocol-relative URLs (e.g., '//unpkg.com/...') are not captured by the current regex because it expects 'https?://'. This can cause some external CDN references to go undetected. Updating the pattern to support optional protocols makes the validation more robust.
r"""<(?:link|script|img|iframe)\b[^>]*?\b(?:href|src)\s*=\s*["']((?:https?:)?//[^"']*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.)[^"']*)""",
scripts/validate_workbook.py:37— 🔵 P3 (minor) — Single-quoted element IDs (e.g., id='my-id') will bypass this duplicate-check regex since it is strictly anchored to double-quotes. Supporting both single and double quotes ensures any valid HTML markup pattern is accurately checked.
ID_RE = re.compile(r'\sid\s*=\s*["\']([^"\']+)["\']', re.I)
scripts/validate_workbook.py:72— 🔵 P3 (minor) — Script blocks containing variable assignments, string constants, or template literals with the string 'id="..."' can trigger false positives during duplicate-ID scanning because script content is not stripped. Removing script blocks before scanning prevents JS logic from being mistakenly classified as HTML element IDs.
html_no_code = SCRIPT_BLOCK_RE.sub("", CODE_BLOCK_RE.sub("", html))
Total findings: 3 business context (3 total)
ℹ️ Inline anchoring failed for the comments below; they are listed here instead.
scripts/validate_workbook.py:29— 🔵 P3 (minor) — Protocol-relative URLs (e.g., '//unpkg.com/...') are not captured by the current regex because it expects 'https?://'. This can cause some external CDN references to go undetected. Updating the pattern to support optional protocols makes the validation more robust.
r"""<(?:link|script|img|iframe)\b[^>]*?\b(?:href|src)\s*=\s*["']((?:https?:)?//[^"']*(?:cdn\.|unpkg\.com|jsdelivr\.net|cdnjs\.)[^"']*)""",
scripts/validate_workbook.py:37— 🔵 P3 (minor) — Single-quoted element IDs (e.g., id='my-id') will bypass this duplicate-check regex since it is strictly anchored to double-quotes. Supporting both single and double quotes ensures any valid HTML markup pattern is accurately checked.
ID_RE = re.compile(r'\sid\s*=\s*["\']([^"\']+)["\']', re.I)
scripts/validate_workbook.py:72— 🔵 P3 (minor) — Script blocks containing variable assignments, string constants, or template literals with the string 'id="..."' can trigger false positives during duplicate-ID scanning because script content is not stripped. Removing script blocks before scanning prevents JS logic from being mistakenly classified as HTML element IDs.
html_no_code = SCRIPT_BLOCK_RE.sub("", CODE_BLOCK_RE.sub("", html))
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Output-quality hardening surfaced by reviewing a real generated workbook (
intro-to-quantum). Three cohesive items; no invariant change (the validator enforces the existing rules, it doesn't alter them), no architecture change.Changes
is-splitvariant (workbook-base.html) — a generated knob captioned "probability of measuring 1 (coral) vs 0 (lilac)" out-promised the single-color--knobpreview. Added a full-width two-color A-vs-B bar (value portion =--wb-accent, remainder =--wb-accent-2) driven by the same--knobvar (no JS change) + a 2-item legend. Standard knob unchanged.wb-lede(lead paragraph) andwb-quote(pull-quote +cite), plus a dependency-free code-highlight convention (kw/str/num/fn/cmttoken spans, fixed legible hues on the code surface). These were being re-invented per generation; now tested + themed in the base.scripts/validate_workbook.py(NEW, stdlib only) — an author tool that validates a generated workbook against its invariants: standalone (no external<script src>/CDN), in-memory-only (nolocalStorage/sessionStorage/indexedDB/postMessage), globally-unique element ids (errors), + a11y-landmark /system-ui-fallback (warnings). Wired into the command as Phase 6.Verification
intro-to-quantumworkbook → 0 errors, 0 warnings.--knob(verified in preview); standard knob stays single-color; zero console errors.:root/invariant change; brand stays token-driven.Out of scope
Density / companion-vs-duplicate guidance and the empirical variety test (separate follow-ups).
Closes DOJ-4839
Created by Claude Code on behalf of @dbejarano820
🤖 Generated with Claude Code