
feat(web-ui): compress and summarize verbose execution event stream (#475)#497

Merged
frankbria merged 2 commits into main from feat/issue-475-compress-event-stream
Mar 24, 2026

Conversation

@frankbria
Owner

@frankbria frankbria commented Mar 24, 2026

Summary

  • groupEvents() transforms raw event arrays into grouped EventGroup[] for the smart view
  • Consecutive file-read events → collapsible ReadGroupRow ("Read N files: x, y, z") with expand affordance
  • Consecutive file edit/create/delete events → EditGroupRow single summary line ("Modified N files: …")
  • Smart view (default) renders grouped events; "Show all events" toggle reveals every raw event unmodified
  • Stream header added with view toggle button
  • All other event types (planning, verification, blockers, completion) pass through unchanged
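
The grouping described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `SketchEvent`, `SketchGroup`, `kindOf`, and `groupEventsSketch` are hypothetical names, the event shape is simplified, and the message prefixes are taken from the regexes quoted in the review comments further down this thread.

```typescript
// Hypothetical, simplified event shape; the real ExecutionEvent union is richer.
interface SketchEvent {
  event_type: string;
  message?: string;
  timestamp: string;
}

// Hypothetical group shape mirroring the 'event' | 'read_group' | 'edit_group' union.
type SketchGroup =
  | { type: 'event'; event: SketchEvent }
  | { type: 'read_group'; events: SketchEvent[] }
  | { type: 'edit_group'; events: SketchEvent[] };

// Prefixes as quoted in the review; assumed to be how reads and edits are detected.
const READ_RE = /^reading file:/i;
const EDIT_RE = /^(creating|editing|deleting) file:/i;

function kindOf(e: SketchEvent): 'read' | 'edit' | 'other' {
  if (e.event_type !== 'progress') return 'other';
  const msg = e.message ?? '';
  if (READ_RE.test(msg)) return 'read';
  if (EDIT_RE.test(msg)) return 'edit';
  return 'other';
}

// Fold consecutive reads (or edits) into one group; everything else passes through.
function groupEventsSketch(events: SketchEvent[]): SketchGroup[] {
  const groups: SketchGroup[] = [];
  for (const e of events) {
    const kind = kindOf(e);
    const last = groups[groups.length - 1];
    if (kind === 'read' && last !== undefined && last.type === 'read_group') {
      last.events.push(e);
    } else if (kind === 'edit' && last !== undefined && last.type === 'edit_group') {
      last.events.push(e);
    } else if (kind === 'read') {
      groups.push({ type: 'read_group', events: [e] });
    } else if (kind === 'edit') {
      groups.push({ type: 'edit_group', events: [e] });
    } else {
      groups.push({ type: 'event', event: e });
    }
  }
  return groups;
}
```

Because the function is pure (array in, array out), the raw view can keep rendering the untouched input while the smart view renders the grouped result.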

Closes #475

Test plan

  • Smart view shows "Read N files" collapsed rows for consecutive reads
  • Clicking expand row reveals individual read events
  • Consecutive file edits show single "Modified N files" row
  • "Show all events" toggle shows every event individually
  • "Smart view" toggle returns to grouped view
  • Non-read/non-edit events render unchanged
  • Build passes clean

Summary by CodeRabbit

  • New Features
    • Implemented intelligent grouping of related read and edit operations in the event stream for improved clarity.
    • Added a header with a view toggle to switch between grouped summary view (default) and a detailed view of all individual events.
    • Collapsible read operation summaries provide quick overviews while maintaining access to full details.
    • Added a compact badge-style summary for edit groups to improve scannability.

…475)

- groupEvents() function transforms raw events into EventGroup[] for smart view
- Consecutive file-read events → collapsible ReadGroupRow ("Read N files")
- Consecutive file edit/create/delete events → EditGroupRow summary line
- Smart view is default; "Show all events" toggle switches to raw log
- ReadGroupRow expands to show individual events on click
- Added stream header with event count context and view toggle
@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2026

Walkthrough

Adds event grouping and UI controls to the execution event stream: filters heartbeat events, groups consecutive progress events into read and edit summaries, introduces collapsible read groups and summarized edit rows, a smart/raw view toggle, an icon import, and a small scroll-height adjustment.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Event Stream Smart Grouping: web-ui/src/components/execution/EventStream.tsx | Added useMemo-based filtering and groupEvents pipeline, introduced EventGroup types and helpers (extractFilename, isReadEvent, isEditEvent), added ReadGroupRow (collapsible) and EditGroupRow (summary) subcomponents, integrated ArrowRight01Icon, added showAll header toggle to switch smart vs raw views, and adjusted scroll container height. |
| Event Styles: web-ui/src/lib/eventStyles.ts | Exported new editGroupBadgeStyles Tailwind class string for styling the edit-group summary row; other style exports unchanged. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble logs and weave them tight,
Reads tucked in folds, edits shining bright,
A little arrow nudges doors ajar,
Smart and raw — both near, both far,
Hooray for streams that hop and light the night!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | The PR implements 3 of 5 core acceptance criteria from #475: collapsed expandable read events, grouped edit events, and 'Show all events' toggle. Phase checkpoints and progress indicator are not implemented. | Implement the remaining acceptance criteria: phase progress checkpoints at natural milestones and overall progress indicators to fully address issue #475. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: compressing and summarizing the verbose event stream with grouped events and a smart view toggle. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to compressing and grouping the event stream. The new grouping utilities, UI components, and styling additions align with PR objectives and requirements from issue #475. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
web-ui/src/components/execution/EventStream.tsx (4)

198-203: Use Shadcn/UI Button component for consistency.

The toggle button uses a plain <button> element while the "New events" button (line 258) uses the Shadcn/UI Button component. For consistency and to leverage Shadcn/UI's built-in accessibility and styling, consider using Button with appropriate variant.

♻️ Suggested change
-        <button
-          className="text-xs text-muted-foreground hover:text-foreground"
+        <Button
+          variant="ghost"
+          size="sm"
+          className="h-auto px-2 py-1 text-xs text-muted-foreground hover:text-foreground"
           onClick={() => setShowAll((v) => !v)}
         >
           {showAll ? 'Smart view' : 'Show all events'}
-        </button>
+        </Button>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 198 - 203, The
toggle uses a raw <button> instead of the Shadcn/UI Button component—replace the
plain element in EventStream.tsx with the Button component (imported from your
shadcn/ui library) and pass the same onClick handler (setShowAll) and display
text (showAll ? 'Smart view' : 'Show all events'), applying an appropriate
variant/size (e.g., variant="ghost" or "link"/size="sm") and the equivalent
className props to preserve styling and accessibility; ensure the import for
Button is added/updated and that the component still toggles setShowAll((v) =>
!v) and uses the existing showAll symbol for the label.

27-35: Type assertions bypass type safety.

The repeated (e as { message?: string }).message pattern sidesteps TypeScript's type checking. Since ExecutionEvent is a union type and you've already narrowed to event_type === 'progress', consider defining a type guard or using proper type narrowing to access message safely.

♻️ Suggested approach using type guard
+function isProgressEventWithMessage(e: ExecutionEvent): e is ExecutionEvent & { message: string } {
+  return e.event_type === 'progress' && typeof (e as { message?: string }).message === 'string';
+}
+
 function isReadEvent(e: ExecutionEvent): boolean {
-  if (e.event_type !== 'progress') return false;
-  return /^reading file:/i.test((e as { message?: string }).message ?? '');
+  return isProgressEventWithMessage(e) && /^reading file:/i.test(e.message);
 }
 
 function isEditEvent(e: ExecutionEvent): boolean {
-  if (e.event_type !== 'progress') return false;
-  return /^(creating|editing|deleting) file:/i.test((e as { message?: string }).message ?? '');
+  return isProgressEventWithMessage(e) && /^(creating|editing|deleting) file:/i.test(e.message);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 27 - 35, The
two helpers isReadEvent and isEditEvent use unsafe type assertions for message;
define a proper type guard (e.g., isProgressEvent(e): e is ProgressEvent or a
narrowed type that includes message?: string) that checks e.event_type ===
'progress' and that typeof e.message === 'string' (or message existence), then
replace the inline assertions in isReadEvent and isEditEvent to call that type
guard so you can safely access e.message without bypassing TypeScript's checks.

206-213: Consider flex layout to avoid hardcoded height.

The hardcoded 37px in h-[calc(100%-37px)] is fragile and will break if the header's padding or font size changes. A flex-based approach would be more resilient.

♻️ Alternative using flex layout
-    <div className="relative flex-1 overflow-hidden rounded-lg border">
+    <div className="relative flex flex-1 flex-col overflow-hidden rounded-lg border">
       {/* Header: stream label + view toggle */}
-      <div className="flex items-center justify-between border-b px-4 py-2">
+      <div className="flex shrink-0 items-center justify-between border-b px-4 py-2">
         ...
       </div>

       <div
         ref={containerRef}
-        className="h-[calc(100%-37px)] overflow-y-auto p-4"
+        className="min-h-0 flex-1 overflow-y-auto p-4"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 206 - 213, The
container currently uses a hardcoded height via className "h-[calc(100%-37px)]"
which is brittle; change the layout to flex by making the parent wrapper a flex
column and replace the container's fixed height with a flex-grow class (e.g.,
flex-1 or flex-auto) so the scrollable area grows to fill remaining space.
Locate the EventStream component and update the element that uses ref
containerRef and onScroll={handleScroll} to remove the calc height and use the
flex-based overflow class (overflow-auto/overflow-y-auto) so the header and
content sizing is resilient to padding/font changes. Ensure the sibling header
element remains above the scroll container in the same flex column so the stack
order and accessibility attributes (role="log", aria-live, aria-label) are
preserved.

17-20: Consider moving EventGroup type to centralized types file.

Per coding guidelines, TypeScript types should be defined in web-ui/src/types/index.ts. While this type is currently internal to this component, centralizing it would align with project conventions and facilitate reuse if other components need to work with grouped events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 17 - 20, Move
the EventGroup type definition out of EventStream.tsx into the project's central
types index and export it so other modules can reuse it; update EventStream.tsx
to import the exported EventGroup (and ensure ExecutionEvent is imported or
re-exported if needed), keeping the same union shape ({ type: 'event' |
'read_group' | 'edit_group' } etc.) and run a quick type-check to fix any
import/exports that need adjusting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web-ui/src/components/execution/EventStream.tsx`:
- Line 154: The call to groupEvents(displayEvents) in EventStream.tsx is
recomputing on every render; wrap it with React's useMemo and add useMemo to the
imports so the grouping only recalculates when displayEvents changes. Locate the
constant groups = groupEvents(displayEvents) and replace it with a memoized
value using useMemo(() => groupEvents(displayEvents), [displayEvents]) and
ensure useMemo is imported at the top alongside other React hooks.
- Around line 119-133: The edit badge in EditGroupRow uses dark:bg-blue-900/40
which is inconsistent with the other badge styles (agentStateBadgeStyles uses
dark:bg-blue-900/30); extract the badge class string into a shared constant
(e.g., export const editGroupBadgeStyles) in the eventStyles module and replace
the inline class on the "edit" span with that constant
(className={editGroupBadgeStyles}), making sure the constant uses
dark:bg-blue-900/30 to match agentStateBadgeStyles for consistency and
maintainability.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 86312a49-7763-4091-b0a7-5dad2f4422d3

📥 Commits

Reviewing files that changed from the base of the PR and between 814f550 and a81ab87.

📒 Files selected for processing (1)
  • web-ui/src/components/execution/EventStream.tsx

@claude

claude bot commented Mar 24, 2026

test

@claude

claude bot commented Mar 24, 2026

PLACEHOLDER

@claude

claude bot commented Mar 24, 2026

Part 1 of code review for PR #497 (EventStream.tsx smart-view grouping): CRITICAL ISSUES

  1. isReadEvent/isEditEvent cast through structural types with optional message instead of narrowing to the existing ProgressEvent type. These should be proper type guards returning 'e is ProgressEvent', eliminating repeated casts inside groupEvents.
  2. groupEvents runs on every render regardless of whether showAll is true, wasting work when groups is never consumed. Wrap in useMemo with displayEvents as the dependency.
  3. h-[calc(100%-37px)] hardcodes the header height; use flex-1 instead.

@claude

claude bot commented Mar 24, 2026

Part 2 of code review: IMPORTANT ISSUES

  4. onBlockerAnswered is not threaded into ReadGroupRow children: EventItem inside ReadGroupRow is rendered without this prop, so if a blocker event lands inside a read group the answer callback is silently dropped.
  5. Index-based keys like rg-i reset expanded state during live streaming as React unmounts/remounts ReadGroupRow components when the group list shifts. Use the first event timestamp as key.
  6. flushRead always emits a read_group even for a single event, unlike flushEdit which degrades to a plain EventItem. Make these consistent.
  7. EditGroupRow renders the full file list without truncation; ReadGroupRow clips at 4. Apply the same limit.
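
Points 5 and 6 could be addressed along these lines. This is a hedged sketch, not the PR's code: `TimedEvent`, `RowGroup`, `flushBuffer`, and `groupKey` are hypothetical names, and it assumes event timestamps are unique enough to act as keys (two groups starting at the same timestamp would still collide).

```typescript
// Hypothetical minimal shapes for illustration only.
interface TimedEvent {
  timestamp: string;
  message: string;
}

type RowGroup =
  | { type: 'event'; event: TimedEvent }
  | { type: 'read_group'; events: TimedEvent[] }
  | { type: 'edit_group'; events: TimedEvent[] };

// One flush routine for both buffers, so a lone read or edit degrades to a
// plain event in the same way (point 6).
function flushBuffer(kind: 'read_group' | 'edit_group', buffer: TimedEvent[]): RowGroup[] {
  const first = buffer[0];
  if (first === undefined) return [];
  if (buffer.length === 1) return [{ type: 'event', event: first }];
  const events = [...buffer];
  return kind === 'read_group'
    ? [{ type: 'read_group', events }]
    : [{ type: 'edit_group', events }];
}

// Derive the React key from the group's first event rather than its array
// index, so the key survives when the list shifts during streaming (point 5).
function groupKey(group: RowGroup): string {
  const firstEvent = group.type === 'event' ? group.event : group.events[0];
  return `${group.type}-${firstEvent?.timestamp ?? 'empty'}`;
}
```

A row would then be rendered with `key={groupKey(group)}` instead of an index-derived key, which lets React keep the expanded state of an existing ReadGroupRow when the list around it changes.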

@claude

claude bot commented Mar 24, 2026

Part 3 of code review: MINOR + CONTEXT

Minor issues:

  • extractFilename fallback lets full raw message strings appear in file lists (return a trimmed placeholder instead)
  • formatTime is duplicated from EventItem.tsx inline in the new row components
  • groupEvents has no automated tests despite being a pure function with non-trivial buffer logic
  • the smart/raw toggle is missing aria-pressed, so screen readers cannot communicate its state

Context: per CLAUDE.md, web-ui/ is legacy/reference-only and will be replaced in Phase 3. This PR does not violate core architectural rules.

Blocking items before merge: issue 2 (memoize groupEvents), issue 4 (onBlockerAnswered in ReadGroupRow), and issue 5 (stable keys). The rest are quality improvements.
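
The extractFilename point might look something like the sketch below. The actual helper in EventStream.tsx is not shown in this thread, so `extractFilenameSketch`, its placeholder text, and the choice to return only the last path segment are all assumptions made for illustration.

```typescript
// Assumed message format: "<verb> file: <path>", using the verbs quoted in the review.
const FILE_MESSAGE_RE = /^(?:reading|creating|editing|deleting) file:\s*(.+)$/i;

// Hypothetical helper: return the last path segment, or a short placeholder
// when the message does not match, instead of leaking the raw message.
function extractFilenameSketch(message: string, placeholder = '(unknown file)'): string {
  const match = FILE_MESSAGE_RE.exec(message.trim());
  const path = match?.[1]?.trim();
  if (!path) return placeholder;
  const segments = path.split(/[\\/]/).filter((s) => s.length > 0);
  return segments[segments.length - 1] ?? placeholder;
}
```

Since the helper is a pure string function, it is also a cheap first target for the automated tests the review asks for.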

@claude

claude bot commented Mar 24, 2026

CodeFRAME Development Guidelines

Last updated: 2026-03-09

Product Vision

CodeFrame is a project delivery system: Think → Build → Prove → Ship.

It owns the edges of the AI coding pipeline — everything BEFORE code gets written (PRD, specification, task decomposition) and everything AFTER (verification gates, quality memory, deployment). The actual code writing is delegated to frontier coding agents (Claude Code, Codex, OpenCode) that are better at it than any custom agent.

CodeFrame does not compete with coding agents. It orchestrates them.

THINK:  cf prd generate → cf prd stress-test → cf tasks generate
BUILD:  cf work start --engine claude-code  (or codex, opencode, built-in)
PROVE:  cf proof run  (9-gate evidence-based quality system)
SHIP:   cf pr create → cf pr merge
LOOP:   Glitch → cf proof capture → New REQ → Enforced forever

Status: Phase 1 ✅ | Phase 2 ✅ | Phase 2.5 ✅ — CLI workflow, server layer, and ReAct agent complete. Agent adapter architecture (#408) and PROOF9 quality system (#422) are next priorities. See docs/V2_STRATEGIC_ROADMAP.md for the full plan.

If you are an agent working in this repo: do not improvise architecture. Follow the documents listed below.


Primary Contract (MUST FOLLOW)

  1. Golden Path: docs/GOLDEN_PATH.md
    The only workflow we build until it works end-to-end.

  2. Refactor Plan: docs/REFACTOR_PLAN_FOR_AGENT.md
    Step-by-step refactor instructions.

  3. Command Tree + Module Mapping: docs/CLI_WIREFRAME.md
    The authoritative map from CLI commands → core modules/functions.

  4. Agent Implementation: docs/AGENT_IMPLEMENTATION_TASKS.md
    Tracks the agent system components (all complete).

  5. Strategic Roadmap: docs/V2_STRATEGIC_ROADMAP.md
    5-phase plan from CLI to multi-agent.

Rule 0: If a change does not directly support the Think → Build → Prove → Ship pipeline, do not implement it.

Strategic Priority (Phase 4)

The next major architectural work is the Agent Adapter Architecture (#408):


Current Reality (Phase 1, 2 & 2.5 Complete)

What's Working Now

  • Full agent execution: cf work start <task-id> --execute (uses ReAct engine by default)
  • Engine selection: --engine react (default) or --engine plan (legacy)
  • Verbose mode: cf work start <task-id> --execute --verbose shows detailed progress
  • Dry run mode: cf work start <task-id> --execute --dry-run
  • Self-correction loop: Agent automatically fixes failing verification gates (up to 5 attempts with ReAct)
  • FAILED task status: Tasks can transition to FAILED for proper error visibility
  • Tech stack configuration: cf init . --detect auto-detects tech stack from project files
  • Project preferences: Agent loads AGENTS.md or CLAUDE.md for per-project configuration
  • Stall detection: Thread-based monitor with configurable recovery (--stall-action blocker|retry|fail)
  • Blocker detection: Agent creates blockers when stuck
  • Verification gates: Ruff/pytest checks after file changes
  • State persistence: Pause/resume across sessions
  • Batch execution: cf work batch run with serial/parallel/auto strategies
  • Task dependencies: depends_on field with dependency graph analysis
  • LLM dependency inference: --strategy auto analyzes task descriptions
  • Automatic retry: --retry N for failed task recovery
  • Batch resume: Re-run failed/blocked tasks from previous batches
  • Task scheduling: cf schedule show/predict/bottlenecks with CPM-based scheduling
  • Task templates: cf templates list/show/apply with 7 builtin templates
  • Effort estimation: Tasks support estimated_hours field for scheduling
  • Environment validation: cf env check/install/doctor validates tools and dependencies
  • GitHub PR workflow: cf pr create/status/checks/merge for PR management
  • Task self-diagnosis: cf work diagnose <task-id> analyzes failed tasks
  • 70+ integration tests: Comprehensive CLI test coverage
  • REST API: Full v2 API with 16 router modules (see Phase 2 below)
  • API authentication: API key auth with scopes (read/write/admin)
  • Rate limiting: Configurable per-endpoint rate limits
  • Real-time streaming: SSE for task execution events
  • OpenAPI documentation: Full Swagger/ReDoc at /docs and /redoc

v2 Architecture (current)

  • Core-first: Domain logic lives in codeframe/core/ (headless, no FastAPI imports)
  • CLI-first: Golden Path works without any running FastAPI server
  • Adapters: LLM providers in codeframe/adapters/llm/
  • Server/UI optional: FastAPI and UI are thin adapters over core

v1 Legacy

  • FastAPI server + WebSockets + React/Next.js dashboard retained for reference
  • Do not build toward v1 patterns during Golden Path work

Repository Structure

codeframe/
├── core/                    # Headless domain + orchestration (NO FastAPI imports)
│   ├── react_agent.py      # ReAct agent (default engine) - observe-think-act loop
│   ├── tools.py            # Tool definitions for ReAct agent (7 tools)
│   ├── editor.py           # Search-replace file editor with fuzzy matching
│   ├── agent.py            # Legacy plan-based agent (--engine plan)
│   ├── planner.py          # LLM-powered implementation planning (plan engine)
│   ├── executor.py         # Code execution engine with rollback (plan engine)
│   ├── context.py          # Task context loader with relevance scoring
│   ├── tasks.py            # Task management with depends_on field
│   ├── blockers.py         # Human-in-the-loop blocker system
│   ├── runtime.py          # Run lifecycle management
│   ├── conductor.py        # Batch orchestration with worker pool
│   ├── dependency_graph.py # DAG operations and execution planning
│   ├── dependency_analyzer.py # LLM-based dependency inference
│   ├── gates.py            # Verification gates (ruff, pytest, BUILD)
│   ├── fix_tracker.py      # Fix attempt tracking for loop prevention
│   ├── quick_fixes.py      # Pattern-based fixes without LLM
│   ├── agents_config.py    # AGENTS.md/CLAUDE.md preference loading
│   ├── workspace.py        # Workspace initialization
│   ├── prd.py              # PRD management
│   ├── events.py           # Event emission
│   ├── state_machine.py    # Task status transitions
│   ├── environment.py      # Environment validation and tool detection
│   ├── installer.py        # Automatic tool installation
│   ├── diagnostics.py      # Failed task analysis
│   ├── diagnostic_agent.py # AI-powered task diagnosis
│   ├── credentials.py      # API key and credential management
│   ├── stall_detector.py   # Synchronous stall detector + StallAction enum + StallDetectedError
│   ├── stall_monitor.py    # Thread-based stall watchdog with callback
│   ├── streaming.py        # Real-time output streaming for cf work follow
│   └── ...
├── adapters/
│   └── llm/                # LLM provider adapters
│       ├── base.py         # Protocol + ModelSelector + Purpose enum
│       ├── anthropic.py    # Anthropic Claude provider
│       └── mock.py         # Mock provider for testing
├── cli/
│   └── app.py              # Typer CLI entry + subcommands
├── ui/                     # FastAPI server (Phase 2 - thin adapter over core)
│   ├── server.py           # FastAPI app with OpenAPI configuration
│   ├── models.py           # Pydantic request/response models
│   ├── dependencies.py     # Shared dependencies (workspace, auth)
│   └── routers/            # API route handlers
│       ├── blockers_v2.py  # Blocker CRUD
│       ├── tasks_v2.py     # Task management + streaming
│       ├── prd_v2.py       # PRD management + versioning
│       ├── workspace_v2.py # Workspace init and status
│       ├── batches_v2.py   # Batch execution
│       ├── streaming_v2.py # SSE event streaming
│       ├── api_key_v2.py   # API key management
│       └── ...             # 16 router modules total
├── lib/                    # Shared utilities
│   ├── rate_limiter.py     # SlowAPI rate limiting
│   └── audit_logger.py     # Request audit logging
├── auth/                   # Authentication
│   ├── api_key_service.py  # API key creation/validation
│   └── dependencies.py     # Auth dependencies
├── config/
│   └── rate_limits.py      # Rate limit configuration
└── server/                 # Legacy server code (reference only)

web-ui/                     # Frontend (legacy, reference only)
tests/
├── core/                   # Core module tests
│   ├── test_agent.py
│   ├── test_executor.py
│   ├── test_planner.py
│   ├── test_context.py
│   ├── test_conductor.py
│   ├── test_dependency_graph.py
│   ├── test_dependency_analyzer.py
│   ├── test_task_dependencies.py
│   └── ...
└── adapters/
    └── test_llm.py

Architecture Rules (non-negotiable)

1) Core must be headless

codeframe/core/** must NOT import:

  • FastAPI
  • WebSocket frameworks
  • HTTP request/response objects
  • UI modules

Core is allowed to:

  • read/write durable state (SQLite/filesystem)
  • run orchestration/worker loops
  • emit events to an append-only event log
  • call adapters via interfaces (LLM, git, fs)

2) CLI must not require a server

Golden Path commands must work from the CLI with no server running.

FastAPI is optional and must be started explicitly (e.g., codeframe serve) and must wrap core.

3) Agent state transitions flow through runtime

Critical pattern discovered during implementation:

  • Agent (agent.py) manages its own AgentState (IDLE, PLANNING, EXECUTING, BLOCKED, COMPLETED, FAILED)
  • Runtime (runtime.py) handles all TaskStatus transitions (BACKLOG, READY, IN_PROGRESS, DONE, BLOCKED)
  • Agent does NOT call tasks.update_status() - runtime does this based on agent state

This separation prevents duplicate state transitions (e.g., DONE→DONE, BLOCKED→BLOCKED errors).
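
As a rough illustration of this separation, the sketch below keeps all task-status writes in one place. The real code is Python (agent.py and runtime.py); this TypeScript version is only a model of the pattern. `RuntimeSketch`, `targetStatus`, the 'READY' default, and the choice to silently drop duplicate transitions are assumptions; the state and status names come from the rules above and from the execution flow later in these guidelines.

```typescript
// FAILED is listed as a task status because the execution flow maps a
// FAILED agent result to task→FAILED.
type AgentState = 'IDLE' | 'PLANNING' | 'EXECUTING' | 'BLOCKED' | 'COMPLETED' | 'FAILED';
type TaskStatus = 'BACKLOG' | 'READY' | 'IN_PROGRESS' | 'DONE' | 'BLOCKED' | 'FAILED';

// Hypothetical helper: which task status a final agent state implies, if any.
function targetStatus(agentState: AgentState): TaskStatus | null {
  switch (agentState) {
    case 'COMPLETED':
      return 'DONE';
    case 'BLOCKED':
      return 'BLOCKED';
    case 'FAILED':
      return 'FAILED';
    default:
      return null; // IDLE, PLANNING, EXECUTING imply no task transition
  }
}

// Hypothetical stand-in for the runtime: the only writer of task status.
class RuntimeSketch {
  private readonly statuses = new Map<string, TaskStatus>();
  readonly transitions: string[] = [];

  startTaskRun(taskId: string): void {
    this.setStatus(taskId, 'IN_PROGRESS');
  }

  finishRun(taskId: string, agentState: AgentState): void {
    const target = targetStatus(agentState);
    if (target !== null) this.setStatus(taskId, target);
  }

  statusOf(taskId: string): TaskStatus {
    return this.statuses.get(taskId) ?? 'READY'; // assumed starting status
  }

  private setStatus(taskId: string, next: TaskStatus): void {
    const current = this.statusOf(taskId);
    if (current === next) return; // a single writer makes duplicates easy to drop
    this.statuses.set(taskId, next);
    this.transitions.push(`${current}→${next}`);
  }
}
```

The agent in this model only reports its own state; it never touches `statuses`, which is what rules out a second DONE→DONE write from a different code path.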

4) Legacy can be read, not depended on

Legacy code is reference material.

  • Copy/simplify logic into core when useful
  • Do NOT import legacy UI/server modules into core
  • Do NOT "fix the UI" during Golden Path work

5) Keep commits runnable

At all times:

  • codeframe --help works
  • Golden Path command stubs can run
  • Avoid breaking the repo with large renames/moves

Agent System Architecture

Components

| Component | File | Purpose |
| --- | --- | --- |
| ReactAgent | core/react_agent.py | Default engine: observe-think-act loop with tool use |
| Tools | core/tools.py | 7 agent tools: read/edit/create file, run command/tests, search, list |
| Editor | core/editor.py | Search-replace editor with 4-level fuzzy matching |
| Stall Detector | core/stall_detector.py | Synchronous stall check + StallAction enum + StallDetectedError |
| Stall Monitor | core/stall_monitor.py | Thread-based watchdog with callback (integrated into ReactAgent) |
| LLM Adapter | adapters/llm/base.py | Protocol, ModelSelector, Purpose enum |
| Anthropic Provider | adapters/llm/anthropic.py | Claude integration with streaming |
| Mock Provider | adapters/llm/mock.py | Testing with call tracking |
| Context Loader | core/context.py | Codebase scanning, relevance scoring |
| Planner | core/planner.py | Task → ImplementationPlan via LLM (plan engine) |
| Executor | core/executor.py | File ops, shell commands, rollback (plan engine) |
| Agent (legacy) | core/agent.py | Plan-based orchestration (--engine plan) |
| Runtime | core/runtime.py | Run lifecycle, engine selection, agent invocation |
| Conductor | core/conductor.py | Batch orchestration, worker pool |
| Dependency Graph | core/dependency_graph.py | DAG operations, topological sort |
| Dependency Analyzer | core/dependency_analyzer.py | LLM-based dependency inference |
| Environment Validator | core/environment.py | Tool detection and validation |
| Installer | core/installer.py | Automatic tool installation |
| Diagnostics | core/diagnostics.py | Failed task analysis |
| Diagnostic Agent | core/diagnostic_agent.py | AI-powered task diagnosis |
| Credentials | core/credentials.py | API key and credential management |
| Event Publisher | core/streaming.py | Real-time SSE event distribution |
| API Key Service | auth/api_key_service.py | API key CRUD and validation |
| Rate Limiter | lib/rate_limiter.py | Per-endpoint rate limiting |

Model Selection Strategy

Task-based heuristic via Purpose enum:

  • PLANNING → claude-sonnet-4-20250514 (complex reasoning)
  • EXECUTION → claude-sonnet-4-20250514 (balanced)
  • GENERATION → claude-haiku-4-20250514 (fast/cheap)

Future: cf tasks set provider <id> <provider> for per-task override.
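
The heuristic amounts to a lookup keyed by purpose. The sketch below models it in TypeScript for illustration only; the real ModelSelector lives in adapters/llm/base.py (Python) and its interface is not shown here. `MODEL_BY_PURPOSE`, `selectModel`, and the optional `override` parameter (standing in for the future per-task override) are hypothetical; the model identifiers are copied verbatim from the list above.

```typescript
type Purpose = 'PLANNING' | 'EXECUTION' | 'GENERATION';

// Model identifiers copied from the guidelines; not verified against any provider.
const MODEL_BY_PURPOSE: Record<Purpose, string> = {
  PLANNING: 'claude-sonnet-4-20250514',
  EXECUTION: 'claude-sonnet-4-20250514',
  GENERATION: 'claude-haiku-4-20250514',
};

// Hypothetical selector: an explicit override wins, otherwise fall back to the
// purpose-based default.
function selectModel(purpose: Purpose, override?: string): string {
  return override ?? MODEL_BY_PURPOSE[purpose];
}
```

Typing the table as `Record<Purpose, string>` means adding a new purpose without assigning it a model fails at compile time.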

Engine Selection

CodeFRAME supports two execution engines, selected via --engine:

| Engine | Flag | Pattern | Best For |
| --- | --- | --- | --- |
| ReAct (default) | --engine react | Observe → Think → Act loop | Most tasks, adaptive execution |
| Plan (legacy) | --engine plan | Plan all steps → Execute sequentially | Well-defined, predictable tasks |

Execution Flow (ReAct — default)

cf work start <id> --execute [--verbose]
    │
    ├── runtime.start_task_run()      # Creates run, transitions task→IN_PROGRESS
    │
    └── runtime.execute_agent(engine="react")
            │
            └── ReactAgent.run(task_id)
                ├── Load context (PRD, codebase, blockers, AGENTS.md, tech_stack)
                ├── Build layered system prompt
                │
                └── Tool-use loop (until complete/blocked/failed):
                    ├── Check stall detector (configurable: retry/blocker/fail)
                    ├── LLM decides next action (tool call)
                    ├── Execute tool: read_file, edit_file, create_file,
                    │   run_command, run_tests, search_codebase, list_files
                    ├── Observe result → feed back to LLM
                    ├── Record activity (resets stall timer)
                    ├── Incremental verification (ruff after file changes)
                    └── Token budget management (3-tier compaction)
                │
                └── Final verification with self-correction (up to 5 retries)
                │
                └── Update run/task status based on agent result
                    ├── COMPLETED → complete_run() → task→DONE
                    ├── BLOCKED → block_run() → task→BLOCKED
                    └── FAILED → fail_run() → task→FAILED

Execution Flow (Plan — legacy, --engine plan)

cf work start <id> --execute --engine plan
    │
    ├── runtime.start_task_run()
    │
    └── runtime.execute_agent(engine="plan")
            │
            ├── agent.run(task_id)
            │   ├── Load context (PRD, codebase, blockers, AGENTS.md)
            │   ├── Create plan via LLM
            │   ├── Execute steps (file create/edit, shell commands)
            │   ├── Run incremental verification (ruff)
            │   ├── Detect blockers (consecutive failures, missing files)
            │   └── Run final verification with SELF-CORRECTION LOOP:
            │       ├── Run all gates (pytest, ruff)
            │       ├── If failed: _attempt_verification_fix()
            │       │   ├── Try ruff --fix for quick lint fixes
            │       │   ├── Use LLM to generate fix plan from errors
            │       │   └── Execute fix steps
            │       └── Retry up to max_attempts (default: 3)
            │
            └── Update run/task status based on agent result
                ├── COMPLETED → complete_run() → task→DONE
                ├── BLOCKED → block_run() → task→BLOCKED
                └── FAILED → fail_run() → task→FAILED

Commands (v2 CLI)

Python (preferred)

Use uv for Python tasks:

uv run pytest
uv run pytest tests/core/  # Core module tests only
uv run ruff check .

CLI (Golden Path)

# Workspace
cf init <repo>                                    # Initialize workspace
cf init <repo> --detect                           # Initialize + auto-detect tech stack
cf init <repo> --tech-stack "Python with uv"      # Initialize + explicit tech stack
cf init <repo> --tech-stack-interactive           # Initialize + interactive setup
cf status

# PRD
cf prd add <file.md>
cf prd show

# Tasks
cf tasks generate          # Uses LLM to generate from PRD
cf tasks list
cf tasks list --status READY
cf tasks show <id>

# Work execution (single task)
cf work start <task-id>                    # Creates run record
cf work start <task-id> --execute          # Runs AI agent (ReAct engine, default)
cf work start <task-id> --execute --engine plan  # Use legacy plan engine
cf work start <task-id> --execute --verbose  # With detailed output
cf work start <task-id> --execute --dry-run  # Preview changes
cf work start <task-id> --execute --stall-timeout 120  # Custom stall timeout (0=disabled)
cf work start <task-id> --execute --stall-action retry  # Recovery: blocker|retry|fail
cf work stop <task-id>                     # Cancel stale run
cf work resume <task-id>                   # Resume blocked work
cf work follow <task-id>                   # Stream real-time output
cf work follow <task-id> --tail 50         # Show last 50 lines then stream

# Batch execution (multiple tasks)
cf work batch run <id1> <id2> ...          # Execute multiple tasks (ReAct default)
cf work batch run --all-ready              # All READY tasks
cf work batch run --all-ready --engine plan  # Use legacy plan engine
cf work batch run --strategy serial        # Serial (default)
cf work batch run --strategy parallel      # Parallel execution
cf work batch run --strategy auto          # LLM-inferred dependencies
cf work batch run --max-parallel 4         # Concurrent limit
cf work batch run --retry 3               # Auto-retry failures
cf work batch status [batch_id]            # Show batch status
cf work batch cancel <batch_id>            # Cancel running batch
cf work batch resume <batch_id>            # Re-run failed tasks

# Blockers
cf blocker list
cf blocker show <id>
cf blocker answer <id> "answer"

# Quality
cf review
cf patch export
cf commit

# State
cf checkpoint create "name"
cf checkpoint list
cf checkpoint restore <id>
cf summary

# Environment validation
cf env check                     # Validate tools and dependencies
cf env install                   # Install missing tools
cf env doctor                    # Comprehensive environment health check

# GitHub PR workflow
cf pr create                     # Create PR from current branch
cf pr status                     # Show PR status
cf pr checks                     # Show CI check results
cf pr merge                      # Merge approved PR

# Diagnostics
cf work diagnose <task-id>       # AI-powered analysis of failed tasks

Note: codeframe serve exists but Golden Path does not depend on it.

Frontend (legacy)

cd web-ui && npm test
cd web-ui && npm run build

Do not expand frontend scope during Golden Path work.


Documentation Navigation

Authoritative (v2)

  • docs/GOLDEN_PATH.md - CLI-first workflow contract
  • docs/REFACTOR_PLAN_FOR_AGENT.md - Step-by-step refactor instructions
  • docs/CLI_WIREFRAME.md - Command → module mapping
  • docs/AGENT_IMPLEMENTATION_TASKS.md - Agent system components
  • docs/V2_STRATEGIC_ROADMAP.md - 5-phase plan from CLI to multi-agent

Agent Architecture (Phase 2.5)

  • docs/AGENT_V3_UNIFIED_PLAN.md - ReAct architecture design and rules
  • docs/REACT_AGENT_ARCHITECTURE.md - Deep-dive: tools, editor, token management
  • docs/REACT_AGENT_ANALYSIS.md - Golden path test run analysis

API Documentation (Phase 2)

  • /docs - Swagger UI (interactive API explorer)
  • /redoc - ReDoc (readable API documentation)
  • /openapi.json - OpenAPI 3.1 specification
  • docs/PHASE_2_DEVELOPER_GUIDE.md - Server layer implementation guide
  • docs/PHASE_2_CLI_API_MAPPING.md - CLI to API endpoint mapping

Legacy (v1 reference only)

These describe old server/UI-driven architecture:

  • SPRINTS.md, sprints/
  • specs/
  • CODEFRAME_SPEC.md
  • v1 feature docs (context/session/auth/UI state management)

What NOT to do (common agent failure modes)

  • Don't add new HTTP endpoints to support the CLI
  • Don't require codeframe serve for CLI workflows
  • Don't implement UI concepts (tabs, panels, progress bars) inside core
  • Don't redesign auth, websockets, or UI state management
  • Don't add multi-providers/model switching features before Golden Path works
  • Don't "clean up the repo" as a goal - only refactor to enable Golden Path
  • Don't update task status from agent.py - let runtime handle transitions

Testing / Demoing CodeFRAME on Sample Projects

When running uv run cf commands against a sample project (e.g., cf-test/) to test or demo CodeFRAME's capabilities, you are observing the CodeFRAME agent's work, not doing the work yourself.

Rules for testing/demo mode:

  • You are evaluating how well the CodeFRAME agent (ReAct or Plan engine) builds the project
  • Do NOT help out, fix errors, or write code on behalf of the CodeFRAME agent
  • Do NOT intervene when the agent makes mistakes — that's data
  • Your job is to report the process: what worked, what failed, how close the agent got
  • Document the agent's output, errors encountered, and final state
  • Assess completion against the PRD/acceptance criteria objectively
  • If the agent gets stuck or fails, report that as a finding — don't rescue it

This applies when using commands like cf work start <id> --execute, cf work batch run, or any command that triggers the AI agent to do implementation work on a target project.


Practical Working Mode for Agents

When implementing anything, do this loop:

  1. Read docs/GOLDEN_PATH.md and confirm the change is required
  2. Find the command in docs/CLI_WIREFRAME.md
  3. Implement core functionality in codeframe/core/
  4. Call it from Typer command in codeframe/cli/
  5. Emit events + persist state
  6. Keep it runnable. Commit.

If you are unsure which direction to take, default to:

  • simpler state
  • fewer dependencies
  • smaller surface area
  • core-first, CLI-first

Recent Updates (2026-03-09)

Stall Detection System (#399, #400, #401)

Complete stall detection and configurable recovery for agent execution:

Components:

  • StallMonitor (core/stall_monitor.py) — Thread-based watchdog polling every 5s
  • StallDetector (core/stall_detector.py) — Synchronous time-tracking primitive
  • StallAction enum — Recovery strategy: RETRY, BLOCKER, FAIL
  • StallDetectedError — Exception for RETRY path (propagates to runtime for retry)

CLI flags:

  • --stall-timeout N — Seconds without tool activity before stall (default: 300, 0=disabled)
  • --stall-action {blocker,retry,fail} — Recovery action (default: blocker)
  • Both flags available on cf work start and cf work batch run

Recovery flow:

  • BLOCKER (default): Creates informative blocker, task → BLOCKED
  • RETRY: Raises StallDetectedError, runtime retries once with fresh agent
  • FAIL: Task transitions directly to FAILED

Config: agent_budget.stall_timeout_s in .codeframe/config.yaml (0 = disabled)
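
As a rough illustration of the "synchronous time-tracking primitive" idea, the sketch below tracks the last activity time and reports a stall once the timeout elapses. The class name `StallDetectorSketch`, its methods, and the injectable `clock` are assumptions for illustration; the real core/stall_detector.py may look different.

```python
import time
from typing import Callable


class StallDetectorSketch:
    """Tracks time since the last recorded activity (hypothetical API)."""

    def __init__(
        self,
        timeout_s: float = 300.0,  # default quoted above for --stall-timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_activity = clock()

    def record_activity(self) -> None:
        """Reset the stall timer, e.g. after each tool call."""
        self._last_activity = self._clock()

    def is_stalled(self) -> bool:
        """True once timeout_s seconds pass without activity; 0 disables the check."""
        if self.timeout_s <= 0:
            return False
        return self._clock() - self._last_activity >= self.timeout_s
```

Using a monotonic clock avoids false stalls when the wall clock is adjusted, and injecting the clock keeps the primitive testable without sleeping.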


Phase 2.5 Complete: ReAct Agent Architecture (#355)

Default execution engine switched from plan-based to ReAct (Reasoning + Acting).

What changed:

  • Default engine is now "react" — all cf work start --execute and cf work batch run commands use ReactAgent
  • Legacy plan engine available via --engine plan flag
  • ReactAgent uses iterative tool-use loop (observe → think → act) instead of plan-all-then-execute
  • 7 structured tools: read_file, edit_file, create_file, run_command, run_tests, search_codebase, list_files
  • Search-replace editing with 4-level fuzzy matching (exact → whitespace-normalized → indentation-agnostic → fuzzy)
  • Token budget management with 3-tier compaction
  • Adaptive iteration budget based on task complexity
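
The four-level matching cascade mentioned above (exact → whitespace-normalized → indentation-agnostic → fuzzy) could look something like this. It is a sketch only: `find_match`, the normalizer helpers, the line-window approach, and the 0.85 threshold are assumptions, and the fuzzy level here uses the standard library's `difflib` rather than whatever the project actually uses.

```python
import difflib
from typing import List, Optional, Tuple


def _norm_ws(line: str) -> str:
    """Keep leading indentation, collapse inner whitespace runs, drop trailing."""
    indent = line[: len(line) - len(line.lstrip())]
    return indent + " ".join(line.split())


def _no_indent(line: str) -> str:
    """Ignore indentation entirely and collapse inner whitespace."""
    return " ".join(line.split())


def find_match(source: str, search: str, threshold: float = 0.85) -> Optional[Tuple[str, str]]:
    """Return (level, matched_source_text), trying the strictest level first."""
    # Level 1: exact substring.
    if search and search in source:
        return "exact", search

    src_lines = source.splitlines()
    want = search.splitlines()
    if not want or len(want) > len(src_lines):
        return None
    # Candidate windows: every run of source lines as long as the search block.
    windows: List[List[str]] = [
        src_lines[i : i + len(want)] for i in range(len(src_lines) - len(want) + 1)
    ]

    # Levels 2 and 3: line-by-line equality under progressively looser normalizers.
    for level, norm in (("whitespace", _norm_ws), ("indentation", _no_indent)):
        for window in windows:
            if [norm(line) for line in window] == [norm(line) for line in want]:
                return level, "\n".join(window)

    # Level 4: best similarity ratio, accepted only above the threshold.
    scored = [
        (difflib.SequenceMatcher(None, "\n".join(w), search).ratio(), "\n".join(w))
        for w in windows
    ]
    best_ratio, best_text = max(scored)
    if best_ratio >= threshold:
        return "fuzzy", best_text
    return None
```

Returning the matched source text (rather than the search string) lets the caller replace exactly what is in the file, whichever level matched.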

Phase 2.5 deliverables:

Phase | Focus                          | Pipeline Stage                      | Status
1     | CLI Completion                 | Think + Build                       | Complete
2     | Server Layer                   | Build (API)                         | Complete
2.5   | ReAct Agent                    | Build (execution)                   | Complete
3     | Web UI Rebuild                 | All (dashboard)                     | In Progress
4     | Agent Adapters + Orchestration | Build (delegate to frontier agents) | Next
5     | PROOF9 + Advanced              | Prove + Ship (quality memory)       | Planned

Phase 2 Complete: Server Layer (2026-02-03)

Phase 2 deliverables completed:

Server Architecture (Phase 2)

Pattern: Thin adapter over core - server routes delegate to core.* modules.

CLI (typer) ─┬── core.* ─── adapters.*
             │
Server (fastapi) ─┘

V2 Router Modules (16 total):

Router         | Endpoints | Purpose
blockers_v2    | 5         | Blocker CRUD
prd_v2         | 8         | PRD management + versioning
tasks_v2       | 12        | Task management + streaming
workspace_v2   | 5         | Init, status, tech stack
batches_v2     | 5         | Batch execution strategies
streaming_v2   | 2         | SSE event streaming
api_key_v2     | 4         | API key management
discovery_v2   | 5         | PRD discovery sessions
checkpoints_v2 | 6         | State checkpoints
schedule_v2    | 3         | Task scheduling
templates_v2   | 4         | PRD templates
git_v2         | 3         | Git operations
review_v2      | 2         | Code review
pr_v2          | 5         | GitHub PR workflow
environment_v2 | 4         | Tool detection
proof_v2       | 7         | PROOF9 quality gates + requirements

API Authentication:

# Create API key
cf auth api-key-create --name "my-key" --scopes read,write

# Use in requests
curl -H "X-API-Key: cf_..." https://api.example.com/api/v2/tasks

Rate Limiting:

  • Default: 100 requests/minute (standard endpoints)
  • Auth endpoints: 10/minute
  • AI endpoints: 20/minute
  • Configurable via RATE_LIMIT_* environment variables

OpenAPI Documentation:

  • Swagger UI: /docs
  • ReDoc: /redoc
  • OpenAPI JSON: /openapi.json

Previous Updates (2026-01-29)

V2 Strategic Roadmap Established

Created comprehensive 5-phase roadmap in docs/V2_STRATEGIC_ROADMAP.md.

Phase 1 Complete: CLI Foundation

All Phase 1 priorities completed:

Environment Validation (cf env)

New commands for validating development environment:

cf env check              # Validate required tools (git, uv, ruff, pytest)
cf env install            # Install missing tools automatically
cf env doctor             # Comprehensive environment health check

Modules:

  • core/environment.py - Tool detection and validation
  • core/installer.py - Cross-platform tool installation

GitHub PR Workflow (cf pr)

Streamlined PR management without leaving the CLI:

cf pr create              # Create PR from current branch
cf pr status              # Show PR status and review state
cf pr checks              # Show CI check results
cf pr merge               # Merge approved PR

Task Self-Diagnosis (cf work diagnose)

AI-powered analysis of failed tasks:

cf work diagnose <task-id>   # Analyze why a task failed

Modules:

  • core/diagnostics.py - Failed task analysis
  • core/diagnostic_agent.py - AI-powered diagnosis

Bug Fixes

GitHub Issue Organization


Previous Updates (2026-01-16)

Phase 3.1: Tech Stack Configuration

Simplified tech stack configuration using natural language descriptions:

  1. tech_stack field on Workspace model - stores natural language description
  2. --detect flag - auto-detects from pyproject.toml, package.json, Cargo.toml, go.mod
  3. --tech-stack flag - explicit tech stack description (e.g., "Rust project with cargo")
  4. --tech-stack-interactive flag - simple prompt for user input (stub for future multi-round)
  5. Agent integration - TaskContext and Planner include tech_stack in LLM prompts
  6. Removed cf config subcommand - tech stack is now part of workspace init

Design philosophy: Instead of structured configuration with specific package managers and frameworks, users describe their stack in natural language. The agent interprets and adapts.

Examples:

cf init . --detect                           # Auto-detect: "Python with uv, pytest, ruff for linting"
cf init . --tech-stack "Rust project using cargo"
cf init . --tech-stack "TypeScript monorepo with pnpm, Next.js, jest"
cf init . --tech-stack-interactive           # Prompts user for description

Future work: Multi-round interactive discovery (bead: codeframe-8d80)


Agent Self-Correction & Observability

Improved agent reliability with automatic error recovery:

  1. Self-correction loop in _run_final_verification() - agent retries up to 3 times
  2. Verbose mode (--verbose / -v) - shows detailed verification/self-correction progress
  3. FAILED task status - tasks transition to FAILED for proper error visibility
  4. Project preferences - agent loads AGENTS.md/CLAUDE.md for per-project config
  5. Fixed fail_run() - now properly transitions task status (was leaving tasks stuck)

Enhanced Self-Correction (Phase 3.4)

Advanced error recovery with loop prevention and smart escalation:

  1. Fix Attempt Tracker (core/fix_tracker.py) - prevents repeating failed fixes
    • Normalizes errors for comparison (removes line numbers, memory addresses)
    • Tracks (error_signature, fix_description) pairs with outcomes
    • Detects escalation patterns (same error 3+ times, same file 3+ times)

  2. Pattern-Based Quick Fixes (core/quick_fixes.py) - fixes common errors without LLM
    • ModuleNotFoundError → auto-install package (detects package manager)
    • ImportError → add missing import statement
    • NameError → add common imports (Optional, dataclass, Path, etc.)
    • SyntaxError → fix missing colons, f-string prefixes
    • IndentationError → normalize mixed tabs/spaces

  3. Escalation to Blocker - creates informative blockers when stuck
    • Triggered after MAX_SAME_ERROR_ATTEMPTS (3) failures on same error
    • Triggered after MAX_SAME_FILE_ATTEMPTS (3) failures on same file
    • Triggered after MAX_TOTAL_FAILURES (5) in a run
    • Blocker includes error type, attempted fixes, and guidance questions
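
To make the tracker idea concrete, here is a minimal sketch of error normalization plus a same-error escalation check. Only the `MAX_SAME_ERROR_ATTEMPTS` value comes from the text above; `normalize_error`, `FixTrackerSketch`, its methods, and the specific regexes are hypothetical and do not reflect the contents of core/fix_tracker.py.

```python
import re
from collections import Counter
from typing import Set, Tuple

MAX_SAME_ERROR_ATTEMPTS = 3  # threshold quoted above


def normalize_error(message: str) -> str:
    """Strip volatile details so the same error compares equal across runs."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "0xADDR", message)  # memory addresses
    msg = re.sub(r"\bline \d+", "line N", msg)           # 'line 42' in tracebacks
    msg = re.sub(r":\d+(?::\d+)?", ":N", msg)            # 'file.py:42' or ':42:7'
    return msg.strip()


class FixTrackerSketch:
    """Remembers (error_signature, fix_description) pairs and failure counts."""

    def __init__(self) -> None:
        self._attempted: Set[Tuple[str, str]] = set()
        self._failures: Counter = Counter()

    def already_tried(self, error: str, fix: str) -> bool:
        return (normalize_error(error), fix) in self._attempted

    def record(self, error: str, fix: str, succeeded: bool) -> None:
        signature = normalize_error(error)
        self._attempted.add((signature, fix))
        if not succeeded:
            self._failures[signature] += 1

    def should_escalate(self, error: str) -> bool:
        """True once the same normalized error has failed enough times."""
        return self._failures[normalize_error(error)] >= MAX_SAME_ERROR_ATTEMPTS
```

Normalizing before comparison is what lets an error that moved from line 10 to line 27 after an edit still count as the same error.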

Self-Correction Flow

Error occurs
    │
    ├── Try ruff --fix (auto-lint)
    │
    ├── Try pattern-based quick fix (no LLM)
    │   ├── Check if fix already attempted → skip
    │   ├── Apply fix
    │   └── Record outcome in tracker
    │
    ├── Check escalation threshold
    │   └── If exceeded → create escalation blocker
    │
    └── Use LLM to generate fix plan
        ├── Include already-tried fixes to avoid repetition
        ├── Execute fix steps with tracking
        └── Re-verify

Key Self-Correction Methods

  • _run_final_verification(): While loop that re-runs gates after self-correction
  • _attempt_verification_fix(): Orchestrates quick fixes, escalation check, LLM fixes
  • _create_escalation_blocker(): Creates detailed blocker with context
  • _verbose_print(): Conditional stdout output for observability

Phase 2 Complete (2026-01-15): Parallel Batch Execution

All 6 Phase 2 items from CLI_WIREFRAME.md are done:

  1. ✅ work batch resume <batch-id> - re-run failed/blocked tasks
  2. ✅ depends_on field on Task model
  3. ✅ Dependency graph analysis (DAG, cycle detection, topological sort)
  4. ✅ True parallel execution with ThreadPoolExecutor worker pool
  5. ✅ --strategy auto with LLM-based dependency inference
  6. ✅ --retry N automatic retry of failed tasks

Key Phase 2 Modules

  • conductor.py: Batch orchestration with serial/parallel/auto strategies
  • dependency_graph.py: DAG operations, level-based grouping for parallelization
  • dependency_analyzer.py: LLM analyzes task descriptions to infer dependencies
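
Level-based grouping for parallelization can be illustrated with the standard library's `graphlib`. This is a sketch, not the contents of dependency_graph.py: the `execution_levels` function and the shape of the `depends_on` mapping (task ID to the set of task IDs it depends on) are assumptions.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def execution_levels(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into levels; tasks within one level have no mutual dependencies.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # cycle detection happens here
    levels: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        levels.append(ready)
        sorter.done(*ready)  # unlocks the next level
    return levels
```

Each inner list could be handed to a worker pool as one parallel wave, with the next wave starting only after the previous one finishes.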

Agent Implementation Complete (2026-01-14)

All 8 implementation tasks from AGENT_IMPLEMENTATION_TASKS.md are done:

  1. ✅ LLM Adapter Interface (adapters/llm/)
  2. ✅ Task Context Loader (core/context.py)
  3. ✅ Agent Planning (core/planner.py)
  4. ✅ Code Execution Engine (core/executor.py)
  5. ✅ Automatic Blocker Detection (in core/agent.py)
  6. ✅ Gate Integration (in core/agent.py)
  7. ✅ Agent Orchestrator (core/agent.py)
  8. ✅ Wire into Runtime (core/runtime.py)

Bug Fixes During Testing

  • GateResult attribute access: Fixed gate_result.status → gate_result.passed
  • Duplicate task transitions: Removed task status updates from agent.py (runtime handles all)
  • READY→READY error: Added check in stop_run before transitioning
  • Verification step handling: Made _execute_verification smarter about file vs command targets

Key Design Decisions

  • State separation: Agent manages AgentState, Runtime manages TaskStatus
  • Model selection: Task-based heuristic via Purpose enum
  • Blocker creation: Agent creates blockers, Runtime updates task status
  • Verification: Incremental (ruff after each file change) + final (all gates)

Testing

Run all tests

uv run pytest

Run v2 tests only

uv run pytest -m v2           # All v2 tests (~411 tests)
uv run pytest -m v2 -q        # Quiet mode

The v2 marker identifies tests for CLI-first, headless functionality:

  • All tests in tests/core/ are automatically marked v2 (via conftest.py)
  • v2 CLI tests have pytestmark = pytest.mark.v2 at the top

Convention: When adding new v2 functionality, mark tests with @pytest.mark.v2 or add pytestmark = pytest.mark.v2 at module level for CLI tests that use codeframe.cli.app.

Run core module tests

uv run pytest tests/core/
uv run pytest tests/core/test_agent.py -v
uv run pytest tests/adapters/test_llm.py -v

Test coverage

uv run pytest --cov=codeframe --cov-report=html

Environment Variables

# Required for agent execution
ANTHROPIC_API_KEY=sk-ant-...

# Optional - Database
DATABASE_PATH=./codeframe.db

# Optional - Rate Limiting (Phase 2)
RATE_LIMIT_ENABLED=true                    # Enable/disable rate limiting
RATE_LIMIT_DEFAULT=100/minute              # Default limit
RATE_LIMIT_AUTH=10/minute                  # Auth endpoints
RATE_LIMIT_AI=20/minute                    # AI/LLM endpoints
RATE_LIMIT_WEBSOCKET=50/minute             # WebSocket connections
REDIS_URL=redis://localhost:6379           # Redis for distributed rate limiting (optional)

# Optional - API Server
CODEFRAME_API_KEY_SECRET=<random-secret>   # Secret for API key hashing

Legacy sections removed on purpose

This file previously contained extensive v1 details (auth, websocket, UI template, sprint history).
Those are still in git history and legacy docs, but they are not the current contract.

The current contract is Golden Path + Refactor Plan + Command Tree mapping + Agent Implementation.

@claude

claude bot commented Mar 24, 2026

Consolidated Code Review for PR 497 (smart-view event grouping in EventStream.tsx)

CRITICAL (3 issues):

  1. Type guards: isReadEvent and isEditEvent cast through structural types with optional message rather than narrowing to ProgressEvent (which already has non-optional message in useTaskStream.ts). Fix: return e is ProgressEvent type predicates to eliminate downstream casts in groupEvents.
  2. Unguarded computation: groupEvents runs unconditionally on every render even when showAll is true and groups is never consumed. Fix: wrap in useMemo with displayEvents as the dependency.
  3. Fragile layout: h-[calc(100%-37px)] hardcodes the header pixel height. Fix: use flex-1 on the scroll container.

IMPORTANT (4 issues):

  4. Dropped callback: EventItem inside ReadGroupRow omits onBlockerAnswered. A blocker event inside a read group silently loses the answer handler. Thread the prop through.
  5. Unstable keys: rg-i keys use array index so React unmounts ReadGroupRow on every streaming update. Fix: use the first event timestamp as the key.
  6. Inconsistent flush: flushEdit degrades single-item buffers to plain EventItem; flushRead always emits read_group. A lone read renders as a collapsed group. Make them consistent.
  7. Unbounded file list in edit groups: ReadGroupRow truncates at 4 files, EditGroupRow does not. Apply the same limit.

MINOR: extractFilename fallback exposes raw message strings in file lists; formatTime is duplicated from EventItem.tsx; groupEvents has no unit tests despite being a testable pure function; smart/raw toggle missing aria-pressed attribute.

CONTEXT: web-ui/ is legacy/reference-only per CLAUDE.md and will be replaced in Phase 3. No core architecture violations in this PR.

MERGE BLOCKERS: issues 2 (memoize groupEvents), 4 (onBlockerAnswered), and 5 (stable keys).


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web-ui/src/lib/eventStyles.ts`:
- Around line 85-87: The new export editGroupBadgeStyles uses blue color classes
which violates the repo's gray-only UI requirement; update the
editGroupBadgeStyles constant to use the Shadcn/UI Nova preset gray badge
classes (replace blue-* and dark:blue-* classes with the equivalent gray-* and
dark:gray-* classes) so it conforms to web-ui/src/**/*.{ts,tsx} gray color
scheme and Hugeicons conventions; locate the editGroupBadgeStyles export to make
this change.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dfcd913e-c5a3-4385-abc0-1d4ac6a4a039

📥 Commits

Reviewing files that changed from the base of the PR and between a81ab87 and b7906fb.

📒 Files selected for processing (2)
  • web-ui/src/components/execution/EventStream.tsx
  • web-ui/src/lib/eventStyles.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • web-ui/src/components/execution/EventStream.tsx

Comment on lines +85 to +87
/** Badge style for the edit-group summary row in the EventStream. */
export const editGroupBadgeStyles = 'bg-blue-100 text-blue-800 dark:bg-blue-900/30 dark:text-blue-300';

⚠️ Potential issue | 🟡 Minor

Use gray badge classes for the new edit-group style constant.

Line 86 introduces blue badge colors, which conflicts with the repo’s gray-only UI color scheme requirement for web-ui/src/**/*.{ts,tsx}.

🎨 Suggested change
-export const editGroupBadgeStyles = 'bg-blue-100 text-blue-800 dark:bg-blue-900/30 dark:text-blue-300';
+export const editGroupBadgeStyles = 'bg-gray-100 text-gray-800 dark:bg-gray-800 dark:text-gray-300';

As per coding guidelines, web-ui/src/**/*.{ts,tsx}: Web UI must use Shadcn/UI (Nova preset) with gray color scheme and Hugeicons (@hugeicons/react); never use lucide-react.


@claude

claude bot commented Mar 24, 2026

Follow-up review test

@claude

claude bot commented Mar 24, 2026

test line 1

@claude

claude bot commented Mar 24, 2026

Part 1 of review: Two items resolved in b7906fb. groupEvents wrapped in useMemo (issue 2). editGroupBadgeStyles extracted to eventStyles.ts.

@claude

claude bot commented Mar 24, 2026

Follow-up review

@claude

claude bot commented Mar 24, 2026


Follow-up review for PR 497

Two items from the previous review have been addressed in the latest commit (b7906fb):

Resolved:

  • groupEvents is now wrapped in useMemo - eliminates the redundant recompute on every render (was issue 2)
  • editGroupBadgeStyles extracted to eventStyles.ts - matches the dark-mode opacity used by agentStateBadgeStyles

Still outstanding from prior review - in priority order:

Blocking / functional:

  1. onBlockerAnswered silently dropped (issue 4) - ReadGroupRow passes no onBlockerAnswered prop to its EventItem children. If a blocker event arrives inside a read group the callback is never called. Fix: thread the prop through ReadGroupRow.

  2. Unstable keys reset expand state (issue 5) - Keys rg-N and eg-N are index-based. During live streaming the group list shifts as new events arrive, causing React to unmount and remount ReadGroupRow which collapses any row the user just opened. Use group.timestamp (or the first event timestamp for read_group) as the key instead.

Quality / consistency:

  1. Type assertions still present (issue 1) - the structural cast pattern appears in isReadEvent, isEditEvent, and groupEvents. ExecutionEvent is a union - narrow to the ProgressEvent member with a proper type guard instead of the structural cast.

  2. Hardcoded 37px header height (issue 3) - h-[calc(100%-37px)] will break if padding or font size changes. Replace with a flex column layout (flex flex-col on the outer wrapper, shrink-0 on the header, flex-1 min-h-0 on the scroll container).

  3. flushRead / flushEdit inconsistency (issue 6) - flushEdit degrades to a plain EventItem for a single-item buffer; flushRead always emits a read_group regardless. Apply the same single-event pass-through to flushRead for consistency.

  4. EditGroupRow file list not truncated (issue 7) - ReadGroupRow clips the label at 4 files; EditGroupRow renders all of them. Cap EditGroupRow at the same limit (e.g. show first 3 and +N more).
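
The grouping behavior requested in the items above (single-event pass-through for both reads and edits, and keys that do not depend on array position) can be sketched language-neutrally. The PR's code is TypeScript and is not reproduced here; this Python version, including `group_events`, the `kind`/`timestamp` fields, and the key format, is purely a hypothetical illustration of the logic.

```python
from itertools import groupby
from typing import Any, Dict, List

Event = Dict[str, Any]
GROUPABLE_KINDS = {"read", "edit"}


def group_events(events: List[Event]) -> List[Dict[str, Any]]:
    """Collapse runs of consecutive read/edit events; everything else passes through."""
    groups: List[Dict[str, Any]] = []
    for kind, run in groupby(events, key=lambda e: e["kind"]):
        items = list(run)
        if kind in GROUPABLE_KINDS and len(items) > 1:
            groups.append({
                "type": f"{kind}_group",
                # Key derives from the first event, so it survives later appends.
                "key": f"{kind}-{items[0]['timestamp']}",
                "events": items,
            })
        else:
            # A lone read or edit is treated like any other single event.
            for event in items:
                groups.append({
                    "type": "event",
                    "key": f"{kind}-{event['timestamp']}",
                    "events": [event],
                })
    return groups
```

Because the key comes from the first event's timestamp rather than the group's index, appending new events to the stream leaves the keys of earlier groups unchanged, which is the property that keeps an expanded row open. This assumes timestamps are unique enough within a stream; a real implementation might prefer an event ID if one exists.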

Minor:

  1. aria-pressed missing on view toggle (issue 11) - The smart/raw toggle button should have aria-pressed equal to the showAll state so screen readers can communicate its state.

  2. formatTime duplication (issue 9) - The inline toLocaleTimeString pattern is repeated in both ReadGroupRow and EditGroupRow. The same formatting already exists in EventItem.tsx - extract a shared utility.


Items 1 and 2 (onBlockerAnswered and stable keys) are the highest-priority blockers before merge as they affect correctness in live-streaming scenarios.


@frankbria frankbria merged commit e1296e7 into main Mar 24, 2026
13 checks passed
@frankbria frankbria deleted the feat/issue-475-compress-event-stream branch March 24, 2026 23:28