feat(web-ui): compress and summarize verbose execution event stream (#475) by frankbria · Pull Request #497 · frankbria/codeframe

frankbria · 2026-03-24T18:31:49Z

Summary

groupEvents() transforms raw event arrays into grouped EventGroup[] for the smart view
Consecutive file-read events → collapsible ReadGroupRow ("Read N files: x, y, z") with expand affordance
Consecutive file edit/create/delete events → EditGroupRow single summary line ("Modified N files: …")
Smart view (default) renders grouped events; "Show all events" toggle reveals every raw event unmodified
Stream header added with view toggle button
All other event types (planning, verification, blockers, completion) pass through unchanged
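
For readers who want a concrete picture of the grouping pass described above, here is a minimal sketch. It is not the code from this PR: the `ExecutionEvent` shape is a simplified stand-in, the message prefixes are taken from the regexes quoted in the review further down, and the single `flush` helper is an assumption made for brevity (the PR reportedly uses separate read and edit flush paths, and may treat single-event runs differently).

```typescript
// Hypothetical, simplified event shape; the real ExecutionEvent union is richer.
type ExecutionEvent = { event_type: string; message?: string; timestamp: string };

type EventGroup =
  | { type: 'event'; event: ExecutionEvent }
  | { type: 'read_group' | 'edit_group'; events: ExecutionEvent[] };

const isReadEvent = (e: ExecutionEvent): boolean =>
  e.event_type === 'progress' && /^reading file:/i.test(e.message ?? '');

const isEditEvent = (e: ExecutionEvent): boolean =>
  e.event_type === 'progress' &&
  /^(creating|editing|deleting) file:/i.test(e.message ?? '');

// Collapse runs of consecutive read (or edit) events into one group each;
// every other event passes through as its own 'event' entry.
function groupEvents(events: ExecutionEvent[]): EventGroup[] {
  const groups: EventGroup[] = [];
  let buffer: ExecutionEvent[] = [];
  let kind: 'read_group' | 'edit_group' | null = null;

  // Emit the buffered run (if any) and reset the buffer.
  const flush = (): void => {
    if (kind !== null && buffer.length > 0) {
      groups.push({ type: kind, events: buffer });
    }
    buffer = [];
    kind = null;
  };

  for (const e of events) {
    const next = isReadEvent(e) ? 'read_group' : isEditEvent(e) ? 'edit_group' : null;
    if (next === null) {
      flush();
      groups.push({ type: 'event', event: e });
    } else {
      if (next !== kind) flush(); // a read run followed by an edit run starts a new group
      kind = next;
      buffer.push(e);
    }
  }
  flush();
  return groups;
}
```

Because the function is pure, the "Show all events" toggle can simply bypass it and render the raw array.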

Closes #475

Test plan

Smart view shows "Read N files" collapsed rows for consecutive reads
Clicking expand row reveals individual read events
Consecutive file edits show single "Modified N files" row
"Show all events" toggle shows every event individually
"Smart view" toggle returns to grouped view
Non-read/non-edit events render unchanged
Build passes clean

Summary by CodeRabbit

New Features
- Implemented intelligent grouping of related read and edit operations in the event stream for improved clarity.
- Added a header with a view toggle to switch between grouped summary view (default) and a detailed view of all individual events.
- Collapsible read operation summaries provide quick overviews while maintaining access to full details.
- Added a compact badge-style summary for edit groups to improve scannability.

…475)
- groupEvents() function transforms raw events into EventGroup[] for smart view
- Consecutive file-read events → collapsible ReadGroupRow ("Read N files")
- Consecutive file edit/create/delete events → EditGroupRow summary line
- Smart view is default; "Show all events" toggle switches to raw log
- ReadGroupRow expands to show individual events on click
- Added stream header with event count context and view toggle

coderabbitai · 2026-03-24T18:32:10Z

Walkthrough

Adds event grouping and UI controls to the execution event stream: filters heartbeat events, groups consecutive progress events into read and edit summaries, introduces collapsible read groups and summarized edit rows, a smart/raw view toggle, an icon import, and a small scroll-height adjustment.

Changes

Cohort / File(s)	Summary
Event Stream Smart Grouping `web-ui/src/components/execution/EventStream.tsx`	Added `useMemo`-based filtering and `groupEvents` pipeline, introduced `EventGroup` types and helpers (`extractFilename`, `isReadEvent`, `isEditEvent`), added `ReadGroupRow` (collapsible) and `EditGroupRow` (summary) subcomponents, integrated `ArrowRight01Icon`, added `showAll` header toggle to switch smart vs raw views, and adjusted scroll container height.
Event Styles `web-ui/src/lib/eventStyles.ts`	Exported new `editGroupBadgeStyles` Tailwind class string for styling the edit-group summary row; other style exports unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble logs and weave them tight,
Reads tucked in folds, edits shining bright,
A little arrow nudges doors ajar,
Smart and raw — both near, both far,
Hooray for streams that hop and light the night!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR implements 3 of 5 core acceptance criteria from `#475`: collapsed expandable read events, grouped edit events, and 'Show all events' toggle. Phase checkpoints and progress indicator are not implemented.	Implement the remaining acceptance criteria: phase progress checkpoints at natural milestones and overall progress indicators to fully address issue `#475`.
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: compressing and summarizing the verbose event stream with grouped events and a smart view toggle.
Out of Scope Changes check	✅ Passed	All changes are directly related to compressing and grouping the event stream. The new grouping utilities, UI components, and styling additions align with PR objectives and requirements from issue `#475`.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (4)

web-ui/src/components/execution/EventStream.tsx (4)

198-203: Use Shadcn/UI Button component for consistency.

The toggle button uses a plain <button> element while the "New events" button (line 258) uses the Shadcn/UI Button component. For consistency and to leverage Shadcn/UI's built-in accessibility and styling, consider using Button with appropriate variant.

♻️ Suggested change

-        <button
-          className="text-xs text-muted-foreground hover:text-foreground"
+        <Button
+          variant="ghost"
+          size="sm"
+          className="h-auto px-2 py-1 text-xs text-muted-foreground hover:text-foreground"
           onClick={() => setShowAll((v) => !v)}
         >
           {showAll ? 'Smart view' : 'Show all events'}
-        </button>
+        </Button>

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 198 - 203, The
toggle uses a raw <button> instead of the Shadcn/UI Button component—replace the
plain element in EventStream.tsx with the Button component (imported from your
shadcn/ui library) and pass the same onClick handler (setShowAll) and display
text (showAll ? 'Smart view' : 'Show all events'), applying an appropriate
variant/size (e.g., variant="ghost" or "link"/size="sm") and the equivalent
className props to preserve styling and accessibility; ensure the import for
Button is added/updated and that the component still toggles setShowAll((v) =>
!v) and uses the existing showAll symbol for the label.

27-35: Type assertions bypass type safety.

The repeated (e as { message?: string }).message pattern sidesteps TypeScript's type checking. Since ExecutionEvent is a union type and you've already narrowed to event_type === 'progress', consider defining a type guard or using proper type narrowing to access message safely.

♻️ Suggested approach using type guard

+function isProgressEventWithMessage(e: ExecutionEvent): e is ExecutionEvent & { message: string } {
+  return e.event_type === 'progress' && typeof (e as { message?: string }).message === 'string';
+}
+
 function isReadEvent(e: ExecutionEvent): boolean {
-  if (e.event_type !== 'progress') return false;
-  return /^reading file:/i.test((e as { message?: string }).message ?? '');
+  return isProgressEventWithMessage(e) && /^reading file:/i.test(e.message);
 }
 
 function isEditEvent(e: ExecutionEvent): boolean {
-  if (e.event_type !== 'progress') return false;
-  return /^(creating|editing|deleting) file:/i.test((e as { message?: string }).message ?? '');
+  return isProgressEventWithMessage(e) && /^(creating|editing|deleting) file:/i.test(e.message);
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 27 - 35, The
two helpers isReadEvent and isEditEvent use unsafe type assertions for message;
define a proper type guard (e.g., isProgressEvent(e): e is ProgressEvent or a
narrowed type that includes message?: string) that checks e.event_type ===
'progress' and that typeof e.message === 'string' (or message existence), then
replace the inline assertions in isReadEvent and isEditEvent to call that type
guard so you can safely access e.message without bypassing TypeScript's checks.

206-213: Consider flex layout to avoid hardcoded height.

The hardcoded 37px in h-[calc(100%-37px)] is fragile and will break if the header's padding or font size changes. A flex-based approach would be more resilient.

♻️ Alternative using flex layout

-    <div className="relative flex-1 overflow-hidden rounded-lg border">
+    <div className="relative flex flex-1 flex-col overflow-hidden rounded-lg border">
       {/* Header: stream label + view toggle */}
-      <div className="flex items-center justify-between border-b px-4 py-2">
+      <div className="flex shrink-0 items-center justify-between border-b px-4 py-2">
         ...
       </div>

       <div
         ref={containerRef}
-        className="h-[calc(100%-37px)] overflow-y-auto p-4"
+        className="min-h-0 flex-1 overflow-y-auto p-4"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 206 - 213, The
container currently uses a hardcoded height via className "h-[calc(100%-37px)]"
which is brittle; change the layout to flex by making the parent wrapper a flex
column and replace the container's fixed height with a flex-grow class (e.g.,
flex-1 or flex-auto) so the scrollable area grows to fill remaining space.
Locate the EventStream component and update the element that uses ref
containerRef and onScroll={handleScroll} to remove the calc height and use the
flex-based overflow class (overflow-auto/overflow-y-auto) so the header and
content sizing is resilient to padding/font changes. Ensure the sibling header
element remains above the scroll container in the same flex column so the stack
order and accessibility attributes (role="log", aria-live, aria-label) are
preserved.

17-20: Consider moving EventGroup type to centralized types file.

Per coding guidelines, TypeScript types should be defined in web-ui/src/types/index.ts. While this type is currently internal to this component, centralizing it would align with project conventions and facilitate reuse if other components need to work with grouped events.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@web-ui/src/components/execution/EventStream.tsx` around lines 17 - 20, Move
the EventGroup type definition out of EventStream.tsx into the project's central
types index and export it so other modules can reuse it; update EventStream.tsx
to import the exported EventGroup (and ensure ExecutionEvent is imported or
re-exported if needed), keeping the same union shape ({ type: 'event' |
'read_group' | 'edit_group' } etc.) and run a quick type-check to fix any
import/exports that need adjusting.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web-ui/src/components/execution/EventStream.tsx`:
- Line 154: The call to groupEvents(displayEvents) in EventStream.tsx is
recomputing on every render; wrap it with React's useMemo and add useMemo to the
imports so the grouping only recalculates when displayEvents changes. Locate the
constant groups = groupEvents(displayEvents) and replace it with a memoized
value using useMemo(() => groupEvents(displayEvents), [displayEvents]) and
ensure useMemo is imported at the top alongside other React hooks.
- Around line 119-133: The edit badge in EditGroupRow uses dark:bg-blue-900/40
which is inconsistent with the other badge styles (agentStateBadgeStyles uses
dark:bg-blue-900/30); extract the badge class string into a shared constant
(e.g., export const editGroupBadgeStyles) in the eventStyles module and replace
the inline class on the "edit" span with that constant
(className={editGroupBadgeStyles}), making sure the constant uses
dark:bg-blue-900/30 to match agentStateBadgeStyles for consistency and
maintainability.
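
To illustrate what the first inline comment is asking for, the sketch below shows the effect of memoizing on input identity: the derived value is recomputed only when the input reference changes. This is a standalone illustration, not React's `useMemo` implementation, and `memoizeLast` is a hypothetical helper name; in the component itself the fix would be the `useMemo(() => groupEvents(displayEvents), [displayEvents])` call quoted above.

```typescript
// Illustration only: cache the last result and reuse it while the argument
// is the same reference. useMemo with a one-element dependency array has a
// comparable effect inside a React component.
function memoizeLast<A, R>(fn: (arg: A) => R): (arg: A) => R {
  let hasValue = false;
  let lastArg: A | undefined;
  let lastResult: R | undefined;
  return (arg: A): R => {
    if (!hasValue || arg !== lastArg) {
      lastResult = fn(arg); // input changed (or first call): recompute
      lastArg = arg;
      hasValue = true;
    }
    return lastResult as R;
  };
}
```

Note that this only helps if `displayEvents` keeps a stable reference between renders; an array rebuilt on every render would defeat the cache.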



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 86312a49-7763-4091-b0a7-5dad2f4422d3

📥 Commits

Reviewing files that changed from the base of the PR and between 814f550 and a81ab87.

📒 Files selected for processing (1)

web-ui/src/components/execution/EventStream.tsx


claude · 2026-03-24T18:37:09Z

test

claude · 2026-03-24T18:38:13Z

PLACEHOLDER

claude · 2026-03-24T18:38:57Z

Part 1 of code review for PR #497 (EventStream.tsx smart-view grouping): CRITICAL ISSUES

(1) isReadEvent/isEditEvent cast through structural types with optional message instead of narrowing to the existing ProgressEvent type. These should be proper type guards returning 'e is ProgressEvent', eliminating repeated casts inside groupEvents.
(2) groupEvents runs on every render regardless of whether showAll is true, wasting work when groups is never consumed. Wrap in useMemo with displayEvents as the dependency.
(3) h-[calc(100%-37px)] hardcodes the header height; use flex-1 instead.

claude · 2026-03-24T18:39:05Z

Part 2 of code review: IMPORTANT ISSUES

(4) onBlockerAnswered is not threaded into ReadGroupRow children. EventItem inside ReadGroupRow is rendered without this prop, so if a blocker event lands inside a read group the answer callback is silently dropped.
(5) Index-based keys like rg-i reset expanded state during live streaming as React unmounts/remounts ReadGroupRow components when the group list shifts. Use the first event timestamp as key.
(6) flushRead always emits a read_group even for a single event, unlike flushEdit which degrades to a plain EventItem. Make these consistent.
(7) EditGroupRow renders the full file list without truncation; ReadGroupRow clips at 4. Apply the same limit.
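
Issues (5) and (7) can be sketched as two small helpers. Both are hypothetical: `groupKey` and `summarizeFiles` are illustrative names, the `"+N more"` suffix is an assumed display format, and the limit of 4 is taken from the comment about ReadGroupRow. The key helper assumes the first event's timestamp is unique enough within one stream; if two groups can start on the same timestamp, a further disambiguator would be needed.

```typescript
type TimedEvent = { timestamp: string };

// Build a React key from the group's first event instead of its list index,
// so the key stays the same when the list grows or shifts during streaming.
function groupKey(prefix: string, events: TimedEvent[]): string {
  const first = events[0];
  return first ? `${prefix}-${first.timestamp}` : `${prefix}-empty`;
}

// Show at most `limit` file names and summarize the remainder.
function summarizeFiles(names: string[], limit = 4): string {
  const shown = names.slice(0, limit).join(', ');
  const hidden = names.length - limit;
  return hidden > 0 ? `${shown}, +${hidden} more` : shown;
}
```

Sharing one `summarizeFiles` between the read and edit rows would keep the two truncation behaviors from drifting apart again.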

claude · 2026-03-24T18:39:13Z

Part 3 of code review: MINOR + CONTEXT

Minor issues:
- extractFilename fallback lets full raw message strings appear in file lists (return trimmed placeholder instead)
- formatTime is duplicated from EventItem.tsx inline in the new row components
- groupEvents has no automated tests despite being a pure function with non-trivial buffer logic
- the smart/raw toggle is missing aria-pressed so screen readers cannot communicate its state

Context: per CLAUDE.md web-ui/ is legacy/reference-only and will be replaced in Phase 3. This PR does not violate core architectural rules.

Blocking items before merge: issue 2 (memoize groupEvents), issue 4 (onBlockerAnswered in ReadGroupRow), and issue 5 (stable keys). The rest are quality improvements.
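
As a rough illustration of the extractFilename point, the sketch below returns a short placeholder when the message does not match the expected prefix, instead of echoing the raw message. The accepted prefixes mirror the regexes quoted earlier in this review; the placeholder text and the choice to return only the last path segment are assumptions, not what the PR currently does.

```typescript
// Hypothetical variant of extractFilename with a safe fallback.
function extractFilename(message: string): string {
  const match = /^(?:reading|creating|editing|deleting) file:\s*(.+)$/i.exec(
    message.trim(),
  );
  if (!match) return '(unknown file)'; // assumed placeholder; never the raw message
  const path = match[1].trim();
  const parts = path.split(/[\\/]/); // handle both POSIX and Windows separators
  return parts[parts.length - 1] || path;
}
```

A function like this is also an easy first target for the missing unit tests, since it has no React dependencies.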

claude · 2026-03-24T18:39:20Z

CodeFRAME Development Guidelines

Last updated: 2026-03-09

Product Vision

CodeFrame is a project delivery system: Think → Build → Prove → Ship.

It owns the edges of the AI coding pipeline — everything BEFORE code gets written (PRD, specification, task decomposition) and everything AFTER (verification gates, quality memory, deployment). The actual code writing is delegated to frontier coding agents (Claude Code, Codex, OpenCode) that are better at it than any custom agent.

CodeFrame does not compete with coding agents. It orchestrates them.

THINK:  cf prd generate → cf prd stress-test → cf tasks generate
BUILD:  cf work start --engine claude-code  (or codex, opencode, built-in)
PROVE:  cf proof run  (9-gate evidence-based quality system)
SHIP:   cf pr create → cf pr merge
LOOP:   Glitch → cf proof capture → New REQ → Enforced forever

Status: Phase 1 ✅ | Phase 2 ✅ | Phase 2.5 ✅ — CLI workflow, server layer, and ReAct agent complete. Agent adapter architecture (#408) and PROOF9 quality system (#422) are next priorities. See docs/V2_STRATEGIC_ROADMAP.md for the full plan.

If you are an agent working in this repo: do not improvise architecture. Follow the documents listed below.

Primary Contract (MUST FOLLOW)

Golden Path: docs/GOLDEN_PATH.md
The only workflow we build until it works end-to-end.
Refactor Plan: docs/REFACTOR_PLAN_FOR_AGENT.md
Step-by-step refactor instructions.
Command Tree + Module Mapping: docs/CLI_WIREFRAME.md
The authoritative map from CLI commands → core modules/functions.
Agent Implementation: docs/AGENT_IMPLEMENTATION_TASKS.md
Tracks the agent system components (all complete).
Strategic Roadmap: docs/V2_STRATEGIC_ROADMAP.md
5-phase plan from CLI to multi-agent.

Rule 0: If a change does not directly support the Think → Build → Prove → Ship pipeline, do not implement it.

Strategic Priority (Phase 4)

The next major architectural work is the Agent Adapter Architecture (#408):

Define AgentAdapter protocol so any coding agent can be an execution engine
CodeFrame's built-in ReactAgent becomes the fallback, not the primary
Verification gates and self-correction wrap ALL engines uniformly
See issues #408 ([Phase 4] Agent Adapter Architecture: Delegate to Frontier Coding Agents) through #417 ([Phase 4] Kilocode Engine Adapter (VS Code Extension Protocol)) for the full breakdown

Current Reality (Phase 1, 2 & 2.5 Complete)

What's Working Now

Full agent execution: cf work start <task-id> --execute (uses ReAct engine by default)
Engine selection: --engine react (default) or --engine plan (legacy)
Verbose mode: cf work start <task-id> --execute --verbose shows detailed progress
Dry run mode: cf work start <task-id> --execute --dry-run
Self-correction loop: Agent automatically fixes failing verification gates (up to 5 attempts with ReAct)
FAILED task status: Tasks can transition to FAILED for proper error visibility
Tech stack configuration: cf init . --detect auto-detects tech stack from project files
Project preferences: Agent loads AGENTS.md or CLAUDE.md for per-project configuration
Stall detection: Thread-based monitor with configurable recovery (--stall-action blocker|retry|fail)
Blocker detection: Agent creates blockers when stuck
Verification gates: Ruff/pytest checks after file changes
State persistence: Pause/resume across sessions
Batch execution: cf work batch run with serial/parallel/auto strategies
Task dependencies: depends_on field with dependency graph analysis
LLM dependency inference: --strategy auto analyzes task descriptions
Automatic retry: --retry N for failed task recovery
Batch resume: Re-run failed/blocked tasks from previous batches
Task scheduling: cf schedule show/predict/bottlenecks with CPM-based scheduling
Task templates: cf templates list/show/apply with 7 builtin templates
Effort estimation: Tasks support estimated_hours field for scheduling
Environment validation: cf env check/install/doctor validates tools and dependencies
GitHub PR workflow: cf pr create/status/checks/merge for PR management
Task self-diagnosis: cf work diagnose <task-id> analyzes failed tasks
70+ integration tests: Comprehensive CLI test coverage
REST API: Full v2 API with 16 router modules (see Phase 2 below)
API authentication: API key auth with scopes (read/write/admin)
Rate limiting: Configurable per-endpoint rate limits
Real-time streaming: SSE for task execution events
OpenAPI documentation: Full Swagger/ReDoc at /docs and /redoc

v2 Architecture (current)

Core-first: Domain logic lives in codeframe/core/ (headless, no FastAPI imports)
CLI-first: Golden Path works without any running FastAPI server
Adapters: LLM providers in codeframe/adapters/llm/
Server/UI optional: FastAPI and UI are thin adapters over core

v1 Legacy

FastAPI server + WebSockets + React/Next.js dashboard retained for reference
Do not build toward v1 patterns during Golden Path work

Repository Structure

codeframe/
├── core/                    # Headless domain + orchestration (NO FastAPI imports)
│   ├── react_agent.py      # ReAct agent (default engine) - observe-think-act loop
│   ├── tools.py            # Tool definitions for ReAct agent (7 tools)
│   ├── editor.py           # Search-replace file editor with fuzzy matching
│   ├── agent.py            # Legacy plan-based agent (--engine plan)
│   ├── planner.py          # LLM-powered implementation planning (plan engine)
│   ├── executor.py         # Code execution engine with rollback (plan engine)
│   ├── context.py          # Task context loader with relevance scoring
│   ├── tasks.py            # Task management with depends_on field
│   ├── blockers.py         # Human-in-the-loop blocker system
│   ├── runtime.py          # Run lifecycle management
│   ├── conductor.py        # Batch orchestration with worker pool
│   ├── dependency_graph.py # DAG operations and execution planning
│   ├── dependency_analyzer.py # LLM-based dependency inference
│   ├── gates.py            # Verification gates (ruff, pytest, BUILD)
│   ├── fix_tracker.py      # Fix attempt tracking for loop prevention
│   ├── quick_fixes.py      # Pattern-based fixes without LLM
│   ├── agents_config.py    # AGENTS.md/CLAUDE.md preference loading
│   ├── workspace.py        # Workspace initialization
│   ├── prd.py              # PRD management
│   ├── events.py           # Event emission
│   ├── state_machine.py    # Task status transitions
│   ├── environment.py      # Environment validation and tool detection
│   ├── installer.py        # Automatic tool installation
│   ├── diagnostics.py      # Failed task analysis
│   ├── diagnostic_agent.py # AI-powered task diagnosis
│   ├── credentials.py      # API key and credential management
│   ├── stall_detector.py   # Synchronous stall detector + StallAction enum + StallDetectedError
│   ├── stall_monitor.py    # Thread-based stall watchdog with callback
│   ├── streaming.py        # Real-time output streaming for cf work follow
│   └── ...
├── adapters/
│   └── llm/                # LLM provider adapters
│       ├── base.py         # Protocol + ModelSelector + Purpose enum
│       ├── anthropic.py    # Anthropic Claude provider
│       └── mock.py         # Mock provider for testing
├── cli/
│   └── app.py              # Typer CLI entry + subcommands
├── ui/                     # FastAPI server (Phase 2 - thin adapter over core)
│   ├── server.py           # FastAPI app with OpenAPI configuration
│   ├── models.py           # Pydantic request/response models
│   ├── dependencies.py     # Shared dependencies (workspace, auth)
│   └── routers/            # API route handlers
│       ├── blockers_v2.py  # Blocker CRUD
│       ├── tasks_v2.py     # Task management + streaming
│       ├── prd_v2.py       # PRD management + versioning
│       ├── workspace_v2.py # Workspace init and status
│       ├── batches_v2.py   # Batch execution
│       ├── streaming_v2.py # SSE event streaming
│       ├── api_key_v2.py   # API key management
│       └── ...             # 16 router modules total
├── lib/                    # Shared utilities
│   ├── rate_limiter.py     # SlowAPI rate limiting
│   └── audit_logger.py     # Request audit logging
├── auth/                   # Authentication
│   ├── api_key_service.py  # API key creation/validation
│   └── dependencies.py     # Auth dependencies
├── config/
│   └── rate_limits.py      # Rate limit configuration
└── server/                 # Legacy server code (reference only)

web-ui/                     # Frontend (legacy, reference only)
tests/
├── core/                   # Core module tests
│   ├── test_agent.py
│   ├── test_executor.py
│   ├── test_planner.py
│   ├── test_context.py
│   ├── test_conductor.py
│   ├── test_dependency_graph.py
│   ├── test_dependency_analyzer.py
│   ├── test_task_dependencies.py
│   └── ...
└── adapters/
    └── test_llm.py

Architecture Rules (non-negotiable)

1) Core must be headless

codeframe/core/** must NOT import:

FastAPI
WebSocket frameworks
HTTP request/response objects
UI modules

Core is allowed to:

read/write durable state (SQLite/filesystem)
run orchestration/worker loops
emit events to an append-only event log
call adapters via interfaces (LLM, git, fs)

2) CLI must not require a server

Golden Path commands must work from the CLI with no server running.

FastAPI is optional and must be started explicitly (e.g., codeframe serve) and must wrap core.

3) Agent state transitions flow through runtime

Critical pattern discovered during implementation:

Agent (agent.py) manages its own AgentState (IDLE, PLANNING, EXECUTING, BLOCKED, COMPLETED, FAILED)
Runtime (runtime.py) handles all TaskStatus transitions (BACKLOG, READY, IN_PROGRESS, DONE, BLOCKED, FAILED)
Agent does NOT call tasks.update_status() - runtime does this based on agent state

This separation prevents duplicate state transitions (e.g., DONE→DONE, BLOCKED→BLOCKED errors).
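
The separation can be pictured as a single mapping owned by the runtime. The sketch below is written in TypeScript only to match the other examples in this thread; the real `runtime.py` is Python and its function names are not shown here, so `resolveTransition` is hypothetical. The state-to-status pairs follow the execution-flow diagrams in these guidelines (COMPLETED → DONE, BLOCKED → BLOCKED, FAILED → FAILED).

```typescript
type AgentState = 'IDLE' | 'PLANNING' | 'EXECUTING' | 'BLOCKED' | 'COMPLETED' | 'FAILED';
type TaskStatus = 'BACKLOG' | 'READY' | 'IN_PROGRESS' | 'DONE' | 'BLOCKED' | 'FAILED';

// Decide which task status (if any) the runtime should apply for a given
// agent result. Returns null when nothing should change, which is what
// avoids duplicate transitions such as DONE -> DONE.
function resolveTransition(current: TaskStatus, agent: AgentState): TaskStatus | null {
  let target: TaskStatus;
  switch (agent) {
    case 'COMPLETED':
      target = 'DONE';
      break;
    case 'BLOCKED':
      target = 'BLOCKED';
      break;
    case 'FAILED':
      target = 'FAILED';
      break;
    default:
      return null; // non-terminal agent states do not change task status
  }
  return target === current ? null : target;
}
```

Because the agent never calls the status update itself, there is exactly one place where a repeated transition can be filtered out.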

4) Legacy can be read, not depended on

Legacy code is reference material.

Copy/simplify logic into core when useful
Do NOT import legacy UI/server modules into core
Do NOT "fix the UI" during Golden Path work

5) Keep commits runnable

At all times:

codeframe --help works
Golden Path command stubs can run
Avoid breaking the repo with large renames/moves

Agent System Architecture

Components

Component	File	Purpose
ReactAgent	`core/react_agent.py`	Default engine: observe-think-act loop with tool use
Tools	`core/tools.py`	7 agent tools: read/edit/create file, run command/tests, search, list
Editor	`core/editor.py`	Search-replace editor with 4-level fuzzy matching
Stall Detector	`core/stall_detector.py`	Synchronous stall check + StallAction enum + StallDetectedError
Stall Monitor	`core/stall_monitor.py`	Thread-based watchdog with callback (integrated into ReactAgent)
LLM Adapter	`adapters/llm/base.py`	Protocol, ModelSelector, Purpose enum
Anthropic Provider	`adapters/llm/anthropic.py`	Claude integration with streaming
Mock Provider	`adapters/llm/mock.py`	Testing with call tracking
Context Loader	`core/context.py`	Codebase scanning, relevance scoring
Planner	`core/planner.py`	Task → ImplementationPlan via LLM (plan engine)
Executor	`core/executor.py`	File ops, shell commands, rollback (plan engine)
Agent (legacy)	`core/agent.py`	Plan-based orchestration (--engine plan)
Runtime	`core/runtime.py`	Run lifecycle, engine selection, agent invocation
Conductor	`core/conductor.py`	Batch orchestration, worker pool
Dependency Graph	`core/dependency_graph.py`	DAG operations, topological sort
Dependency Analyzer	`core/dependency_analyzer.py`	LLM-based dependency inference
Environment Validator	`core/environment.py`	Tool detection and validation
Installer	`core/installer.py`	Automatic tool installation
Diagnostics	`core/diagnostics.py`	Failed task analysis
Diagnostic Agent	`core/diagnostic_agent.py`	AI-powered task diagnosis
Credentials	`core/credentials.py`	API key and credential management
Event Publisher	`core/streaming.py`	Real-time SSE event distribution
API Key Service	`auth/api_key_service.py`	API key CRUD and validation
Rate Limiter	`lib/rate_limiter.py`	Per-endpoint rate limiting
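
The "topological sort" listed for the Dependency Graph component can be sketched with Kahn's algorithm over the `depends_on` field. This is a generic illustration in TypeScript, not the contents of `core/dependency_graph.py`; the `Task` shape and the `executionOrder` name are assumptions.

```typescript
// Hypothetical minimal task shape: an id plus the ids it depends on.
type Task = { id: string; depends_on: string[] };

// Kahn's algorithm: repeatedly take tasks whose dependencies are all done.
function executionOrder(tasks: Task[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const t of tasks) {
    indegree.set(t.id, 0);
    dependents.set(t.id, []);
  }
  for (const t of tasks) {
    for (const dep of t.depends_on) {
      const list = dependents.get(dep);
      if (list === undefined) throw new Error(`Unknown dependency: ${dep}`);
      list.push(t.id);
      indegree.set(t.id, (indegree.get(t.id) ?? 0) + 1);
    }
  }
  const ready = tasks.filter((t) => t.depends_on.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any task left unscheduled is part of (or blocked by) a cycle.
  if (order.length !== tasks.length) throw new Error('Dependency cycle detected');
  return order;
}
```

A serial batch could run tasks in this order directly; a parallel strategy could instead run everything in `ready` at the same time.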

Model Selection Strategy

Task-based heuristic via Purpose enum:

PLANNING → claude-sonnet-4-20250514 (complex reasoning)
EXECUTION → claude-sonnet-4-20250514 (balanced)
GENERATION → claude-haiku-4-20250514 (fast/cheap)

Future: cf tasks set provider <id> <provider> for per-task override.
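
As a sketch of the heuristic above, a purpose-to-model lookup with an optional override might look like the following. The model identifiers are copied verbatim from the list above; the `selectModel` function and its `override` parameter are hypothetical and only hint at the per-task override described as future work. The real ModelSelector lives in `adapters/llm/base.py` and is written in Python.

```typescript
type Purpose = 'PLANNING' | 'EXECUTION' | 'GENERATION';

// Model identifiers exactly as listed in the guidelines.
const MODEL_BY_PURPOSE: Record<Purpose, string> = {
  PLANNING: 'claude-sonnet-4-20250514',
  EXECUTION: 'claude-sonnet-4-20250514',
  GENERATION: 'claude-haiku-4-20250514',
};

// Pick a model for a purpose; an explicit override (assumed feature) wins.
function selectModel(purpose: Purpose, override?: string): string {
  return override ?? MODEL_BY_PURPOSE[purpose];
}
```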

Engine Selection

CodeFRAME supports two execution engines, selected via --engine:

Engine	Flag	Pattern	Best For
ReAct (default)	`--engine react`	Observe → Think → Act loop	Most tasks, adaptive execution
Plan (legacy)	`--engine plan`	Plan all steps → Execute sequentially	Well-defined, predictable tasks

Execution Flow (ReAct — default)

cf work start <id> --execute [--verbose]
    │
    ├── runtime.start_task_run()      # Creates run, transitions task→IN_PROGRESS
    │
    └── runtime.execute_agent(engine="react")
            │
            └── ReactAgent.run(task_id)
                ├── Load context (PRD, codebase, blockers, AGENTS.md, tech_stack)
                ├── Build layered system prompt
                │
                └── Tool-use loop (until complete/blocked/failed):
                    ├── Check stall detector (configurable: retry/blocker/fail)
                    ├── LLM decides next action (tool call)
                    ├── Execute tool: read_file, edit_file, create_file,
                    │   run_command, run_tests, search_codebase, list_files
                    ├── Observe result → feed back to LLM
                    ├── Record activity (resets stall timer)
                    ├── Incremental verification (ruff after file changes)
                    └── Token budget management (3-tier compaction)
                │
                └── Final verification with self-correction (up to 5 retries)
                │
                └── Update run/task status based on agent result
                    ├── COMPLETED → complete_run() → task→DONE
                    ├── BLOCKED → block_run() → task→BLOCKED
                    └── FAILED → fail_run() → task→FAILED
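
The "final verification with self-correction" step in the flow above can be pictured as a bounded retry loop. The sketch below is a synchronous TypeScript illustration with hypothetical names (`verifyWithSelfCorrection`, `runGates`, `attemptFix`); it assumes gates return a pass flag plus error strings and uses the retry limit of 5 stated for the ReAct engine. It does not model blockers, stall detection, or token budgets.

```typescript
type GateResult = { passed: boolean; errors: string[] };

// Run the gates; on failure, attempt a fix and re-run, up to maxRetries times.
function verifyWithSelfCorrection(
  runGates: () => GateResult,
  attemptFix: (errors: string[]) => void,
  maxRetries = 5,
): { state: 'COMPLETED' | 'FAILED'; retries: number } {
  let result = runGates();
  let retries = 0;
  while (!result.passed && retries < maxRetries) {
    attemptFix(result.errors); // e.g. feed gate errors back to the agent
    retries++;
    result = runGates();
  }
  return { state: result.passed ? 'COMPLETED' : 'FAILED', retries };
}
```

The returned state is what the runtime would then translate into a task status, in line with the COMPLETED/BLOCKED/FAILED branches shown in the diagram.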

Execution Flow (Plan — legacy, `--engine plan`)

cf work start <id> --execute --engine plan
    │
    ├── runtime.start_task_run()
    │
    └── runtime.execute_agent(engine="plan")
            │
            ├── agent.run(task_id)
            │   ├── Load context (PRD, codebase, blockers, AGENTS.md)
            │   ├── Create plan via LLM
            │   ├── Execute steps (file create/edit, shell commands)
            │   ├── Run incremental verification (ruff)
            │   ├── Detect blockers (consecutive failures, missing files)
            │   └── Run final verification with SELF-CORRECTION LOOP:
            │       ├── Run all gates (pytest, ruff)
            │       ├── If failed: _attempt_verification_fix()
            │       │   ├── Try ruff --fix for quick lint fixes
            │       │   ├── Use LLM to generate fix plan from errors
            │       │   └── Execute fix steps
            │       └── Retry up to max_attempts (default: 3)
            │
            └── Update run/task status based on agent result
                ├── COMPLETED → complete_run() → task→DONE
                ├── BLOCKED → block_run() → task→BLOCKED
                └── FAILED → fail_run() → task→FAILED

Commands (v2 CLI)

Python (preferred)

Use uv for Python tasks:

uv run pytest
uv run pytest tests/core/  # Core module tests only
uv run ruff check .

CLI (Golden Path)

# Workspace
cf init <repo>                                    # Initialize workspace
cf init <repo> --detect                           # Initialize + auto-detect tech stack
cf init <repo> --tech-stack "Python with uv"      # Initialize + explicit tech stack
cf init <repo> --tech-stack-interactive           # Initialize + interactive setup
cf status

# PRD
cf prd add <file.md>
cf prd show

# Tasks
cf tasks generate          # Uses LLM to generate from PRD
cf tasks list
cf tasks list --status READY
cf tasks show <id>

# Work execution (single task)
cf work start <task-id>                    # Creates run record
cf work start <task-id> --execute          # Runs AI agent (ReAct engine, default)
cf work start <task-id> --execute --engine plan  # Use legacy plan engine
cf work start <task-id> --execute --verbose  # With detailed output
cf work start <task-id> --execute --dry-run  # Preview changes
cf work start <task-id> --execute --stall-timeout 120  # Custom stall timeout (0=disabled)
cf work start <task-id> --execute --stall-action retry  # Recovery: blocker|retry|fail
cf work stop <task-id>                     # Cancel stale run
cf work resume <task-id>                   # Resume blocked work
cf work follow <task-id>                   # Stream real-time output
cf work follow <task-id> --tail 50         # Show last 50 lines then stream

# Batch execution (multiple tasks)
cf work batch run <id1> <id2> ...          # Execute multiple tasks (ReAct default)
cf work batch run --all-ready              # All READY tasks
cf work batch run --all-ready --engine plan  # Use legacy plan engine
cf work batch run --strategy serial        # Serial (default)
cf work batch run --strategy parallel      # Parallel execution
cf work batch run --strategy auto          # LLM-inferred dependencies
cf work batch run --max-parallel 4         # Concurrent limit
cf work batch run --retry 3               # Auto-retry failures
cf work batch status [batch_id]            # Show batch status
cf work batch cancel <batch_id>            # Cancel running batch
cf work batch resume <batch_id>            # Re-run failed tasks

# Blockers
cf blocker list
cf blocker show <id>
cf blocker answer <id> "answer"

# Quality
cf review
cf patch export
cf commit

# State
cf checkpoint create "name"
cf checkpoint list
cf checkpoint restore <id>
cf summary

# Environment validation
cf env check                     # Validate tools and dependencies
cf env install                   # Install missing tools
cf env doctor                    # Comprehensive environment health check

# GitHub PR workflow
cf pr create                     # Create PR from current branch
cf pr status                     # Show PR status
cf pr checks                     # Show CI check results
cf pr merge                      # Merge approved PR

# Diagnostics
cf work diagnose <task-id>       # AI-powered analysis of failed tasks

Note: codeframe serve exists but Golden Path does not depend on it.

Frontend (legacy)

cd web-ui && npm test
cd web-ui && npm run build

Do not expand frontend scope during Golden Path work.

Documentation Navigation

Authoritative (v2)

docs/GOLDEN_PATH.md - CLI-first workflow contract
docs/REFACTOR_PLAN_FOR_AGENT.md - Step-by-step refactor instructions
docs/CLI_WIREFRAME.md - Command → module mapping
docs/AGENT_IMPLEMENTATION_TASKS.md - Agent system components
docs/V2_STRATEGIC_ROADMAP.md - 5-phase plan from CLI to multi-agent

Agent Architecture (Phase 2.5)

docs/AGENT_V3_UNIFIED_PLAN.md - ReAct architecture design and rules
docs/REACT_AGENT_ARCHITECTURE.md - Deep-dive: tools, editor, token management
docs/REACT_AGENT_ANALYSIS.md - Golden path test run analysis

API Documentation (Phase 2)

/docs - Swagger UI (interactive API explorer)
/redoc - ReDoc (readable API documentation)
/openapi.json - OpenAPI 3.1 specification
docs/PHASE_2_DEVELOPER_GUIDE.md - Server layer implementation guide
docs/PHASE_2_CLI_API_MAPPING.md - CLI to API endpoint mapping

Legacy (v1 reference only)

These describe old server/UI-driven architecture:

SPRINTS.md, sprints/
specs/
CODEFRAME_SPEC.md
v1 feature docs (context/session/auth/UI state management)

What NOT to do (common agent failure modes)

Don't add new HTTP endpoints to support the CLI
Don't require codeframe serve for CLI workflows
Don't implement UI concepts (tabs, panels, progress bars) inside core
Don't redesign auth, websockets, or UI state management
Don't add multi-provider or model-switching features before Golden Path works
Don't "clean up the repo" as a goal - only refactor to enable Golden Path
Don't update task status from agent.py - let runtime handle transitions

Testing / Demoing CodeFRAME on Sample Projects

When running uv run cf commands against a sample project (e.g., cf-test/) to test or demo CodeFRAME's capabilities, you are observing the CodeFRAME agent's work, not doing the work yourself.

Rules for testing/demo mode:

You are evaluating how well the CodeFRAME agent (ReAct or Plan engine) builds the project
Do NOT help out, fix errors, or write code on behalf of the CodeFRAME agent
Do NOT intervene when the agent makes mistakes — that's data
Your job is to report the process: what worked, what failed, how close the agent got
Document the agent's output, errors encountered, and final state
Assess completion against the PRD/acceptance criteria objectively
If the agent gets stuck or fails, report that as a finding — don't rescue it

This applies when using commands like cf work start <id> --execute, cf work batch run, or any command that triggers the AI agent to do implementation work on a target project.

Practical Working Mode for Agents

When implementing anything, do this loop:

Read docs/GOLDEN_PATH.md and confirm the change is required
Find the command in docs/CLI_WIREFRAME.md
Implement core functionality in codeframe/core/
Call it from Typer command in codeframe/cli/
Emit events + persist state
Keep it runnable. Commit.

If you are unsure which direction to take, default to:

simpler state
fewer dependencies
smaller surface area
core-first, CLI-first

Recent Updates (2026-03-09)

Stall Detection System (#399, #400, #401)

Complete stall detection and configurable recovery for agent execution:

Components:

StallMonitor (core/stall_monitor.py) — Thread-based watchdog polling every 5s
StallDetector (core/stall_detector.py) — Synchronous time-tracking primitive
StallAction enum — Recovery strategy: RETRY, BLOCKER, FAIL
StallDetectedError — Exception for RETRY path (propagates to runtime for retry)

CLI flags:

--stall-timeout N — Seconds without tool activity before stall (default: 300, 0=disabled)
--stall-action {blocker,retry,fail} — Recovery action (default: blocker)
Both flags available on cf work start and cf work batch run

Recovery flow:

BLOCKER (default): Creates informative blocker, task → BLOCKED
RETRY: Raises StallDetectedError, runtime retries once with fresh agent
FAIL: Task transitions directly to FAILED

Config: agent_budget.stall_timeout_s in .codeframe/config.yaml (0 = disabled)
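
The time-tracking idea behind StallDetector can be sketched as follows. This is an illustrative TypeScript sketch only: the project's implementation is Python (core/stall_detector.py) and is not shown here, so the class shape, the method names `recordActivity` and `isStalled`, the injectable clock, and the `>=` comparison are all assumptions. Only the 0 = disabled rule and the three recovery outcomes come from the text above.

```typescript
// Illustrative sketch; names and structure are assumptions, not the project's API.
type StallAction = "retry" | "blocker" | "fail";

class StallDetector {
  private readonly timeoutS: number;
  private readonly now: () => number;
  private lastActivityMs: number;

  // timeoutS = 0 disables detection, matching the documented flag semantics.
  constructor(timeoutS: number, now: () => number = () => Date.now()) {
    this.timeoutS = timeoutS;
    this.now = now;
    this.lastActivityMs = now();
  }

  // Call after each tool invocation to reset the stall timer.
  recordActivity(): void {
    this.lastActivityMs = this.now();
  }

  // True once timeoutS seconds have passed without recorded activity.
  isStalled(): boolean {
    if (this.timeoutS === 0) return false;
    return this.now() - this.lastActivityMs >= this.timeoutS * 1000;
  }
}

// Maps a recovery action to the outcome described in "Recovery flow".
function describeRecovery(action: StallAction): string {
  switch (action) {
    case "blocker":
      return "create blocker, task -> BLOCKED";
    case "retry":
      return "raise StallDetectedError, retry once with fresh agent";
    case "fail":
      return "task -> FAILED";
  }
}
```

Injecting the clock keeps the sketch testable without real waiting; a watchdog such as StallMonitor would poll `isStalled()` on its own schedule.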

Phase 2.5 Complete: ReAct Agent Architecture (#355)

Default execution engine switched from plan-based to ReAct (Reasoning + Acting).

What changed:

Default engine is now "react" — all cf work start --execute and cf work batch run commands use ReactAgent
Legacy plan engine available via --engine plan flag
ReactAgent uses iterative tool-use loop (observe → think → act) instead of plan-all-then-execute
7 structured tools: read_file, edit_file, create_file, run_command, run_tests, search_codebase, list_files
Search-replace editing with 4-level fuzzy matching (exact → whitespace-normalized → indentation-agnostic → fuzzy)
Token budget management with 3-tier compaction
Adaptive iteration budget based on task complexity

Phase 2.5 deliverables:

✅ ReAct agent implementation (core/react_agent.py, core/tools.py, core/editor.py)
✅ CLI --engine flag ([Phase 2.5-F] End-to-end CLI validation with cf-test project #353)
✅ API engine parameter ([Phase 2.5-F] Verify ReAct engine works via API routes #354)
✅ Default switch to react + documentation ([Phase 2.5-F] Switch default engine to react and update documentation #355)

Phase	Focus	Pipeline Stage	Status
1	CLI Completion	Think + Build	✅ Complete
2	Server Layer	Build (API)	✅ Complete
2.5	ReAct Agent	Build (execution)	✅ Complete
3	Web UI Rebuild	All (dashboard)	In Progress
4	Agent Adapters + Orchestration	Build (delegate to frontier agents)	Next
5	PROOF9 + Advanced	Prove + Ship (quality memory)	Planned

Phase 2 Complete: Server Layer (2026-02-03)

Phase 2 deliverables completed:

✅ Server audit and refactor ([Phase 2] Server audit and refactor - routes delegating to core modules #322) - 16 v2 routers following thin adapter pattern
✅ API key authentication (feat(auth): add API key authentication for CLI and REST API #326) - Scopes: read/write/admin
✅ Rate limiting (feat(security): add API rate limiting with slowapi #327) - Configurable per-endpoint with Redis support
✅ Real-time SSE streaming (feat(streaming): add real-time SSE events for task execution #328) - /api/v2/tasks/{id}/stream
✅ OpenAPI documentation ([Phase 2] Complete OpenAPI documentation for all endpoints #119) - Full Swagger/ReDoc with examples

Server Architecture (Phase 2)

Pattern: Thin adapter over core - server routes delegate to core.* modules.

CLI (typer) ──────┬── core.* ─── adapters.*
                  │
Server (fastapi) ─┘

V2 Router Modules (16 total):

Router	Endpoints	Purpose
`blockers_v2`	5	Blocker CRUD
`prd_v2`	8	PRD management + versioning
`tasks_v2`	12	Task management + streaming
`workspace_v2`	5	Init, status, tech stack
`batches_v2`	5	Batch execution strategies
`streaming_v2`	2	SSE event streaming
`api_key_v2`	4	API key management
`discovery_v2`	5	PRD discovery sessions
`checkpoints_v2`	6	State checkpoints
`schedule_v2`	3	Task scheduling
`templates_v2`	4	PRD templates
`git_v2`	3	Git operations
`review_v2`	2	Code review
`pr_v2`	5	GitHub PR workflow
`environment_v2`	4	Tool detection
`proof_v2`	7	PROOF9 quality gates + requirements

API Authentication:

# Create API key
cf auth api-key-create --name "my-key" --scopes read,write

# Use in requests
curl -H "X-API-Key: cf_..." https://api.example.com/api/v2/tasks

Rate Limiting:

Default: 100 requests/minute (standard endpoints)
Auth endpoints: 10/minute
AI endpoints: 20/minute
Configurable via RATE_LIMIT_* environment variables

OpenAPI Documentation:

Swagger UI: /docs
ReDoc: /redoc
OpenAPI JSON: /openapi.json

Previous Updates (2026-01-29)

V2 Strategic Roadmap Established

Created comprehensive 5-phase roadmap in docs/V2_STRATEGIC_ROADMAP.md.

Phase 1 Complete: CLI Foundation

All Phase 1 priorities completed:

✅ cf prd generate - Socratic PRD discovery ([Phase 1] cf prd generate - Interactive AI PRD creation (Socratic Discovery) #307)
✅ cf work follow - Live execution streaming ([Phase 1] cf work follow - Live execution streaming #308)
✅ Integration tests for credential/env modules ([Phase 1] Integration tests for credential and environment modules #309)
✅ PRD template system ([Phase 1] PRD template system for customizable output formats #316)

Environment Validation (`cf env`)

New commands for validating development environment:

cf env check              # Validate required tools (git, uv, ruff, pytest)
cf env install            # Install missing tools automatically
cf env doctor             # Comprehensive environment health check

Modules:

core/environment.py - Tool detection and validation
core/installer.py - Cross-platform tool installation

GitHub PR Workflow (`cf pr`)

Streamlined PR management without leaving the CLI:

cf pr create              # Create PR from current branch
cf pr status              # Show PR status and review state
cf pr checks              # Show CI check results
cf pr merge               # Merge approved PR

Task Self-Diagnosis (`cf work diagnose`)

AI-powered analysis of failed tasks:

cf work diagnose <task-id>   # Analyze why a task failed

Modules:

core/diagnostics.py - Failed task analysis
core/diagnostic_agent.py - AI-powered diagnosis

Bug Fixes

#265: Fixed NoneType error in codebase_index.search_pattern() - added null check
#253: Fixed checkpoint diff API returning 500 - added workspace existence validation

GitHub Issue Organization

Created v1-legacy label for 22 v1-specific issues (closed, retained as Phase 3 reference)
Created phase labels: phase-1, phase-2, phase-4, phase-5
Created 9 new issues (#307-#315) for roadmap features
Consistent naming: [Phase #] Title format

Previous Updates (2026-01-16)

Phase 3.1: Tech Stack Configuration

Simplified tech stack configuration using natural language descriptions:

✅ tech_stack field on Workspace model - stores natural language description
✅ --detect flag - auto-detects from pyproject.toml, package.json, Cargo.toml, go.mod
✅ --tech-stack flag - explicit tech stack description (e.g., "Rust project with cargo")
✅ --tech-stack-interactive flag - simple prompt for user input (stub for future multi-round)
✅ Agent integration - TaskContext and Planner include tech_stack in LLM prompts
✅ Removed cf config subcommand - tech stack is now part of workspace init

Design philosophy: Instead of structured configuration with specific package managers and frameworks, users describe their stack in natural language. The agent interprets and adapts.

Examples:

cf init . --detect                           # Auto-detect: "Python with uv, pytest, ruff for linting"
cf init . --tech-stack "Rust project using cargo"
cf init . --tech-stack "TypeScript monorepo with pnpm, Next.js, jest"
cf init . --tech-stack-interactive           # Prompts user for description

Future work: Multi-round interactive discovery (bead: codeframe-8d80)

Agent Self-Correction & Observability

Improved agent reliability with automatic error recovery:

✅ Self-correction loop in _run_final_verification() - agent retries up to 3 times
✅ Verbose mode (--verbose / -v) - shows detailed verification/self-correction progress
✅ FAILED task status - tasks transition to FAILED for proper error visibility
✅ Project preferences - agent loads AGENTS.md/CLAUDE.md for per-project config
✅ Fixed fail_run() - now properly transitions task status (was leaving tasks stuck)

Enhanced Self-Correction (Phase 3.4)

Advanced error recovery with loop prevention and smart escalation:

✅ Fix Attempt Tracker (core/fix_tracker.py) - prevents repeating failed fixes
- Normalizes errors for comparison (removes line numbers, memory addresses)
- Tracks (error_signature, fix_description) pairs with outcomes
- Detects escalation patterns (same error 3+ times, same file 3+ times)
✅ Pattern-Based Quick Fixes (core/quick_fixes.py) - fixes common errors without LLM
- ModuleNotFoundError → auto-install package (detects package manager)
- ImportError → add missing import statement
- NameError → add common imports (Optional, dataclass, Path, etc.)
- SyntaxError → fix missing colons, f-string prefixes
- IndentationError → normalize mixed tabs/spaces
✅ Escalation to Blocker - creates informative blockers when stuck
- Triggered after MAX_SAME_ERROR_ATTEMPTS (3) failures on same error
- Triggered after MAX_SAME_FILE_ATTEMPTS (3) failures on same file
- Triggered after MAX_TOTAL_FAILURES (5) in a run
- Blocker includes error type, attempted fixes, and guidance questions
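
The tracker behaviour listed above (normalize errors, remember attempted fixes, escalate after repeated failures) can be sketched as below. This is a hypothetical TypeScript illustration, not the contents of core/fix_tracker.py, which is Python and not shown here: the regular expressions, the class shape, and the method names are assumptions. Only the threshold value 3 for MAX_SAME_ERROR_ATTEMPTS is taken from the list.

```typescript
// Hypothetical normalizer: strip volatile details so repeated errors compare equal.
function normalizeError(message: string): string {
  return message
    .replace(/0x[0-9a-fA-F]+/g, "0xADDR") // memory addresses
    .replace(/\bline \d+/gi, "line N") // "line 42"
    .replace(/:\d+(?::\d+)?/g, ":N") // "file.py:42" or "file.py:42:7"
    .trim();
}

const MAX_SAME_ERROR_ATTEMPTS = 3; // value taken from the list above

interface FixAttempt {
  signature: string;
  fix: string;
  succeeded: boolean;
}

// Hypothetical tracker of (error_signature, fix_description) pairs with outcomes.
class FixTracker {
  private readonly attempts: FixAttempt[] = [];

  record(error: string, fix: string, succeeded: boolean): void {
    this.attempts.push({ signature: normalizeError(error), fix, succeeded });
  }

  // True if this exact fix was already tried against the same normalized error.
  wasAttempted(error: string, fix: string): boolean {
    const signature = normalizeError(error);
    return this.attempts.some((a) => a.signature === signature && a.fix === fix);
  }

  // True once the same normalized error has failed the threshold number of times.
  shouldEscalate(error: string): boolean {
    const signature = normalizeError(error);
    const failures = this.attempts.filter(
      (a) => a.signature === signature && !a.succeeded,
    );
    return failures.length >= MAX_SAME_ERROR_ATTEMPTS;
  }
}
```

Normalizing before comparison is what lets the same failure at a shifted line number count as a repeat rather than a new error.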

Self-Correction Flow

Error occurs
    │
    ├── Try ruff --fix (auto-lint)
    │
    ├── Try pattern-based quick fix (no LLM)
    │   ├── Check if fix already attempted → skip
    │   ├── Apply fix
    │   └── Record outcome in tracker
    │
    ├── Check escalation threshold
    │   └── If exceeded → create escalation blocker
    │
    └── Use LLM to generate fix plan
        ├── Include already-tried fixes to avoid repetition
        ├── Execute fix steps with tracking
        └── Re-verify

Key Self-Correction Methods

_run_final_verification(): While loop that re-runs gates after self-correction
_attempt_verification_fix(): Orchestrates quick fixes, escalation check, LLM fixes
_create_escalation_blocker(): Creates detailed blocker with context
_verbose_print(): Conditional stdout output for observability

Phase 2 Complete (2026-01-15): Parallel Batch Execution

All 6 Phase 2 items from CLI_WIREFRAME.md are done:

✅ work batch resume <batch-id> - re-run failed/blocked tasks
✅ depends_on field on Task model
✅ Dependency graph analysis (DAG, cycle detection, topological sort)
✅ True parallel execution with ThreadPoolExecutor worker pool
✅ --strategy auto with LLM-based dependency inference
✅ --retry N automatic retry of failed tasks

Key Phase 2 Modules

conductor.py: Batch orchestration with serial/parallel/auto strategies
dependency_graph.py: DAG operations, level-based grouping for parallelization
dependency_analyzer.py: LLM analyzes task descriptions to infer dependencies
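
Level-based grouping with cycle detection can be sketched as below. This is a hypothetical TypeScript illustration of the general technique (a Kahn-style topological sort that emits one level per pass), not the code in dependency_graph.py, which is Python and not shown here. The function name `parallelLevels` and the `Map` input shape are assumptions.

```typescript
// Hypothetical sketch: group task ids into levels whose members can run in parallel.
// `dependsOn` maps a task id to the ids it depends on.
function parallelLevels(dependsOn: Map<string, string[]>): string[][] {
  const pending = new Map<string, Set<string>>();
  dependsOn.forEach((deps, task) => {
    pending.set(task, new Set(deps));
  });
  // Tasks that appear only as a dependency have no prerequisites of their own.
  dependsOn.forEach((deps) => {
    deps.forEach((dep) => {
      if (!pending.has(dep)) pending.set(dep, new Set<string>());
    });
  });

  const levels: string[][] = [];
  while (pending.size > 0) {
    // A task is ready when all of its dependencies were placed in earlier levels.
    const ready = Array.from(pending.keys())
      .filter((task) => pending.get(task)?.size === 0)
      .sort();
    if (ready.length === 0) {
      // Nothing is ready but tasks remain, so the remainder contains a cycle.
      const stuck = Array.from(pending.keys()).sort().join(", ");
      throw new Error(`Dependency cycle among: ${stuck}`);
    }
    ready.forEach((task) => pending.delete(task));
    pending.forEach((deps) => ready.forEach((task) => deps.delete(task)));
    levels.push(ready);
  }
  return levels;
}
```

Each inner array is one batch: every task in it depends only on tasks from earlier batches, so a worker pool could run a batch concurrently and move on when it finishes.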

Agent Implementation Complete (2026-01-14)

All 8 implementation tasks from AGENT_IMPLEMENTATION_TASKS.md are done:

✅ LLM Adapter Interface (adapters/llm/)
✅ Task Context Loader (core/context.py)
✅ Agent Planning (core/planner.py)
✅ Code Execution Engine (core/executor.py)
✅ Automatic Blocker Detection (in core/agent.py)
✅ Gate Integration (in core/agent.py)
✅ Agent Orchestrator (core/agent.py)
✅ Wire into Runtime (core/runtime.py)

Bug Fixes During Testing

GateResult attribute access: Fixed gate_result.status → gate_result.passed
Duplicate task transitions: Removed task status updates from agent.py (runtime handles all)
READY→READY error: Added check in stop_run before transitioning
Verification step handling: Made _execute_verification smarter about file vs command targets

Key Design Decisions

State separation: Agent manages AgentState, Runtime manages TaskStatus
Model selection: Task-based heuristic via Purpose enum
Blocker creation: Agent creates blockers, Runtime updates task status
Verification: Incremental (ruff after each file change) + final (all gates)

Testing

Run all tests

uv run pytest

Run v2 tests only

uv run pytest -m v2           # All v2 tests (~411 tests)
uv run pytest -m v2 -q        # Quiet mode

The v2 marker identifies tests for CLI-first, headless functionality:

All tests in tests/core/ are automatically marked v2 (via conftest.py)
v2 CLI tests have pytestmark = pytest.mark.v2 at the top

Convention: When adding new v2 functionality, mark tests with @pytest.mark.v2 or add pytestmark = pytest.mark.v2 at module level for CLI tests that use codeframe.cli.app.

Run core module tests

uv run pytest tests/core/
uv run pytest tests/core/test_agent.py -v
uv run pytest tests/adapters/test_llm.py -v

Test coverage

uv run pytest --cov=codeframe --cov-report=html

Environment Variables

# Required for agent execution
ANTHROPIC_API_KEY=sk-ant-...

# Optional - Database
DATABASE_PATH=./codeframe.db

# Optional - Rate Limiting (Phase 2)
RATE_LIMIT_ENABLED=true                    # Enable/disable rate limiting
RATE_LIMIT_DEFAULT=100/minute              # Default limit
RATE_LIMIT_AUTH=10/minute                  # Auth endpoints
RATE_LIMIT_AI=20/minute                    # AI/LLM endpoints
RATE_LIMIT_WEBSOCKET=50/minute             # WebSocket connections
REDIS_URL=redis://localhost:6379           # Redis for distributed rate limiting (optional)

# Optional - API Server
CODEFRAME_API_KEY_SECRET=<random-secret>   # Secret for API key hashing

Legacy sections removed on purpose

This file previously contained extensive v1 details (auth, websocket, UI template, sprint history).
Those are still in git history and legacy docs, but they are not the current contract.

The current contract is Golden Path + Refactor Plan + Command Tree mapping + Agent Implementation.

claude · 2026-03-24T18:39:44Z

Consolidated Code Review for PR 497 (smart-view event grouping in EventStream.tsx)

CRITICAL (3 issues):

1. Type guards: isReadEvent and isEditEvent cast through structural types with optional message rather than narrowing to ProgressEvent (which already has non-optional message in useTaskStream.ts). Fix: return e is ProgressEvent type predicates to eliminate downstream casts in groupEvents.
2. Unguarded computation: groupEvents runs unconditionally on every render even when showAll is true and groups is never consumed. Fix: wrap in useMemo with displayEvents as the dependency.
3. Fragile layout: h-[calc(100%-37px)] hardcodes the header pixel height. Fix: use flex-1 on the scroll container.

IMPORTANT (4 issues):
4. Dropped callback: EventItem inside ReadGroupRow omits onBlockerAnswered. A blocker event inside a read group silently loses the answer handler. Thread the prop through.
5. Unstable keys: rg-i keys use array index so React unmounts ReadGroupRow on every streaming update. Fix: use the first event timestamp as the key.
6. Inconsistent flush: flushEdit degrades single-item buffers to plain EventItem; flushRead always emits read_group. A lone read renders as a collapsed group. Make them consistent.
7. Unbounded file list in edit groups: ReadGroupRow truncates at 4 files, EditGroupRow does not. Apply the same limit.

MINOR (4 issues):
8. extractFilename fallback exposes raw message strings in file lists.
9. formatTime is duplicated from EventItem.tsx.
10. groupEvents has no unit tests despite being a testable pure function.
11. Smart/raw toggle is missing the aria-pressed attribute.

CONTEXT: web-ui/ is legacy/reference-only per CLAUDE.md and will be replaced in Phase 3. No core architecture violations in this PR.

MERGE BLOCKERS: issues 2 (memoize groupEvents), 4 (onBlockerAnswered), and 5 (stable keys).
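
Issues 1 and 6 can be illustrated together. The following is a minimal sketch, not the PR's code: the `ExecutionEvent` union, its `kind` and `action` fields, and the `EventGroup` shape are assumptions made for illustration, and the real types in useTaskStream.ts may differ. It shows type predicates that narrow to `ProgressEvent` without casts, and a single flush path so a lone read and a lone edit are treated the same way.

```typescript
// Hypothetical event shapes; the real union in useTaskStream.ts may differ.
interface ProgressEvent {
  kind: "progress";
  action: "read" | "edit" | "create" | "delete";
  message: string;
  timestamp: string;
}
interface OtherEvent {
  kind: "planning" | "verification" | "blocker" | "completion";
  timestamp: string;
}
type ExecutionEvent = ProgressEvent | OtherEvent;
type GroupType = "read_group" | "edit_group";
type EventGroup =
  | { type: "single"; event: ExecutionEvent }
  | { type: GroupType; events: ProgressEvent[]; timestamp: string };

// Type predicates narrow to ProgressEvent, so callers need no casts.
function isReadEvent(e: ExecutionEvent): e is ProgressEvent {
  return e.kind === "progress" && e.action === "read";
}
function isEditEvent(e: ExecutionEvent): e is ProgressEvent {
  return e.kind === "progress" && e.action !== "read";
}

function groupEvents(events: ExecutionEvent[]): EventGroup[] {
  const groups: EventGroup[] = [];
  let buffer: ProgressEvent[] = [];
  let bufferType: GroupType | null = null;

  // One flush for both kinds: a lone buffered event passes through ungrouped.
  const flush = (): void => {
    const first = buffer[0];
    if (bufferType !== null && first !== undefined) {
      groups.push(
        buffer.length === 1
          ? { type: "single", event: first }
          : { type: bufferType, events: buffer, timestamp: first.timestamp },
      );
    }
    buffer = [];
    bufferType = null;
  };

  // Starts a new buffer whenever the run switches between reads and edits.
  const add = (type: GroupType, e: ProgressEvent): void => {
    if (bufferType !== type) flush();
    bufferType = type;
    buffer.push(e);
  };

  for (const e of events) {
    if (isReadEvent(e)) add("read_group", e);
    else if (isEditEvent(e)) add("edit_group", e);
    else {
      flush();
      groups.push({ type: "single", event: e });
    }
  }
  flush();
  return groups;
}
```

Because the function is pure, it can be covered by plain unit tests without rendering any component, which would also address the missing-tests item.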

…roupBadgeStyles

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web-ui/src/lib/eventStyles.ts`:
- Around line 85-87: The new export editGroupBadgeStyles uses blue color classes
which violates the repo's gray-only UI requirement; update the
editGroupBadgeStyles constant to use the Shadcn/UI Nova preset gray badge
classes (replace blue-* and dark:blue-* classes with the equivalent gray-* and
dark:gray-* classes) so it conforms to web-ui/src/**/*.{ts,tsx} gray color
scheme and Hugeicons conventions; locate the editGroupBadgeStyles export to make
this change.


📥 Commits

Reviewing files that changed from the base of the PR and between a81ab87 and b7906fb.

📒 Files selected for processing (2)

web-ui/src/components/execution/EventStream.tsx
web-ui/src/lib/eventStyles.ts

🚧 Files skipped from review as they are similar to previous changes (1)

web-ui/src/components/execution/EventStream.tsx

coderabbitai · 2026-03-24T18:43:08Z

web-ui/src/lib/eventStyles.ts

+/** Badge style for the edit-group summary row in the EventStream. */
+export const editGroupBadgeStyles = 'bg-blue-100 text-blue-800 dark:bg-blue-900/30 dark:text-blue-300';
+


⚠️ Potential issue | 🟡 Minor

Use gray badge classes for the new edit-group style constant.

Line 86 introduces blue badge colors, which conflicts with the repo’s gray-only UI color scheme requirement for web-ui/src/**/*.{ts,tsx}.

🎨 Suggested change

-export const editGroupBadgeStyles = 'bg-blue-100 text-blue-800 dark:bg-blue-900/30 dark:text-blue-300';
+export const editGroupBadgeStyles = 'bg-gray-100 text-gray-800 dark:bg-gray-800 dark:text-gray-300';

As per coding guidelines, web-ui/src/**/*.{ts,tsx}: Web UI must use Shadcn/UI (Nova preset) with gray color scheme and Hugeicons (@hugeicons/react); never use lucide-react.



claude · 2026-03-24T18:43:58Z

Follow-up review test

claude · 2026-03-24T18:44:49Z

test line 1

claude · 2026-03-24T18:45:06Z

Part 1 of review: Two items resolved in b7906fb. groupEvents wrapped in useMemo (issue 2). editGroupBadgeStyles extracted to eventStyles.ts.

claude · 2026-03-24T18:45:16Z

--- ## Follow-up review

claude · 2026-03-24T18:45:27Z

Follow-up review for PR 497

Two items from the previous review have been addressed in the latest commit (b7906fb):

Resolved:

groupEvents is now wrapped in useMemo - eliminates the redundant recompute on every render (was issue 2)
editGroupBadgeStyles extracted to eventStyles.ts - matches the dark-mode opacity used by agentStateBadgeStyles

Still outstanding from prior review - in priority order:

Blocking / functional:

onBlockerAnswered silently dropped (issue 4) - ReadGroupRow passes no onBlockerAnswered prop to its EventItem children. If a blocker event arrives inside a read group the callback is never called. Fix: thread the prop through ReadGroupRow.
Unstable keys reset expand state (issue 5) - Keys rg-N and eg-N are index-based. During live streaming the group list shifts as new events arrive, causing React to unmount and remount ReadGroupRow which collapses any row the user just opened. Use group.timestamp (or the first event timestamp for read_group) as the key instead.

Quality / consistency:

Type assertions still present (issue 1) - the structural cast pattern appears in isReadEvent, isEditEvent, and groupEvents. ExecutionEvent is a union - narrow to the ProgressEvent member with a proper type guard instead of the structural cast.
Hardcoded 37px header height (issue 3) - h-[calc(100%-37px)] will break if padding or font size changes. Replace with a flex column layout (flex flex-col on the outer wrapper, shrink-0 on the header, flex-1 min-h-0 on the scroll container).
flushRead / flushEdit inconsistency (issue 6) - flushEdit degrades to a plain EventItem for a single-item buffer; flushRead always emits a read_group regardless. Apply the same single-event pass-through to flushRead for consistency.
EditGroupRow file list not truncated (issue 7) - ReadGroupRow clips the label at 4 files; EditGroupRow renders all of them. Cap EditGroupRow at the same limit (e.g. show first 3 and +N more).

Minor:

aria-pressed missing on view toggle (issue 11) - The smart/raw toggle button should have aria-pressed equal to the showAll state so screen readers can communicate its state.
formatTime duplication (issue 9) - The inline toLocaleTimeString pattern is repeated in both ReadGroupRow and EditGroupRow. The same formatting already exists in EventItem.tsx - extract a shared utility.

Items 1 and 2 (onBlockerAnswered and stable keys) are the highest-priority blockers before merge as they affect correctness in live-streaming scenarios.
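
The truncation and stable-key suggestions (issues 7 and 5) can be sketched as two small helpers. This is a hypothetical illustration, not code from the PR: the names `summarizeFiles` and `groupKey`, the default limit of 3, and the exact label wording are assumptions.

```typescript
// Hypothetical helper: cap a file list at `limit` names and append "+N more".
function summarizeFiles(verb: string, files: string[], limit = 3): string {
  const shown = files.slice(0, limit).join(", ");
  const hidden = files.length - limit;
  const noun = files.length === 1 ? "file" : "files";
  const suffix = hidden > 0 ? ` +${hidden} more` : "";
  return `${verb} ${files.length} ${noun}: ${shown}${suffix}`;
}

// Hypothetical key builder: derived from the group's first event timestamp
// rather than its array index, so keys do not shift as new events stream in.
function groupKey(prefix: "rg" | "eg", firstTimestamp: string): string {
  return `${prefix}-${firstTimestamp}`;
}
```

Sharing one helper between ReadGroupRow and EditGroupRow would keep the two labels at the same limit. A timestamp-based key stays unique only if no two groups start at the same timestamp; if that can happen, a per-event id would be the safer choice.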

claude · 2026-03-24T18:47:09Z

Follow-up review — PR #497

Two items from the previous review have been addressed in the latest commit (b7906fb):

Resolved

groupEvents is now wrapped in useMemo - eliminates the redundant recompute on every render (was issue #2)
editGroupBadgeStyles extracted to eventStyles.ts - matches the dark-mode opacity used by agentStateBadgeStyles

Still outstanding from prior review

The following items flagged before were not addressed. Listing them in priority order:

Blocking / functional

onBlockerAnswered silently dropped (issue #4) - ReadGroupRow passes no onBlockerAnswered prop to its EventItem children. If a blocker event arrives inside a read group the callback is never called. Fix: thread the prop through ReadGroupRow.
Unstable keys reset expand state (issue #5) - Keys rg-N and eg-N are index-based. During live streaming the group list shifts as new events arrive, causing React to unmount and remount ReadGroupRow, which collapses any row the user just opened. Use group.timestamp (or the first event timestamp for read_group) as the key instead.

Quality / consistency

Type assertions still present (issue #1) - the structural cast pattern appears in isReadEvent, isEditEvent, and groupEvents. ExecutionEvent is a union; narrow to the ProgressEvent member with a proper type guard instead of the structural cast.
Hardcoded 37px header height (issue #3) - h-[calc(100%-37px)] will break if padding or font size changes. Replace with a flex column layout (flex flex-col on the outer wrapper, shrink-0 on the header, flex-1 min-h-0 on the scroll container).
flushRead / flushEdit inconsistency (issue #6) - flushEdit degrades to a plain EventItem for a single-item buffer; flushRead always emits a read_group regardless. Apply the same single-event pass-through to flushRead for consistency.
EditGroupRow file list not truncated (issue #7) - ReadGroupRow clips the label at 4 files; EditGroupRow renders all of them. Cap EditGroupRow at the same limit (e.g. show first 3 and +N more).

Minor

aria-pressed missing on view toggle (issue #11) - The smart/raw toggle button should have aria-pressed equal to the showAll state so screen readers can communicate its state.
formatTime duplication (issue #9) - The inline toLocaleTimeString pattern is repeated in both ReadGroupRow and EditGroupRow. The same formatting already exists in EventItem.tsx; extract a shared utility.

Items #1 and #2 (onBlockerAnswered and stable keys) are the highest-priority blockers before merge as they affect correctness in live-streaming scenarios.

coderabbitai bot reviewed Mar 24, 2026

web-ui/src/components/execution/EventStream.tsx (resolved)

web-ui/src/components/execution/EventStream.tsx (outdated, resolved)

fix: address CodeRabbit feedback on PR #497 - useMemo + extract editGroupBadgeStyles

b7906fb

coderabbitai bot reviewed Mar 24, 2026


frankbria merged commit e1296e7 into main Mar 24, 2026
13 checks passed

frankbria deleted the feat/issue-475-compress-event-stream branch March 24, 2026 23:28

		/** Badge style for the edit-group summary row in the EventStream. */
		export const editGroupBadgeStyles = 'bg-blue-100 text-blue-800 dark:bg-blue-900/30 dark:text-blue-300';


