CodeBoarding Action

One action, two modes: architecture review on every pull request, and a versioned, always-current architecture baseline on your main branch.

  • mode: review (the default) — CodeBoarding analyzes your architecture before and after a change, then comments on the PR with an inline Mermaid diagram of what changed: added, modified, and deleted components and the relationships between them. Runs on pull_request and issue_comment.
  • mode: sync — CodeBoarding keeps your architecture analysis versioned and current on your branch: on every push it commits the analysis.json baseline plus readable markdown (.codeboarding/*.md), so reviews diff against your current architecture and your architecture has real git history. Runs on push, workflow_dispatch, and schedule. See sync mode.

Both modes run the CodeBoarding engine in CI: static analysis combined with LLM reasoning. They are designed to be used together — sync mode keeps the baseline fresh that review mode diffs against — but each works on its own.

CodeBoarding · Website · Explore examples · VS Code extension · Discord

JavaScript TypeScript Java Python Go PHP Rust C#

What review mode does

  • Builds or reuses a baseline architecture analysis for the PR base.
  • Runs incremental analysis on the PR head, then diffs components and relationships.
  • Posts a sticky PR comment with an inline Mermaid map. Green is added, yellow is modified, red (dashed) is deleted, for both nodes and edges.

A PR comment looks like this:

graph LR
    Orchestration_Workflow_Manager["Orchestration & Workflow Manager"]
    Incremental_Analysis_Controller["Incremental Analysis Controller"]
    Static_Analysis_Engine["Static Analysis Engine"]
    Agentic_Intelligence_Core["Agentic Intelligence Core"]
    Health_Quality_Monitor["Health & Quality Monitor"]
    Rendering_Output_Engine["Rendering & Output Engine"]
    Persistence_Provider_Infrastructure["Persistence & Provider Infrastructure"]
    Orchestration_Workflow_Manager -- "triggers change detection" --> Incremental_Analysis_Controller
    Incremental_Analysis_Controller -- "passes filtered file sets" --> Static_Analysis_Engine
    Static_Analysis_Engine -- "provides CFGs and symbol tables" --> Agentic_Intelligence_Core
    Static_Analysis_Engine -- "supplies structural metrics" --> Health_Quality_Monitor
    Agentic_Intelligence_Core -- "delivers summaries and diagrams" --> Rendering_Output_Engine
    Health_Quality_Monitor -- "provides health reports" --> Rendering_Output_Engine
    Persistence_Provider_Infrastructure -- "supplies LLM clients" --> Agentic_Intelligence_Core
    Orchestration_Workflow_Manager -- "persists pipeline state" --> Persistence_Provider_Infrastructure
    classDef added fill:#1f883d,stroke:#0b5d23,color:#fff;
    classDef modified fill:#bf8700,stroke:#7d4e00,color:#fff;
    classDef deleted fill:#cf222e,stroke:#82071e,color:#fff,stroke-dasharray:5 3;
    class Health_Quality_Monitor added;
    class Static_Analysis_Engine,Agentic_Intelligence_Core modified;
    class Persistence_Provider_Infrastructure deleted;
    linkStyle 3,5 stroke:#1f883d,stroke-width:2px;
    linkStyle 2 stroke:#bf8700,stroke-width:2px;
    linkStyle 6,7 stroke:#cf222e,stroke-width:2px,stroke-dasharray:5 3;

Quick start: PR review (review mode)

Create .github/workflows/codeboarding.yml:

name: CodeBoarding review

on:
  pull_request:
    # Generate once, when the PR becomes reviewable, not on every push, so you
    # don't spend an LLM job per commit. Use [opened] for strictly creation-only,
    # or add `synchronize` to re-run on each push. Refresh anytime with /codeboarding.
    # 'closed' only cancels an in-flight review (see concurrency), it doesn't start one.
    types: [opened, reopened, ready_for_review, closed]
  issue_comment:
    types: [created]

permissions:
  # write lets the action commit analysis.json to the PR branch so the comment can
  # link to the webview diff. Drop to `read` to keep the comment without that link.
  contents: write
  pull-requests: write
  issues: write

concurrency:
  group: codeboarding-${{ github.event.pull_request.number || github.event.issue.number }}
  # Cancel only when the PR closes — bot comments (issue_comment) and re-triggers
  # must not cancel a running review; they queue behind it instead.
  cancel-in-progress: ${{ github.event_name == 'pull_request' && github.event.action == 'closed' }}

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    if: >
      (github.event_name == 'pull_request' && github.event.action != 'closed' && github.event.pull_request.draft == false) ||
      (github.event_name == 'issue_comment' && github.event.issue.pull_request != null &&
       startsWith(github.event.comment.body, '/codeboarding') &&
       contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}

Add the API key as a repository secret (Settings → Secrets and variables → Actions):

OPENROUTER_API_KEY = sk-or-...

That is the only required setup, passed via llm_api_key above. For local runs with scripts/run_local.sh, export OPENROUTER_API_KEY as an environment variable instead.

Models are optional. Omit agent_model and parsing_model to use the engine's default for your provider, or pin them inline or from a repository variable (a model name is not a secret, so use vars., not secrets.):

        with:
          llm_api_key:   ${{ secrets.OPENROUTER_API_KEY }}  # secret
          agent_model:   anthropic/claude-sonnet-4          # optional; or ${{ vars.AGENT_MODEL }}
          parsing_model: google/gemini-3-flash-preview      # optional

Bring your own LLM provider

OpenRouter is the default, but you can use any provider the engine supports. Set llm_provider and pass that provider's key:

        with:
          llm_provider: anthropic                  # omit for OpenRouter (default)
          llm_api_key:  ${{ secrets.ANTHROPIC_API_KEY }}

llm_provider: <name> hands your key to the engine as <NAME>_API_KEY, and the engine auto-selects that provider. Set exactly one key per run.

Supported providers
| llm_provider | Environment variable the engine reads |
| --- | --- |
| openrouter (default) | OPENROUTER_API_KEY |
| openai | OPENAI_API_KEY |
| anthropic | ANTHROPIC_API_KEY |
| google | GOOGLE_API_KEY |
| vercel | VERCEL_API_KEY |
| deepseek | DEEPSEEK_API_KEY |
| cerebras | CEREBRAS_API_KEY |
| glm / kimi | GLM_API_KEY / KIMI_API_KEY |
| aws_bedrock | AWS_BEARER_TOKEN_BEDROCK |
| ollama | OLLAMA_BASE_URL |

This table mirrors the engine and may lag it. The source of truth is the engine's provider registry, agents/llm_config.py. Any provider it adds that follows the <NAME>_API_KEY convention works here with no action change.

When review mode runs

  • On a PR being opened, reopened, or marked ready for review, the diagram is generated once (per the on: triggers above). It does not re-run on every push, so you never spend an LLM job per commit; the comment reflects that point until refreshed.
  • On a /codeboarding comment, a trusted collaborator (OWNER, MEMBER, or COLLABORATOR) regenerates the diagram against the current PR head, even if one already exists. Each /codeboarding invocation posts a new comment and leaves earlier comments untouched (the automatic on-open comment, and any previous /codeboarding results, stay put). Change the keyword via trigger_command.
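
If you would rather have the diagram track every push, the trigger list from the quick start can be widened. A minimal sketch, assuming the rest of the quick-start workflow stays unchanged; note that this spends one LLM job per pushed commit:

```yaml
on:
  pull_request:
    # synchronize fires on every push to the PR branch, so each push re-runs the review.
    # closed stays in the list only so the concurrency rule can cancel an in-flight run.
    types: [opened, reopened, ready_for_review, synchronize, closed]
  issue_comment:
    types: [created]
```

For strictly creation-only reviews, shrink the list to [opened, closed] instead and rely on /codeboarding for refreshes.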

The command needs the issue_comment trigger and runs from your default branch (a GitHub rule), so it only works once the workflow is merged there. On-demand runs on fork PRs are refused, so fork code is never analyzed with your secrets.
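
Renaming the command appears to touch two places: the trigger_command input and the startsWith guard in the quick-start job's if: condition, which filters comments before the action starts. The sketch below uses /arch-review as a hypothetical keyword; keeping the two strings in sync is an inference from the quick-start workflow, not a documented requirement.

```yaml
jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    if: >
      (github.event_name == 'pull_request' && github.event.action != 'closed' && github.event.pull_request.draft == false) ||
      (github.event_name == 'issue_comment' && github.event.issue.pull_request != null &&
       startsWith(github.event.comment.body, '/arch-review') &&
       contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
          trigger_command: /arch-review   # hypothetical keyword; keep in sync with the guard above
```

With this guard, comments that start with /codeboarding (including /codeboarding-feedback) no longer pass the if: condition, so widen the guard if you still want them to reach the action.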

Feedback command

In review workflows that include issue_comment, anyone whose comment reaches the action can send product feedback with:

/codeboarding-feedback <message>

Keep your architecture versioned (sync mode)

With mode: sync, the action analyzes the pushed commit and commits the results back to the branch (as codeboarding[bot]), so your architecture analysis stays versioned in git and tracks the code instead of drifting from it:

  • .codeboarding/*.md — rendered architecture docs: overview.md plus one page per component (directory configurable via output_dir).
  • .codeboarding/analysis.json — the machine-readable analysis, which doubles as the baseline that review mode and the webview diff against (alongside codeboarding_version.json and health/health_report.json).
  • docs/development/architecture.md (optional, on by default) — all pages concatenated into a single document, overview.md first. Disable with write_architecture_md: false.

Create .github/workflows/codeboarding-sync.yml next to your review workflow:

name: CodeBoarding sync

on:
  push:
    branches: [main]
    # Loop guard: don't re-trigger on the files this workflow itself commits.
    # Listed explicitly (not '.codeboarding/**') so that editing your own
    # .codeboarding/.codeboardingignore still regenerates the docs. (The action
    # also skips re-analyzing its own bot commit as a backstop, and deliberately
    # does NOT use [skip ci] — that would leak through squash-merges.)
    paths-ignore:
      - '.codeboarding/*.md'
      - '.codeboarding/analysis.json'
      - '.codeboarding/codeboarding_version.json'
      - '.codeboarding/health/**'
      - 'docs/development/architecture.md'
  workflow_dispatch:

permissions:
  contents: write   # commit the generated docs to the branch

concurrency:
  group: codeboarding-sync
  cancel-in-progress: false

jobs:
  sync:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          mode: sync
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}

Behavior worth knowing:

  • The first run on a branch is a full analysis; subsequent runs reuse the committed baseline and run incrementally when they can (the analysis_mode output tells you which happened).
  • The commit is skipped when nothing meaningful changed (an empty diff, or only generated_at/timestamp fields). The push retries a few times with fetch+rebase and fails open, so a race with another push never fails your CI.
  • Tag pushes are skipped. pull_request events soft-skip in sync mode, so a mistakenly shared workflow can never push docs from a PR run.
  • The bot commit carries no [skip ci] — on a squash-merge that marker leaks into the merge commit and would skip the very sync run (and release tooling, CI) the merge should trigger. The regen loop is instead prevented by the paths-ignore list above and by the action skipping re-analysis of its own bot commit, so a merge to main reliably triggers a fresh incremental sync.
  • output_dir is owned by the action: pre-existing top-level markdown files in it are deleted on every run (stale component pages must not linger). Don't point it at a directory with hand-written docs.
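
The behaviors above can be combined into a variant of the sync workflow. The sketch below is one possible arrangement, not a recommended default: it adds a weekly schedule and a manual force_full switch for rebuilding a stale baseline. The cron expression and the force_full dispatch input are illustrative choices; only the action inputs (mode, llm_api_key, force_full) come from this README.

```yaml
name: CodeBoarding sync

on:
  push:
    branches: [main]
    paths-ignore:
      - '.codeboarding/*.md'
      - '.codeboarding/analysis.json'
      - '.codeboarding/codeboarding_version.json'
      - '.codeboarding/health/**'
      - 'docs/development/architecture.md'
  schedule:
    - cron: '0 3 * * 1'   # Mondays at 03:00 UTC (illustrative)
  workflow_dispatch:
    inputs:
      force_full:
        description: Ignore the committed baseline and rebuild from scratch
        type: boolean
        default: false

permissions:
  contents: write

concurrency:
  group: codeboarding-sync
  cancel-in-progress: false

jobs:
  sync:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          mode: sync
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
          # github.event.inputs only exists on manual runs, so push and
          # schedule events evaluate this expression to false.
          force_full: ${{ github.event.inputs.force_full == 'true' }}
```

Push and scheduled runs stay incremental when they can, and the full rebuild happens only when someone ticks the box on a manual run.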

How the two modes work together

Sync mode keeps the committed .codeboarding/analysis.json baseline fresh on main. Review mode reuses that committed baseline for the PR base, so PR reviews diff against your current main architecture and run incrementally instead of rebuilding a base from scratch — faster and cheaper per PR.

Use the same depth_level in both workflows (both default to 2). Review mode regenerates its base when the committed baseline is deeper than the workflow's depth_level, so a lowered depth silently forfeits the reuse. (A baseline recording a shallower depth is accepted: the engine records the depth actually reached, which can be less than requested on repos where no component expands.) Want cheaper, faster runs? Set depth_level: 1 in both workflows.
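
In practice, matching depth means setting the same value in the with: block of each workflow. A minimal sketch of the cheaper depth 1 setup, showing only the step that changes in each file. First the review workflow (.github/workflows/codeboarding.yml):

```yaml
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
          depth_level: 1   # must equal the value in the sync workflow
```

Then the sync workflow (.github/workflows/codeboarding-sync.yml):

```yaml
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          mode: sync
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
          depth_level: 1   # must equal the value in the review workflow
```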

One caveat for squash-merge repos: the analysis.json that review mode commits to PR branches carries the PR-head SHA, which a squash merge orphans — so that copy can't validate as a baseline on main. Sync mode running on main is what keeps the baseline valid there.

Security: keep the two modes in separate workflows

Use two thin workflow files, each with least privilege, exactly as in the snippets above:

  • review workflow: triggers on pull_request (types [opened, reopened, ready_for_review]; the quick start adds closed purely to cancel in-flight runs) and issue_comment (types [created]); permissions are pull-requests: write, issues: write, and contents: write (contents is only needed for commit_head_analysis and the webview link).
  • sync workflow: triggers on push (branches [main], with the paths-ignore list) and workflow_dispatch; the only permission is contents: write.

The anti-pattern to avoid: one workflow with on: [push, pull_request] and a single union permissions block — it forces every privilege either mode needs onto every trigger. Sync mode soft-skips on pull_request events as a backstop, but don't rely on it: keep the triggers and permissions split so each workflow grants only what its own mode uses.

Be aware that contents: write is repo-wide — GitHub does not scope it to a branch — so the review workflow's webview push permission is itself a write-to-main-capable grant. If that doesn't sit well with your threat model, drop the review workflow to contents: read (you lose only the webview link, not the PR comment).
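
A sketch of the read-only variant of the review workflow, showing only the two fragments that differ from the quick start. Setting commit_head_analysis: false explicitly is an assumption on our part: it makes the intent visible instead of relying on the push being refused.

```yaml
permissions:
  contents: read          # no push to the PR branch, so no webview link
  pull-requests: write    # still needed to post the PR comment
  issues: write

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
          commit_head_analysis: false   # assumption: skip the push rather than let it fail
```

The on:, concurrency:, and if: blocks from the quick start are omitted here for brevity and should be kept as they are.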

Inputs

| Input | Mode | Default | Description |
| --- | --- | --- | --- |
| llm_api_key | both | required | Your LLM provider API key (see llm_provider). |
| llm_provider | both | openrouter | Provider for the key, mapped to <NAME>_API_KEY (e.g. anthropic, openai, google). |
| mode | both | review | review posts the PR architecture-diff comment; sync analyzes on push and commits the architecture (analysis.json + rendered docs) to target_branch, keeping it versioned and current. |
| github_token | both | ${{ github.token }} | Token for GitHub API calls; in review mode it posts or updates the PR comment. |
| push_token | both | ${{ github.token }} | Token used for pushes: in review mode the generated analysis.json to the PR branch (for the webview link), in sync mode the architecture to target_branch. The workflow token can push when the workflow grants permissions: contents: write. Separate from github_token so commenting can use a GitHub App token while the push uses the workflow token. |
| engine_ref | both | v0.12.1 | CodeBoarding engine ref. Pin for reproducibility. |
| depth_level | both | 2 | Analysis depth, 1 to 3. Higher is slower, costlier, and richer; drop to 1 for cheaper runs. Use the same value in your review and sync workflows (see How the two modes work together). |
| render_depth | review | 1 | Display depth for the PR diagram. Keep 1 for a clean top-level view. |
| diagram_direction | review | LR | Mermaid direction: LR, TD, TB, RL, or BT. |
| changed_only | review | false | Render only changed components and incident edges. |
| agent_model | both | google/gemini-3-flash-preview | Analysis model. OpenRouter default shown; other providers use their own engine default. |
| parsing_model | both | google/gemini-3.1-flash-lite-preview | Parsing model. OpenRouter default shown; other providers use their own engine default. |
| comment_header | review | Architecture review | Heading for the PR comment. |
| trigger_command | review | /codeboarding | Slash command for trusted on-demand runs. |
| cta_base_url | review | empty | Click-proxy base URL: deep-links the editor link into VS Code/Cursor and adds a "get the extension" link (tracks owner/repo/pr). Empty links to the extension listing instead (GitHub strips vscode:/cursor: from comments). |
| webview_base_url | review | https://app.codeboarding.org | Hosted webview base URL. The PR comment adds an "explore in browser" link to this PR's head-vs-base diff. Needs commit_head_analysis (same-repo PRs only); omitted on forks. Set empty to disable. |
| commit_head_analysis | review | true | Commit the generated head .codeboarding/analysis.json (+ health report) to the PR branch so the webview can read it at the head SHA. Same-repo PRs only (the token is read-only on forks). |
| output_dir | sync | .codeboarding | Directory the rendered docs and analysis metadata are committed to. Owned by the action: pre-existing top-level .md files in it are deleted on every run. |
| output_format | sync | .md | Output format. Only .md is supported. |
| target_branch | sync | ${{ github.ref_name }} | Branch the generated docs are pushed to. |
| write_architecture_md | sync | true | Also write docs/development/architecture.md: all rendered pages concatenated, overview.md first. |
| commit_message | sync | chore(codeboarding): sync architecture baseline | Commit message for the generated docs. No [skip ci] (it would leak through squash-merges); the regen loop is guarded by paths-ignore + the action's own bot-commit check. |
| force_full | sync | false | Ignore any committed baseline and run a full analysis from scratch. Use to rebuild a stale or corrupt baseline (e.g. from a workflow_dispatch). |
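
The github_token / push_token split makes it possible to comment as a GitHub App while pushing with the workflow token. A sketch under stated assumptions: the app's ID and private key live in a repository variable and secret with the hypothetical names CB_APP_ID and CB_APP_PRIVATE_KEY, the app is installed on the repository with permission to comment on pull requests, and the token is minted with the actions/create-github-app-token action.

```yaml
    steps:
      - uses: actions/create-github-app-token@v1
        id: app-token
        with:
          app-id: ${{ vars.CB_APP_ID }}                    # hypothetical variable name
          private-key: ${{ secrets.CB_APP_PRIVATE_KEY }}   # hypothetical secret name
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
          github_token: ${{ steps.app-token.outputs.token }}   # comments appear as the app
          # push_token is left at its default, the workflow token.
```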

Outputs

| Output | Mode | Description |
| --- | --- | --- |
| diagram_md | review | Path to the generated Mermaid markdown block on the runner. |
| n_changed | review | Number of changed components, counted recursively. |
| truncated | review | true when the graph was reduced to fit GitHub Mermaid limits. |
| analysis_mode | sync | full or incremental: whether the run rebuilt the analysis from scratch or reused the committed baseline. |
| files_written | sync | The generated files written for the docs commit. |
| committed | sync | true when a docs commit was pushed to target_branch; false when sync mode ran but had nothing to commit (or the push failed open). Empty only if sync mode did not run. |

Outputs of the mode that did not run are empty strings.
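
Later steps can read these outputs once the action step has an id. A minimal sketch for sync mode; the step id codeboarding and the follow-up steps are illustrative, only the output names come from the table above.

```yaml
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        id: codeboarding   # illustrative id, referenced below
        with:
          mode: sync
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
      - name: Report sync result
        env:
          ANALYSIS_MODE: ${{ steps.codeboarding.outputs.analysis_mode }}
          COMMITTED: ${{ steps.codeboarding.outputs.committed }}
        run: |
          echo "analysis_mode=$ANALYSIS_MODE"
          echo "committed=$COMMITTED"
      - name: Runs only when a docs commit was pushed
        if: steps.codeboarding.outputs.committed == 'true'
        run: echo "Architecture docs were updated on the branch."
```

Outputs are strings, which is why the condition compares against 'true' rather than a boolean.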

Notes

  • No checkout step is required in your workflow. This action checks out the target (the PR in review mode, the pushed commit in sync mode) and the CodeBoarding engine internally.
  • GitHub withholds secrets from fork PRs on pull_request, so fork runs fail early if an LLM key is unavailable.
  • Do not use pull_request_target for this action. It can expose secrets to PR-head code.
  • GitHub renders Mermaid in strict mode, so node click-through links are not supported in the PR diagram.
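
Since fork runs cannot see the LLM key, one option is to skip them up front instead of letting them fail early. The sketch below adds a same-repository clause to the pull_request branch of the quick-start if: guard. This is an optional variation, not something the action requires, and whether a skipped job is preferable to a failed one is a judgment call.

```yaml
jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    if: >
      (github.event_name == 'pull_request' && github.event.action != 'closed' &&
       github.event.pull_request.draft == false &&
       github.event.pull_request.head.repo.full_name == github.repository) ||
      (github.event_name == 'issue_comment' && github.event.issue.pull_request != null &&
       startsWith(github.event.comment.body, '/codeboarding') &&
       contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
    steps:
      - uses: CodeBoarding/CodeBoarding-action@v1
        with:
          llm_api_key: ${{ secrets.OPENROUTER_API_KEY }}
```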

Local testing

Fast path, no LLM calls:

scripts/run_local.sh --base-json /tmp/base.json --head-json /tmp/head.json

Full local pipeline:

export OPENROUTER_API_KEY=sk-or-...
scripts/run_local.sh --repo /path/to/repo --base <base-ref> --head <head-ref> \
  --engine /path/to/CodeBoarding

Useful flags:

--depth N
--render-depth N
--direction LR|TD|TB|RL|BT
--changed-only
--no-edge-labels
--out DIR
--no-open

License

MIT. See LICENSE.