[Harbor 4/4] architecture docs, tutorial, and the GAIA example by varunursekar · Pull Request #6 · scaleapi/vero

varunursekar · 2026-06-24T18:12:47Z

Draft · Stack 4 of 4 — targets harbor-3-compiler. Additive, low-risk.

docs/harbor/architecture.md — what it is, the compiled-task topology, the two modes, the component map, and the leaderboard-integrity model.
docs/harbor/tutorial.md — build + run end to end (both modes, the agent-side protocol); README Harbor section.
examples/gaia-optimization — a Mode-B example optimizing a GaiaAgent (thin Terminus2 subclass with an editable prompt) on gaia/gaia via a nested harbor run on Modal.

Start here for the big picture, then dive into [1/4]–[3/4].

Stack: [1/4] core → [2/4] sidecar → [3/4] compiler → this.

🤖 Generated with Claude Code

Greptile Summary

This PR is the final stack entry (4/4) for the Harbor integration, adding architecture docs, a tutorial, and a runnable Mode-B example (gaia-optimization) that optimizes a GaiaAgent prompt against real GAIA tasks via a nested harbor run on Modal.

Documentation (docs/harbor/architecture.md, docs/harbor/tutorial.md): covers the compiled-task topology, the two evaluation modes (A = vero scores, B = nested Harbor run scores), the trust boundary / leaderboard-integrity model, and the full CLI walkthrough end to end.
GAIA example (examples/gaia-optimization): a thin Terminus2 subclass that redirects the prompt-template path to an editable prompts/ directory, a build.yaml wiring up Mode B on Modal, and the copied prompt templates that form the optimization surface.
README update: adds a Harbor integration section with a quick-start snippet and links to the new docs.
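To make the "thin subclass that redirects the prompt-template path" idea concrete, here is a minimal sketch. It is not the code from this PR: `Terminus2Stub`, the `_get_prompt_template_path` hook, and the `prompts_dir` helper are hypothetical names invented for illustration, since Harbor's real `Terminus2` interface is not shown on this page. Only `_parser_name`, the `prompts/` directory, and the template file names come from the review text.

```python
from pathlib import Path


def prompts_dir(module_file: str) -> Path:
    """Return the prompts/ directory that sits next to the given module file."""
    return Path(module_file).resolve().parent / "prompts"


class Terminus2Stub:
    """Hypothetical stand-in for Harbor's Terminus2 (real API not shown here)."""

    def __init__(self, parser_name: str = "json") -> None:
        self._parser_name = parser_name

    @staticmethod
    def name() -> str:
        return "terminus-2"

    def _get_prompt_template_path(self) -> Path:
        # Assumed hook: the base agent points at its own bundled templates.
        return Path("templates") / f"terminus-{self._parser_name}-plain.txt"


class GaiaAgent(Terminus2Stub):
    """Thin subclass: only the prompt-template location changes."""

    @staticmethod
    def name() -> str:
        return "gaia-agent"

    def _get_prompt_template_path(self) -> Path:
        # File-relative lookup, so the path still resolves after installation
        # as long as prompts/ is bundled into the wheel.
        return prompts_dir(__file__) / f"terminus-{self._parser_name}-plain.txt"
```

Because the lookup is relative to the module file, an optimizer can edit the files under `prompts/` without touching agent code, which is what makes that directory a clean optimization surface.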

Confidence Score: 4/5

Entirely additive — new docs and an example package with no changes to vero core; safe to merge.

The changes are documentation and an example that adds no new runtime paths to vero itself. The two issues found are minor: typos in the XML prompt template (which is the optimization surface, so an optimizer may fix them during a run anyway) and a potential @staticmethod vs instance-method mismatch on version() in GaiaAgent that would surface only if Harbor calls GaiaAgent.version() on the class rather than on an instance.

src/gaia_agent/agent.py and src/gaia_agent/prompts/terminus-xml-plain.txt have the two flagged issues; all other files are clean.

Important Files Changed

| Filename | Overview |
| --- | --- |
| vero/README.md | Adds a Harbor integration section with install snippet and links to docs/examples; purely additive and accurate. |
| vero/docs/harbor/architecture.md | New architecture doc covering the compiled-task topology, two modes (A/B), trust boundary, and component map; thorough and internally consistent. |
| vero/docs/harbor/tutorial.md | New tutorial doc showing both modes end to end, the agent-side protocol, and how to inspect runs; matches the architecture doc. |
| vero/examples/gaia-optimization/README.md | Clear example README covering prerequisites, run instructions, caveats, and attribution for the copied prompt files. |
| vero/examples/gaia-optimization/build.yaml | Mode-B build config with sensible budget (3 train evals), correct split visibility tiers, and well-commented placeholder task IDs. |
| vero/examples/gaia-optimization/pyproject.toml | Minimal package config; force-include correctly bundles the editable prompts directory into the wheel so agent.py's file-relative path resolves after install. |
| vero/examples/gaia-optimization/src/gaia_agent/agent.py | Thin Terminus2 subclass redirecting the prompt-template path; accesses private _parser_name and has a version()/name() static/instance inconsistency worth aligning with the base class. |
| vero/examples/gaia-optimization/src/gaia_agent/prompts/terminus-json-plain.txt | JSON prompt template copied from Harbor's terminus_2; correctly uses double-braced literals and single-braced format variables; no issues. |
| vero/examples/gaia-optimization/src/gaia_agent/prompts/terminus-xml-plain.txt | XML prompt template with two typos ("apprpriate" and "In is always possible") that will appear verbatim in every agent call. |
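The note about double-braced literals and single-braced format variables in the JSON template refers to how Python's `str.format` treats braces. The miniature template below is hypothetical (the real files are much longer, and the variable names `instruction` and `terminal_state` are assumptions), but the escaping behavior is standard Python.

```python
# Hypothetical miniature template: {{ }} are literal braces, { } are variables.
TEMPLATE = (
    "Task: {instruction}\n"
    'Respond with JSON like {{"analysis": "...", "commands": []}}\n'
    "Current terminal state:\n{terminal_state}\n"
)

rendered = TEMPLATE.format(instruction="Find the answer", terminal_state="$ ")
print(rendered)

# An unescaped literal brace is parsed as a format field and fails at render time.
broken_error = None
try:
    '{"analysis": "..."}'.format()
except (KeyError, ValueError, IndexError) as exc:
    broken_error = exc
print("unescaped braces failed:", type(broken_error).__name__)
```

This matters for an optimizer that edits the prompt: any JSON example it adds to the template has to keep the doubled braces, or rendering fails on every agent call.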

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Dev as Developer
    participant VeroHarbor as vero harbor CLI
    participant Main as main container (optimizer)
    participant Sidecar as eval-sidecar (vero harbor serve)
    participant Modal as Modal (nested harbor run)
    participant Verifier as tests/test.sh (shared verifier)

    Dev->>VeroHarbor: vero harbor build -c build.yaml -o /tmp/task
    VeroHarbor-->>Dev: Harbor task dir (compose + Dockerfiles + instruction.md)

    Dev->>Main: harbor run -p /tmp/task -a claude-code -e docker
    activate Main
    activate Sidecar
    Note over Sidecar: vero harbor serve starts, writes per-trial admin token (root:600)

    Main->>Main: optimizer edits prompts/, commits
    Main->>Sidecar: "POST /eval?split=train"
    Sidecar->>Sidecar: git fetch commit (file://, hooks disabled)
    Sidecar->>Modal: harbor run GaiaAgent on train tasks
    Modal-->>Sidecar: per-task verifier rewards
    Sidecar-->>Main: aggregate score + remaining budget

    Note over Main: repeat edits + evals within budget

    Main->>Verifier: trial end — tests/test.sh runs
    Verifier->>Sidecar: POST /finalize (admin token)
    Sidecar->>Sidecar: select best train commit
    Sidecar->>Modal: harbor run on hidden validation tasks
    Modal-->>Sidecar: accuracy
    Sidecar-->>Verifier: reward.json
    deactivate Sidecar
    deactivate Main
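The evaluate-then-finalize flow in the diagram can be sketched with an in-memory stand-in for the sidecar. Everything here is illustrative: `FakeSidecar`, its method names, and the response fields `score` and `remaining_budget` are assumptions, and no HTTP, git, or Modal calls are made. Only the budget of 3 train evals and the "select best train commit" step are taken from the review text.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSidecar:
    """In-memory stand-in for the eval sidecar (hypothetical interface)."""

    budget: int
    scores: dict[str, float]  # commit id -> pretend aggregate train score
    history: list[tuple[str, float]] = field(default_factory=list)

    def eval_train(self, commit: str) -> dict:
        # Mirrors the diagram's train eval: spend one unit of budget per call.
        if self.budget <= 0:
            raise RuntimeError("train eval budget exhausted")
        self.budget -= 1
        score = self.scores[commit]
        self.history.append((commit, score))
        return {"score": score, "remaining_budget": self.budget}

    def finalize(self) -> str:
        # Mirrors the finalize step: choose the best commit seen on train.
        # max() keeps the earliest entry when scores tie.
        if not self.history:
            raise RuntimeError("no commits were evaluated")
        best_commit, _ = max(self.history, key=lambda item: item[1])
        return best_commit


sidecar = FakeSidecar(
    budget=3,
    scores={"a1": 0.40, "b2": 0.55, "c3": 0.50, "d4": 0.90},
)

# Optimizer side: edit, commit, evaluate, and stop once the budget is spent.
for commit in ["a1", "b2", "c3", "d4"]:
    result = sidecar.eval_train(commit)
    if result["remaining_budget"] == 0:
        break

# Verifier side: finalize picks among evaluated commits only.
best = sidecar.finalize()
print(best)
```

In this sketch `d4` is never scored because the budget runs out first, so the selection falls on `b2`. That is the practical consequence of a small eval budget: a commit the optimizer never evaluated cannot be chosen at finalize time.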


Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
vero/examples/gaia-optimization/src/gaia_agent/prompts/terminus-xml-plain.txt:26-28
Two typos in these lines: "apprpriate" should be "appropriate", and on the following line "In is always possible" should be "It is always possible". Since this file is the LLM prompt template served verbatim to the inner GAIA agent, these errors appear in every agent turn and could subtly degrade instruction-following.

```suggestion
The `duration` attribute of <keystrokes> specifies the number of seconds to wait for the command to complete (default: 1.0) before the next command will be executed. On immediate tasks (e.g., cd, ls, echo, cat) set a duration of 0.1 seconds. On commands (e.g., gcc, find, rustc) set a duration of 1.0 seconds. On slow commands (e.g., make, python3 [long running script], wget [file]) set an appropriate duration as you determine necessary.

It is better to set a smaller duration than a longer duration. It is always possible to wait again if the prior output has not finished, by running <keystrokes duration="10.0"></keystrokes> on subsequent requests to wait longer. Never wait longer than 60 seconds; prefer to poll to see intermediate result status.
```

### Issue 2 of 2
vero/examples/gaia-optimization/src/gaia_agent/agent.py:30-31
`version()` signature mismatch with `name()`

`name()` is declared as a `@staticmethod` but `version()` is an instance method. If `Terminus2` defines `version()` as a `@staticmethod` (the typical pattern when `name()` is also static), then calling `GaiaAgent.version()` on the class rather than on an instance raises `TypeError` for the missing `self` argument; the subclass definition replaces the base static in attribute lookup, so there is no fallback to it. This is worth aligning with however `Terminus2` declares `version()`.

Reviews (1): Last reviewed commit: "Harbor: architecture docs, tutorial, and..."

Greptile also left 1 inline comment on this PR.

- docs/harbor/architecture.md: what the integration is, the compiled-task topology, the two evaluation modes, the component map, and the leaderboard-integrity model.
- docs/harbor/tutorial.md: build and run an optimization task end to end (both modes, the agent-side protocol), and a Harbor section in the README.
- examples/gaia-optimization: a Mode-B example optimizing a GaiaAgent (a thin Terminus2 subclass with an editable prompt) on gaia/gaia via a nested harbor run on Modal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-06-24T18:31:40Z

+    def version(self) -> str:
+        return "0.1.0"


version() signature mismatch with name()

name() is declared as a @staticmethod but version() is an instance method. If Terminus2 defines version() as a @staticmethod (the typical pattern when name() is also static), then calling GaiaAgent.version() on the class rather than on an instance raises TypeError for the missing self argument; the subclass definition replaces the base static in attribute lookup, so there is no fallback to it. This is worth aligning with however Terminus2 declares version().
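How Python handles this case can be checked with a small sketch. `Base` and `Child` are hypothetical stand-ins for `Terminus2` and `GaiaAgent`; the sketch assumes the base class declares `version()` as a static method, which this page does not confirm.

```python
class Base:
    """Stand-in for the base agent, assuming both methods are static."""

    @staticmethod
    def name() -> str:
        return "base"

    @staticmethod
    def version() -> str:
        return "base-1.0"


class Child(Base):
    @staticmethod
    def name() -> str:
        return "child"

    def version(self) -> str:  # instance method, as in the flagged diff
        return "0.1.0"


print(Child().version())  # calling on an instance works

# Calling on the class supplies no `self`, so Python raises TypeError.
# It does not fall back to Base.version().
class_call_error = None
try:
    Child.version()
except TypeError as exc:
    class_call_error = exc
print("class-level call failed:", class_call_error)


class FixedChild(Base):
    @staticmethod
    def version() -> str:  # matches the assumed base declaration
        return "0.1.0"


print(FixedChild.version(), FixedChild().version())
```

With a matching `@staticmethod`, the override behaves the same whether it is called on the class or on an instance, so aligning the decorator with the base class removes the ambiguity either way.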


varunursekar mentioned this pull request Jun 24, 2026

Add Harbor integration: optimization-as-a-Harbor-task #2

Closed

varunursekar requested a review from a team June 24, 2026 18:18

varunursekar marked this pull request as ready for review June 24, 2026 18:22

greptile-apps Bot reviewed Jun 24, 2026

varunursekar wants to merge 1 commit into harbor-3-compiler from harbor-4-docs
