
Add Harbor adapter for judge v2 #51

Closed
xeophon wants to merge 1 commit into aisa-group:new_judge_v2 from xeophon:feat/harbor-adapter-judge-v2

xeophon commented Jun 3, 2026

Adds the Harbor adapter changes to the judge-v2 branch.

References/dependencies:

What changed:

  • Adds src/harbor_adapter for generating 28 Harbor tasks across the 7 PostTrainBench benchmarks and 4 base models.
  • Uses Prime driver-aware CUDA images for the agent and separate verifier containers.
  • Ports the verifier script to the judge-v2 flow from PR #50 (New judging system (v2)): canonical judgement_gpt5_4.json, archival judgement_api.json, trace parsing, and reward zeroing on contamination/base-model violations.
  • Defaults verifier OpenAI-compatible calls to Pinference through OPENAI_BASE_URL and CODEX_MODEL; forwards OPENAI_BASE_URL to agent env only for ArenaHard/HealthBench evaluator-judge runs.
  • Updates README wording now that Harbor task generation exists.
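
For reviewers, here is a minimal Python sketch of two behaviours described above: forwarding OPENAI_BASE_URL to the agent only for evaluator-judge benchmarks, and zeroing the reward on a violation. It is an illustration, not the code in src/harbor_adapter. The function names, the lowercase benchmark identifiers, and the judgement keys `contamination` and `base_model_violation` are all assumptions made for this example.

```python
from typing import Mapping

# Assumed identifiers: the PR names ArenaHard and HealthBench, but the exact
# strings used by the adapter are not shown here.
EVALUATOR_JUDGE_BENCHMARKS = frozenset({"arenahard", "healthbench"})


def build_agent_env(benchmark: str, host_env: Mapping[str, str]) -> dict[str, str]:
    """Return the env vars to expose to the agent container (hypothetical helper)."""
    agent_env: dict[str, str] = {}
    base_url = host_env.get("OPENAI_BASE_URL")
    # Forward the endpoint only when the benchmark needs an evaluator judge
    # and the host actually has the variable set.
    if benchmark.lower() in EVALUATOR_JUDGE_BENCHMARKS and base_url:
        agent_env["OPENAI_BASE_URL"] = base_url
    return agent_env


def final_reward(raw_score: float, judgement: Mapping[str, bool]) -> float:
    """Zero the reward if the judgement flags a violation (hypothetical helper)."""
    # Key names are assumptions; the real judgement JSON schema may differ.
    if judgement.get("contamination") or judgement.get("base_model_violation"):
        return 0.0
    return raw_score
```

In this sketch a gsm8k run gets an empty agent env even when the host sets OPENAI_BASE_URL, and any flagged judgement overrides the raw score with 0.0.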

I could not push directly to aisa-group:new_judge_v2: GitHub rejected the xeophon token with 403, and PR #50 has maintainerCanModify=false. This PR targets that branch so it can be merged into PR #50.

Tested:

  • bash -n src/harbor_adapter/template/environment/entrypoint.sh src/harbor_adapter/template/environment/system_monitor.sh src/harbor_adapter/template/tests/test.sh
  • uv run --no-project python3 -m py_compile src/harbor_adapter/adapter.py src/harbor_adapter/run_adapter.py
  • git diff --check
  • uv run --no-project python3 src/harbor_adapter/run_adapter.py --benchmark gsm8k --model qwen3-1.7b --output /tmp/ptb-harbor-smoke
  • uv run --no-project python3 src/harbor_adapter/run_adapter.py --benchmark healthbench --model qwen3-1.7b --output /tmp/ptb-harbor-smoke
  • uv run --no-project python3 src/harbor_adapter/run_adapter.py --list

xeophon mentioned this pull request Jun 3, 2026

xeophon commented Jun 3, 2026

Author

ahhhh sorry

xeophon closed this Jun 3, 2026
xeophon deleted the feat/harbor-adapter-judge-v2 branch June 3, 2026 13:17
