OpenHands · enyst · Feb 24, 2026
diff --git a/.plugin/marketplace.json b/.plugin/marketplace.json
@@ -58,6 +58,20 @@
                 "pull-request"
             ]
         },
+        {
+            "name": "babysit-pr",
+            "source": "./babysit-pr",
+            "description": "Babysit a GitHub pull request by monitoring CI checks, workflow runs, review comments, and mergeability until it is ready to merge.",
+            "category": "productivity",
+            "keywords": [
+                "github",
+                "pull-request",
+                "ci",
+                "actions",
+                "review",
+                "monitoring"
+            ]
+        },
         {
             "name": "bitbucket",
             "source": "./bitbucket",

diff --git a/skills/babysit-pr/README.md b/skills/babysit-pr/README.md
@@ -0,0 +1,19 @@
+# babysit-pr
+
+Babysit a GitHub pull request by monitoring CI checks/workflow runs, review comments, and mergeability until the PR is ready to merge (or merged/closed).
+
+## Triggers
+
+This skill is activated by:
+
+- `/babysit-pr`
+- `/babysit`
+- The agent may also activate it on its own when it needs to “babysit PR”, “watch PR”, “monitor CI”, or “check GitHub Actions”.
+
+## Details
+
+- Requires the GitHub CLI (`gh`) to be available and authenticated.
+- Uses `scripts/gh_pr_watch.py` to emit one-shot snapshots (`--once`) or a continuous JSONL stream (`--watch`).
+- The watcher can surface review comments from approved review bots by matching keywords in the bot login.
+  - Defaults include: `openhands`, `openhands-agent`, `all-hands-bot`, `smolpaws`, `claude`, `codex`.
+  - Optional: set `BABYSIT_PR_REVIEW_BOT_KEYWORDS` (comma-separated) to allow additional bot keywords.
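+
+Below is a minimal sketch of how the keyword list could be assembled and matched. It assumes case-insensitive substring matching against the comment author's login. The helper names are hypothetical and are not taken from `gh_pr_watch.py`:
+
+```python
+import os
+
+# Defaults from the list above; the matching rule below is an assumption.
+DEFAULT_BOT_KEYWORDS = ("openhands", "openhands-agent", "all-hands-bot", "smolpaws", "claude", "codex")
+
+
+def review_bot_keywords(env=None):
+    """Return the default keywords plus extras from BABYSIT_PR_REVIEW_BOT_KEYWORDS."""
+    env = os.environ if env is None else env
+    raw = env.get("BABYSIT_PR_REVIEW_BOT_KEYWORDS", "")
+    # Split the comma-separated value, trim whitespace, and drop empty entries.
+    extras = [k.strip().lower() for k in raw.split(",") if k.strip()]
+    return list(DEFAULT_BOT_KEYWORDS) + extras
+
+
+def is_approved_review_bot(login, keywords):
+    """Hypothetical rule: the login matches if it contains any keyword (case-insensitive)."""
+    login = login.lower()
+    return any(k in login for k in keywords)
+```
+
+For example, with `BABYSIT_PR_REVIEW_BOT_KEYWORDS=my-bot`, this sketch would surface a comment from `my-bot[bot]` but not one from `dependabot[bot]`. Substring matching is deliberately loose. An exact or prefix match would be stricter.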
diff --git a/skills/babysit-pr/SKILL.md b/skills/babysit-pr/SKILL.md
@@ -0,0 +1,217 @@
+---
+name: babysit-pr
+description: Babysit a GitHub pull request by continuously polling CI checks/workflow runs, new review comments, and mergeability state until the PR is ready to merge (or merged/closed). Diagnose failures, retry likely flaky failures up to 3 times, fix/push branch-related issues when appropriate, and stop only when user help is required (e.g., CI infrastructure outages, exhausted flaky retries, permissions, or ambiguous/blocking situations). Use when the user asks to monitor/watch/babysit a PR, watch CI, handle review comments, or keep an eye on mergeability.
+triggers:
+- /babysit-pr
+- /babysit
+---
+
+# PR Babysitter
+
+## Objective
+Babysit a PR persistently until one of these terminal outcomes occurs:
+
+- The PR is merged or closed.
+- CI is green, the watcher has surfaced no unaddressed review comments, required review approval is not blocking the merge, and there are no merge conflicts (the PR is mergeable and not reporting conflict risk).
+- A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).
+
+Do not stop merely because a single snapshot returns `idle` while checks are still pending.
+
+## Inputs
+Accept any of the following:
+
+- No PR argument: infer the PR from the current branch (`--pr auto`)
+- PR number
+- PR URL
+
+## Core Workflow
+
+1. When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (`--watch`) unless you are intentionally doing a one-shot diagnostic snapshot.
+2. Run the watcher script to snapshot PR/CI/review state (or consume each streamed snapshot from `--watch`).
+3. Inspect the `actions` list in the JSON response.
+4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
+5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
+6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
+7. If a review item is actionable and correct, patch code locally, commit, and push.
+8. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
+9. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
+10. On every loop, verify mergeability / merge-conflict status (for example via `gh pr view`) in addition to CI and review state.
+11. After any push or rerun action, immediately return to step 1 and continue polling on the updated SHA/state.
+12. If you had been using `--watch` before pausing to patch/commit/push, relaunch `--watch` yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
+13. Repeat polling until the PR is green + review-clean + mergeable, `stop_pr_closed` appears, or a user-help-required blocker is reached.
+14. Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached `--watch` process running and then end the turn as if monitoring were complete.
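+
+The ordering rules in the steps above can be condensed into a small helper. This is a hypothetical sketch and is not part of `gh_pr_watch.py`. It assumes each snapshot's `actions` field is a list of the action names used in this document. The relative order of review handling and CI diagnosis is an illustrative choice.
+
+```python
+# Lower rank = handle first. Review feedback precedes reruns (step 9), and a
+# failure is diagnosed before any rerun is considered.
+PRIORITY = {
+    "stop_pr_closed": 0,
+    "process_review_comment": 1,
+    "diagnose_ci_failure": 2,
+    "retry_failed_checks": 3,
+}
+
+
+def plan_actions(actions):
+    """Order watcher actions; an empty plan means keep polling (e.g. only `idle`)."""
+    if "stop_pr_closed" in actions:
+        # A merged or closed PR is terminal; nothing else needs handling.
+        return ["stop_pr_closed"]
+    # Keep each known action once, sorted by priority; `idle` is dropped.
+    return sorted({a for a in actions if a in PRIORITY}, key=PRIORITY.__getitem__)
+```
+
+If handling `process_review_comment` ends in a push, drop the remaining `retry_failed_checks` entry. The new commit retriggers CI, so rerunning checks on the old SHA is wasted work.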
+
+## Commands
+
+### One-shot snapshot
+
+```bash
+python3 <this-skill-path>/scripts/gh_pr_watch.py --pr auto --once
+```
+
+### Continuous watch (JSONL)
+
+```bash
+python3 <this-skill-path>/scripts/gh_pr_watch.py --pr auto --watch
+```
+
+### Trigger flaky retry cycle (only when watcher indicates)
+
+```bash
+python3 <this-skill-path>/scripts/gh_pr_watch.py --pr auto --retry-failed-now
+```
+
+### Explicit PR target
+
+```bash
+python3 <this-skill-path>/scripts/gh_pr_watch.py --pr <number-or-url> --once
+```
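+
+When driving the watcher from a script instead of reading its output directly, you can consume the `--watch` stream line by line. This is a minimal sketch that assumes each JSONL line is one complete snapshot object. `script_path` is a hypothetical parameter that stands in for `<this-skill-path>/scripts/gh_pr_watch.py`:
+
+```python
+import json
+import os
+import subprocess
+import sys
+
+
+def iter_snapshots(lines):
+    """Yield one parsed snapshot per non-blank JSONL line."""
+    for line in lines:
+        line = line.strip()
+        if line:
+            yield json.loads(line)
+
+
+def watch(script_path, pr="auto"):
+    """Run the watcher with --watch and yield snapshots as they arrive."""
+    proc = subprocess.Popen(
+        [sys.executable, script_path, "--pr", pr, "--watch"],
+        stdout=subprocess.PIPE,
+        text=True,
+        # Ask the child interpreter not to block-buffer its piped stdout.
+        env={**os.environ, "PYTHONUNBUFFERED": "1"},
+    )
+    try:
+        # Reading stdout here keeps this session attached to the watcher.
+        yield from iter_snapshots(proc.stdout)
+    finally:
+        # Stop the watcher if the consumer stops iterating early.
+        proc.terminate()
+        proc.wait()
+```
+
+Iterate with `for snap in watch(path): ...` and break out once a strict stop condition is reached. To guarantee that the `finally` block runs and terminates the watcher, wrap the call in `contextlib.closing(...)`.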
+
+## CI Failure Classification
+Use `gh` commands to inspect failed runs before deciding to rerun.
+
+- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
+- `gh run view <run-id> --log-failed`
+
+Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).
+
+Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).
+
+If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.
+
+Read `references/heuristics.md` for a concise checklist.
+
+## Review Comment Handling
+The watcher surfaces review items from:
+
+- PR issue comments
+- Inline review comments
+- Review submissions (COMMENT / APPROVED / CHANGES_REQUESTED)
+
+It can also surface feedback from approved review bots (configured in `scripts/gh_pr_watch.py`) in addition to human reviewer feedback. Ignore unrelated bot noise.
+For safety, the watcher only auto-surfaces trusted human review authors (OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots.
+On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.
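+
+The author filter described above could look roughly like the sketch below. It relies on the `author_association` and `user.login` fields that GitHub's REST API returns for issue comments, review comments, and reviews. The function name and the keyword-matching rule for bots are assumptions, not the watcher's actual code.
+
+```python
+TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}
+
+
+def should_surface(item, operator_login, review_bot_keywords):
+    """Decide whether a comment or review from the GitHub REST API should be surfaced."""
+    login = ((item.get("user") or {}).get("login") or "").lower()
+    # The authenticated operator (e.g. the `login` from `gh api user`) is always trusted.
+    if login and login == operator_login.lower():
+        return True
+    if item.get("author_association") in TRUSTED_ASSOCIATIONS:
+        return True
+    # Approved review bots, matched by lowercase keyword in the login (hypothetical rule).
+    return bool(login) and any(k in login for k in review_bot_keywords)
+```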
+
+When you agree with a comment and it is actionable:
+
+1. Patch code locally.
+2. Commit with `chore: address PR review feedback (#<n>)`.
+3. Push to the PR head branch.
+4. Resume watching on the new SHA immediately (do not stop after reporting the push).
+5. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.
+
+If you disagree, or the comment is non-actionable or already addressed, treat it as handled and continue the watcher loop. The script records surfaced items in its state file and does not surface them again.
+If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
+
+## Optional: Request (Re-)Review
+
+If the PR is green/mergeable but blocked on approval (for example `reviewDecision` is `REVIEW_REQUIRED` or `CHANGES_REQUESTED`) and you believe you’ve addressed all surfaced feedback, you can request another look.
+
+Rules:
+
+- Only do this when the user explicitly asks you to request review / ping reviewers, or after confirming with the user (avoid spamming humans).
+- Prefer requesting review only once per new head SHA.
+- If permissions fail or it’s unclear who should review, stop and ask the user.
+
+Suggested flow:
+
+1. Leave a brief PR comment summarizing what changed and why re-review is needed.
+   ```bash
+   gh pr comment <pr> --body "Addressed the requested changes in <sha>. Could you take another look?"
+   ```
+   Do NOT tag humans.
+2. Re-request reviewers via the GitHub API.
+   ```bash
+   gh api -X POST repos/<owner>/<repo>/pulls/<pr_number>/requested_reviewers \
+     -f 'reviewers[]=reviewer1' \
+     -f 'reviewers[]=reviewer2'
+   # For team reviewers (quote the field so shells like zsh do not glob the brackets):
+   #   -f 'team_reviewers[]=my-team'
+   ```
+
+If the API returns an error indicating reviewers are already requested, treat it as non-fatal and continue monitoring.
+
+
+## Git Safety Rules
+
+- Work only on the PR head branch.
+- Avoid destructive git commands.
+- Do not switch branches unless necessary to recover context.
+- Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
+- After each successful fix, commit and `git push`, then re-run the watcher.
+- If you interrupted a live `--watch` session to make the fix, restart `--watch` immediately after the push in the same turn.
+- Do not run multiple concurrent `--watch` processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
+- A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.
+
+Commit message defaults:
+
+- `fix: CI failure on PR #<n>`
+- `chore: address PR review feedback (#<n>)`
+
+## Monitoring Loop Pattern
+Use this loop in a live OpenHands session:
+
+1. Take a snapshot: run `--once`, or read the next line from an active `--watch` stream.
+2. Read `actions`.
+3. First check whether the PR is now merged or otherwise closed; if so, report that terminal state and stop polling immediately.
+4. Check CI summary, new review items, and mergeability/conflict status.
+5. Diagnose CI failures and classify branch-related vs flaky/unrelated.
+6. Process actionable review comments before flaky reruns when both are present; if a review fix requires a commit, push it and skip rerunning failed checks on the old SHA.
+7. Retry failed checks only when `retry_failed_checks` is present and you are not about to replace the current SHA with a review/CI fix commit.
+8. If you pushed a commit or triggered a rerun, report the action briefly and continue polling (do not stop).
+9. After a review-fix push, proactively restart continuous monitoring (`--watch`) in the same turn unless a strict stop condition has already been reached.
+10. If everything is passing, mergeable, not blocked on required review approval, and there are no unaddressed review items, report success and stop.
+11. If blocked on a user-help-required issue (infra outage, exhausted flaky retries, unclear reviewer request, permissions), report the blocker and stop.
+12. Otherwise sleep according to the polling cadence below and repeat.
+
+When the user explicitly asks to monitor/watch/babysit a PR, prefer `--watch` so polling continues autonomously in one command. Use repeated `--once` snapshots only for debugging, local testing, or when the user explicitly asks for a one-shot check.
+Do not stop to ask the user whether to continue polling; continue autonomously until a strict stop condition is met or the user explicitly interrupts.
+Do not hand control back to the user after a review-fix push just because a new SHA was created; restarting the watcher and re-entering the poll loop is part of the same babysitting task.
+If a `--watch` process is still running and no strict stop condition has been reached, the babysitting task is still in progress; keep streaming/consuming watcher output instead of ending the turn.
+
+## Polling Cadence
+Use adaptive polling and continue monitoring even after CI turns green:
+
+- While CI is not green (pending, running, queued, or failing): poll every minute.
+- After CI turns green: start at one-minute intervals, then back off exponentially while nothing changes (for example 1m, 2m, 4m, 8m, 16m, 32m), capping the interval at one hour.
+- Reset the green-state polling interval back to 1 minute whenever anything changes (new commit/SHA, check status changes, new review comments, mergeability changes, review decision changes).
+- If CI stops being green again (new commit, rerun, or regression): return to 1-minute polling.
+- If any poll shows the PR is merged or otherwise closed: stop polling immediately and report the terminal state.
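+
+The cadence above reduces to a small interval function. This is a minimal sketch with intervals in seconds. It assumes the caller computes `changed` by comparing consecutive snapshots (head SHA, check states, review items, mergeability, review decision) and checks for merged or closed PRs before calling it:
+
+```python
+MINUTE = 60
+MAX_INTERVAL = 60 * MINUTE  # cap: one poll per hour
+
+
+def next_interval(prev_interval, ci_green, changed):
+    """Return the number of seconds to sleep before the next poll."""
+    if not ci_green or changed or prev_interval is None:
+        # CI not green, something changed, or first poll: poll every minute.
+        return MINUTE
+    # Green and unchanged: double the previous interval, up to the cap.
+    return min(prev_interval * 2, MAX_INTERVAL)
+```
+
+From a fresh green state with no further changes, successive calls return 60, 120, 240, 480, 960, and 1920 seconds. After that, the interval stays at the 3600-second cap.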
+
+## Stop Conditions (Strict)
+Stop only when one of the following is true:
+
+- PR merged or closed (stop as soon as a poll/snapshot confirms this).
+- PR is ready to merge: CI succeeded, no surfaced unaddressed review comments, not blocked on required review approval, and no merge conflict risk.
+- User intervention is required and the agent cannot safely proceed alone.
+
+Keep polling when:
+
+- `actions` contains only `idle` but checks are still pending.
+- CI is still running/queued.
+- Review state is quiet but CI is not terminal.
+- CI is green but mergeability is unknown/pending.
+- CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes per the green-state cadence.
+- The PR is green but blocked on review approval (`REVIEW_REQUIRED` / similar); continue polling on the green-state cadence and surface any new review comments without asking for confirmation to keep watching.
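+
+Here is a rough sketch of how a single poll could be mapped onto these conditions using `gh pr view <pr> --json state,mergeable,reviewDecision`. It assumes the CI result and the count of unaddressed review items come from the watcher snapshot. User-help blockers still need the agent's judgment and are not modeled.
+
+```python
+def classify_pr(pr, ci_green, unaddressed_reviews):
+    """Return 'merged_or_closed', 'ready', or 'keep_polling' for one poll."""
+    if pr.get("state") in ("MERGED", "CLOSED"):
+        return "merged_or_closed"
+    # An empty reviewDecision typically means no review is required.
+    blocked_on_review = pr.get("reviewDecision") in ("REVIEW_REQUIRED", "CHANGES_REQUESTED")
+    # UNKNOWN (still computing) and CONFLICTING both mean "not ready yet".
+    mergeable = pr.get("mergeable") == "MERGEABLE"
+    if ci_green and mergeable and not blocked_on_review and unaddressed_reviews == 0:
+        return "ready"
+    return "keep_polling"
+```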
+
+## Output Expectations
+Provide concise progress updates while monitoring, and a final summary once a strict stop condition is met.
+
+For progress updates:
+
+- During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
+- Treat push confirmations, intermediate CI snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
+- A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
+- A review-fix commit + push is not a completion event; immediately resume live monitoring (`--watch`) in the same turn and continue reporting progress updates.
+- When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: `🚀 CI is all green! 33/33 passed. Still on watch for review approval.`
+- Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.
+
+The final summary must include:
+
+- Final PR SHA
+- CI status summary
+- Mergeability / conflict status
+- Fixes pushed
+- Flaky retry cycles used
+- Remaining unresolved failures or review comments
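+
+The one-time celebratory update only needs a record of which head SHAs have already been announced. This is a hypothetical helper. It assumes the caller passes the passed and total check counts for the current SHA:
+
+```python
+class GreenAnnouncer:
+    """Emit the all-green progress update at most once per head SHA."""
+
+    def __init__(self):
+        self._announced = set()
+
+    def maybe_announce(self, head_sha, passed, total):
+        # Stay silent unless every check passed and this SHA was not announced yet.
+        if total == 0 or passed != total or head_sha in self._announced:
+            return None
+        self._announced.add(head_sha)
+        return f"🚀 CI is all green! {passed}/{total} passed. Still on watch for review approval."
+```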
+
+## References
+
+- Heuristics and decision tree: `references/heuristics.md`
+- GitHub CLI/API details used by the watcher: `references/github-api-notes.md`
diff --git a/skills/babysit-pr/references/github-api-notes.md b/skills/babysit-pr/references/github-api-notes.md
@@ -0,0 +1,72 @@
+# GitHub CLI / API Notes For `babysit-pr`
+
+## Primary commands used
+
+### PR metadata
+
+- `gh pr view --json number,url,state,mergedAt,closedAt,headRefName,headRefOid,headRepository,headRepositoryOwner`
+
+Used to resolve PR number, URL, branch, head SHA, and closed/merged state.
+
+### PR checks summary
+
+- `gh pr checks --json name,state,bucket,link,workflow,event,startedAt,completedAt`
+
+Used to compute pending/failed/passed counts and whether the current CI round is terminal.
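+
+Below is a minimal sketch of that computation over the command's JSON output, assuming the output has been captured as a string. `gh pr checks` can exit with a non-zero status while checks are failing or still pending. A caller should parse the output rather than treat that status as an error.
+
+```python
+import json
+from collections import Counter
+
+
+def summarize_checks(checks_json):
+    """Count checks per `bucket` and report whether the CI round is terminal."""
+    counts = Counter(check["bucket"] for check in json.loads(checks_json))
+    # The round is terminal once no check is still pending.
+    return dict(counts), counts["pending"] == 0
+```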
+
+### Workflow runs for head SHA
+
+- `gh api repos/{owner}/{repo}/actions/runs -X GET -f head_sha=<sha> -f per_page=100`
+
+Used to discover failed workflow runs and rerunnable run IDs.
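+
+Given the parsed response from that call, failed runs for the current head SHA could be selected as shown below. The set of conclusions treated as failed is an assumption for illustration. The API also reports values such as `cancelled` and `startup_failure`, which may deserve separate handling.
+
+```python
+# Conclusions treated as rerunnable here; adjust as needed.
+FAILED_CONCLUSIONS = {"failure", "timed_out"}
+
+
+def rerunnable_run_ids(runs_payload, head_sha):
+    """Return IDs of completed, failed runs for head_sha from `workflow_runs[]`."""
+    return [
+        run["id"]
+        for run in runs_payload.get("workflow_runs", [])
+        # head_sha is re-checked defensively even though the query filters by it.
+        if run.get("head_sha") == head_sha
+        and run.get("status") == "completed"
+        and run.get("conclusion") in FAILED_CONCLUSIONS
+    ]
+```
+
+Each returned ID can then be passed to `gh run rerun <run-id> --failed`.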
+
+### Failed log inspection
+
+- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
+- `gh run view <run-id> --log-failed`
+
+Used by the agent to classify branch-related vs flaky/unrelated failures.
+
+### Retry failed jobs only
+
+- `gh run rerun <run-id> --failed`
+
+Reruns only failed jobs (and dependencies) for a workflow run.
+
+## Review-related endpoints
+
+- Issue comments on PR:
+  - `gh api repos/{owner}/{repo}/issues/<pr_number>/comments?per_page=100`
+- Inline PR review comments:
+  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/comments?per_page=100`
+- Review submissions:
+  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/reviews?per_page=100`
+
+## JSON fields consumed by the watcher
+
+### `gh pr view`
+
+- `number`
+- `url`
+- `state`
+- `mergedAt`
+- `closedAt`
+- `headRefName`
+- `headRefOid`
+
+### `gh pr checks`
+
+- `bucket` (`pass`, `fail`, `pending`, `skipping`, `cancel`)
+- `state`
+- `name`
+- `workflow`
+- `link`
+
+### Actions runs API (`workflow_runs[]`)
+
+- `id`
+- `name`
+- `status`
+- `conclusion`
+- `html_url`
+- `head_sha`
diff --git a/skills/babysit-pr/references/heuristics.md b/skills/babysit-pr/references/heuristics.md
@@ -0,0 +1,58 @@
+# CI / Review Heuristics
+
+## CI classification checklist
+
+Treat as **branch-related** when logs clearly indicate a regression caused by the PR branch:
+
+- Compile/typecheck/lint failures in files or modules touched by the branch
+- Deterministic unit/integration test failures in changed areas
+- Snapshot output changes caused by UI/text changes in the branch
+- Static analysis violations introduced by the latest push
+- Build script/config changes in the PR causing a deterministic failure
+
+Treat as **likely flaky or unrelated** when evidence points to transient or external issues:
+
+- DNS/network/registry timeout errors while fetching dependencies
+- Runner image provisioning or startup failures
+- GitHub Actions infrastructure/service outages
+- Cloud/service rate limits or transient API outages
+- Non-deterministic failures in unrelated integration tests with known flake patterns
+
+If uncertain, inspect failed logs once before choosing rerun.
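+
+A crude first pass over `gh run view <run-id> --log-failed` output can apply this checklist with pattern matching before you read the log in full. The patterns below are illustrative assumptions and are not part of the skill. Anything that matches both lists, or neither, is left for manual diagnosis:
+
+```python
+import re
+
+# Illustrative patterns only; tune them for your CI.
+FLAKY_PATTERNS = [
+    r"ETIMEDOUT|ECONNRESET|Temporary failure in name resolution",
+    r"rate limit",
+    r"50[23] (?:Bad Gateway|Service Unavailable)",
+    r"The runner has received a shutdown signal",
+]
+BRANCH_PATTERNS = [
+    r"error TS\d+|SyntaxError|TypeError",
+    r"AssertionError|FAILED \S+::",
+    r"would reformat",
+]
+
+
+def classify_log(log_text):
+    """Return 'branch', 'flaky', or 'ambiguous' for a failed-job log excerpt."""
+    flaky = any(re.search(p, log_text, re.IGNORECASE) for p in FLAKY_PATTERNS)
+    branch = any(re.search(p, log_text, re.IGNORECASE) for p in BRANCH_PATTERNS)
+    if branch and not flaky:
+        return "branch"
+    if flaky and not branch:
+        return "flaky"
+    # Both or neither matched: inspect the log manually before any rerun.
+    return "ambiguous"
+```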
+
+## Decision tree (fix vs rerun vs stop)
+
+1. If PR is merged/closed: stop.
+2. If there are failed checks:
+   - Diagnose first.
+   - If branch-related: fix locally, commit, push.
+   - If likely flaky/unrelated and all checks for the current SHA are terminal: rerun failed jobs.
+   - If checks are still pending: wait.
+3. If flaky reruns for the same SHA reach the configured limit (default 3): stop and report persistent failure.
+4. Independently, process any newly surfaced review comments (from trusted human reviewers or approved review bots).
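+
+The decision tree can be written as a single function. This is a minimal sketch. `classification` is assumed to be `"branch"`, `"flaky"`, or `"ambiguous"` from the diagnosis step. Review handling (step 4) runs independently of the result.
+
+```python
+def decide(pr_closed, failed, pending, classification, retries_used, max_retries=3):
+    """Map the decision tree onto a single next step for the current SHA."""
+    if pr_closed:
+        return "stop"
+    if failed:
+        if classification == "branch":
+            return "fix"  # fix locally, commit, push
+        if pending:
+            return "wait"  # never rerun while other checks for this SHA still run
+        if retries_used >= max_retries:
+            return "stop_and_report"  # flaky retry budget exhausted
+        if classification == "flaky":
+            return "rerun"
+        return "diagnose"  # ambiguous: inspect logs manually first
+    return "wait" if pending else "ci_green"
+```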
+
+## Review comment agreement criteria
+
+Address the comment when:
+
+- The comment is technically correct.
+- The change is actionable in the current branch.
+- The requested change does not conflict with the user’s intent or recent guidance.
+- The change can be made safely without unrelated refactors.
+
+Do not auto-fix when:
+
+- The comment is ambiguous and needs clarification.
+- The request conflicts with explicit user instructions.
+- The proposed change requires product/design decisions the user has not made.
+- The codebase is in a dirty/unrelated state that makes safe editing uncertain.
+
+## Stop-and-ask conditions
+
+Stop and ask the user instead of continuing automatically when:
+
+- The local worktree has unrelated uncommitted changes.
+- `gh` auth/permissions fail.
+- The PR branch cannot be pushed.
+- CI failures persist after the flaky retry budget.
+- Reviewer feedback requires a product decision or cross-team coordination.