Skip to content

feat(cli): add reconnect command#960

Open
WuKongAI-CMU wants to merge 4 commits intoNVIDIA:mainfrom
WuKongAI-CMU:fix/910-reconnect
Open

feat(cli): add reconnect command#960
WuKongAI-CMU wants to merge 4 commits intoNVIDIA:mainfrom
WuKongAI-CMU:fix/910-reconnect

Conversation

@WuKongAI-CMU
Copy link
Contributor

@WuKongAI-CMU WuKongAI-CMU commented Mar 26, 2026

Summary

  • add a global nemoclaw reconnect [name] command for post-reboot recovery
  • reuse the existing sandbox connect and gateway recovery path instead of introducing a second reconnect state machine
  • add CLI coverage for default-sandbox reconnect and explicit-name reconnect dispatch

Closes #910

Summary by CodeRabbit

  • New Features

    • Added a reconnect command to restore a sandbox connection; accepts an optional sandbox name and falls back to the configured default. Help updated to document usage and argument limits.
  • Bug Fixes / Reliability

    • Improved gateway recovery: if the gateway runtime is missing or unhealthy, the CLI will select and start the gateway before reconnecting.
    • Added validation and clear error messages for unknown or invalid sandbox names and for too many arguments.
  • Tests

    • Added end-to-end CLI tests covering default, named, invalid-name, argument-count, and missing-gateway reconnect scenarios.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab9d5cde-123e-45e6-8b35-16fcf58005cf

📥 Commits

Reviewing files that changed from the base of the PR and between 385e921 and d0bb41a.

📒 Files selected for processing (2)
  • bin/nemoclaw.js
  • test/cli.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/nemoclaw.js

📝 Walkthrough

Walkthrough

Adds a new nemoclaw reconnect CLI command that resolves a target sandbox (explicit arg or registry default), validates it, recovers a missing/unhealthy gateway when detected, and then invokes the existing sandbox connection flow; help text and tests were updated.

Changes

Cohort / File(s) Summary
Core CLI
bin/nemoclaw.js
Added reconnect command and handlers: resolveReconnectSandboxName(requestedName) and async reconnect(args=[]); updated global dispatch and help() to document nemoclaw reconnect [name]; gateway recovery predicate extended to treat missing_named as recoverable.
Tests
test/cli.test.js
Added CLI tests asserting help includes reconnect, end-to-end reconnect scenarios for default and explicit sandbox names, gateway-recovery path (gateway select/start when runtime missing), and validation failures for unknown name / too many args. Stubs verify ordered openshell calls and exit codes.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as "nemoclaw CLI"
    participant Registry as "Registry (~/.nemoclaw/sandboxes.json)"
    participant Openshell as "openshell"
    participant Gateway as "Gateway runtime"

    User->>CLI: nemoclaw reconnect [name]
    CLI->>Registry: resolveReconnectSandboxName(name)
    alt name provided
        Registry-->>CLI: requested name
    else no name
        Registry-->>CLI: defaultSandbox (or error)
    end
    alt gateway missing or unhealthy (including missing_named)
        CLI->>Openshell: gateway select nemoclaw
        CLI->>Openshell: gateway start --name nemoclaw
        Openshell-->>Gateway: start request
        Gateway-->>CLI: started/healthy
    end
    CLI->>Openshell: sandbox get <name>
    Openshell-->>CLI: sandbox info/state
    CLI->>Openshell: forward start --background 18789 <name>
    Openshell-->>CLI: forward started
    CLI->>Openshell: sandbox connect <name>
    Openshell-->>User: interactive session
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐇 I sniff the bytes and tap the key,

One hop — the sandbox wakes for me.
Reconnect done with gentle thump,
Tunnels hum and kernels jump.
Hooray — back in the burrow, snug and plump!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(cli): add reconnect command' clearly and directly summarizes the primary change: addition of a new CLI reconnect command.
Linked Issues check ✅ Passed Code changes fully implement issue #910 requirements: verify container runtime, start gateway if needed (including missing_named recovery), wait for health, resolve sandbox (default/explicit), and reuse sandboxConnect flow.
Out of Scope Changes check ✅ Passed All code changes are directly in scope: new reconnect command, gateway recovery logic for missing_named state, sandbox name resolution, and comprehensive CLI test coverage for the feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@WuKongAI-CMU
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 26, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/cli.test.js (1)

160-211: Add forwarding assertion here too for flow parity.

This test validates sandbox get and sandbox connect, but unlike the default-path test it does not assert forward start --background 18789 beta. Adding it will better guard reconnect regressions for explicit-name dispatch.

✅ Suggested assertion
     expect(r.code).toBe(0);
     expect(r.out.includes("Reconnecting to sandbox 'beta'")).toBeTruthy();
     const log = fs.readFileSync(markerFile, "utf8");
     expect(log).toContain("sandbox get beta");
+    expect(log).toContain("forward start --background 18789 beta");
     expect(log).toContain("sandbox connect beta");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli.test.js` around lines 160 - 211, The test "reconnect accepts an
explicit sandbox name" is missing the forwarding assertion present in the
default-path test; after reading the markerFile (variable markerFile) add an
assertion that the recorded command log includes the forward invocation for the
named sandbox — specifically assert markerFile contents contain "forward start
--background 18789 beta" (or at minimum that it contains "forward start" and
"beta" together) to ensure the openshell script received the forwarding command
when runWithEnv("reconnect beta") triggers sandbox connect; place this check
alongside the existing expectations that assert "sandbox get beta" and "sandbox
connect beta".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/nemoclaw.js`:
- Around line 582-586: reconnect currently just resolves sandboxName and calls
sandboxConnect, which can skip gateway auto-start because
recoverNamedGatewayRuntime handles only named_unhealthy, named_unreachable, and
connected_other but not missing_named; update the logic so that reconnect
triggers the recovery path for a missing named gateway: either call
recoverNamedGatewayRuntime(sandboxName) (or extend recoverNamedGatewayRuntime to
treat missing_named the same as the other cases) before invoking sandboxConnect,
and ensure the code paths for named_unhealthy, named_unreachable,
connected_other, and missing_named will attempt to auto-start the gateway so
reconnect does not leave the system without an active named gateway.

---

Nitpick comments:
In `@test/cli.test.js`:
- Around line 160-211: The test "reconnect accepts an explicit sandbox name" is
missing the forwarding assertion present in the default-path test; after reading
the markerFile (variable markerFile) add an assertion that the recorded command
log includes the forward invocation for the named sandbox — specifically assert
markerFile contents contain "forward start --background 18789 beta" (or at
minimum that it contains "forward start" and "beta" together) to ensure the
openshell script received the forwarding command when runWithEnv("reconnect
beta") triggers sandbox connect; place this check alongside the existing
expectations that assert "sandbox get beta" and "sandbox connect beta".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f93a3873-981a-4c35-8b40-3aa583486b78

📥 Commits

Reviewing files that changed from the base of the PR and between f0f53e4 and ca512aa.

📒 Files selected for processing (2)
  • bin/nemoclaw.js
  • test/cli.test.js

@WuKongAI-CMU
Copy link
Contributor Author

Addressed the current review in 41b383b:

  • include missing_named in the named-gateway recovery start path so nemoclaw reconnect can recover when the named gateway is gone after reboot
  • add regression coverage for reconnecting through that missing-gateway recovery path
  • add the explicit forward assertion to the named-sandbox reconnect test

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 26, 2026

@WuKongAI-CMU Thanks for the update! I'll kick off a fresh review of the changes in 41b383b, including the missing_named gateway recovery path fix and the new regression coverage.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
test/cli.test.js (2)

152-157: Consider asserting command order, not only presence.

These checks currently use toContain, which won’t catch sequence regressions (e.g., connect before forward start). Since reconnect orchestration order matters, index-based order assertions would make these tests more robust.

✅ Example order assertion pattern
     const log = fs.readFileSync(markerFile, "utf8");
-    expect(log).toContain("sandbox get alpha");
-    expect(log).toContain("forward start --background 18789 alpha");
-    expect(log).toContain("sandbox connect alpha");
+    const iGet = log.indexOf("sandbox get alpha");
+    const iForward = log.indexOf("forward start --background 18789 alpha");
+    const iConnect = log.indexOf("sandbox connect alpha");
+    expect(iGet).toBeGreaterThanOrEqual(0);
+    expect(iForward).toBeGreaterThan(iGet);
+    expect(iConnect).toBeGreaterThan(iForward);

Apply the same pattern to the named and missing-gateway reconnect tests (including gateway select before gateway start).

Also applies to: 206-211, 299-304

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli.test.js` around lines 152 - 157, Replace the loose toContain
assertions on markerFile/log in the reconnect tests with explicit order checks
using indexOf: compute indices for the expected strings (e.g., "sandbox get
alpha", "forward start --background 18789 alpha", "sandbox connect alpha") and
assert indexOf(first) !== -1 and indexOf(first) < indexOf(second) <
indexOf(third); do the same in the named and missing-gateway reconnect tests to
assert "gateway select" appears before "gateway start" (and any other ordered
steps), referencing the markerFile/log and the specific strings used in the
existing expectations.

169-180: Strengthen explicit-name test to catch fallback regressions.

Line 179 currently sets defaultSandbox to "beta", the same value passed in Line 201. If reconnect accidentally ignores explicit args and always uses default, this test would still pass.

🧪 Proposed test hardening
       JSON.stringify({
         sandboxes: {
+          alpha: {
+            name: "alpha",
+            model: "test-model",
+            provider: "nvidia-prod",
+            gpuEnabled: false,
+            policies: [],
+          },
           beta: {
             name: "beta",
             model: "test-model",
             provider: "nvidia-prod",
             gpuEnabled: false,
             policies: [],
           },
         },
-        defaultSandbox: "beta",
+        defaultSandbox: "alpha",
       }),

Also applies to: 201-211

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli.test.js` around lines 169 - 180, The test currently uses the same
value for defaultSandbox and the explicit sandbox arg, so update the fixture(s)
used in the "explicit-name" test to use a different defaultSandbox value (e.g.,
set defaultSandbox to "alpha" while leaving the explicit arg/sandbox name as
"beta") to ensure reconnect honors the explicit argument; update both the JSON
fixture at the sandboxes block (the object containing defaultSandbox) and the
similar fixture used around lines 201-211 so default and explicit names differ,
preserving existing names like "beta" for the explicit input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/cli.test.js`:
- Around line 152-157: Replace the loose toContain assertions on markerFile/log
in the reconnect tests with explicit order checks using indexOf: compute indices
for the expected strings (e.g., "sandbox get alpha", "forward start --background
18789 alpha", "sandbox connect alpha") and assert indexOf(first) !== -1 and
indexOf(first) < indexOf(second) < indexOf(third); do the same in the named and
missing-gateway reconnect tests to assert "gateway select" appears before
"gateway start" (and any other ordered steps), referencing the markerFile/log
and the specific strings used in the existing expectations.
- Around line 169-180: The test currently uses the same value for defaultSandbox
and the explicit sandbox arg, so update the fixture(s) used in the
"explicit-name" test to use a different defaultSandbox value (e.g., set
defaultSandbox to "alpha" while leaving the explicit arg/sandbox name as "beta")
to ensure reconnect honors the explicit argument; update both the JSON fixture
at the sandboxes block (the object containing defaultSandbox) and the similar
fixture used around lines 201-211 so default and explicit names differ,
preserving existing names like "beta" for the explicit input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 224a0397-d031-4b60-9240-87c77422bf7e

📥 Commits

Reviewing files that changed from the base of the PR and between ca512aa and 41b383b.

📒 Files selected for processing (2)
  • bin/nemoclaw.js
  • test/cli.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/nemoclaw.js

@WuKongAI-CMU
Copy link
Contributor Author

Rebased this branch onto the latest main and pushed follow-up commit 385e921.

Changes in this pass:

  • resolved the bin/nemoclaw.js rebase conflict against current main
  • kept missing_named in the named-gateway recovery start path so nemoclaw reconnect can recover after the named gateway disappears
  • hardened the reconnect tests so they assert command order (get -> forward -> connect, and select -> start -> get -> forward -> connect for the missing-gateway path)
  • changed the explicit-name reconnect fixture so the default sandbox differs from the requested sandbox, preventing fallback regressions from passing silently

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 26, 2026

@WuKongAI-CMU Thanks for the detailed summary! I'll kick off a fresh review of 385e921, covering the rebase resolution, the retained missing_named gateway recovery path, the hardened command-order assertions in the reconnect tests, and the improved explicit-name fixture that guards against silent fallback regressions.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/cli.test.js (1)

106-323: Consider extracting a shared reconnect test fixture helper.

The repeated temp-home/registry/stub setup is sizable; a small helper would reduce duplication and make future reconnect scenarios easier to add.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/cli.test.js` around lines 106 - 323, Extract the repeated
temp-home/registry/stub setup in the three reconnect tests into a helper (e.g.,
setupReconnectFixture) that creates HOME, localBin, registryDir, markerFile (and
optionally stateFile), writes sandboxes.json and creates the openshell stub;
have it accept parameters for sandbox list, defaultSandbox, openshell behavior,
and initial state so tests can call setupReconnectFixture(...) and then call
runWithEnv("reconnect ..." , { HOME, PATH: `${localBin}:${process.env.PATH ||
""}` }, timeout). Update the tests to use this helper and assert using the
returned markerFile and paths instead of duplicating
fs.mkdirSync/fs.writeFileSync code; keep runWithEnv and the existing expectation
checks unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/nemoclaw.js`:
- Around line 560-568: The reconnect argument validation is too permissive:
resolveReconnectSandboxName accepts any syntactically valid name and ignores
extra positional args, which can lead to confusing downstream errors; update
resolveReconnectSandboxName to (1) reject when more than one positional argument
is provided, logging a clear error and exiting, and (2) if an explicit name is
given, verify it exists in the local registry (via registry.has or equivalent)
before proceeding—if the name is unknown, log a specific "unknown sandbox" error
and exit; apply the same stricter checks to the corresponding reconnect handler
for connectors (the analogous function around the other block at lines 584-588).

---

Nitpick comments:
In `@test/cli.test.js`:
- Around line 106-323: Extract the repeated temp-home/registry/stub setup in the
three reconnect tests into a helper (e.g., setupReconnectFixture) that creates
HOME, localBin, registryDir, markerFile (and optionally stateFile), writes
sandboxes.json and creates the openshell stub; have it accept parameters for
sandbox list, defaultSandbox, openshell behavior, and initial state so tests can
call setupReconnectFixture(...) and then call runWithEnv("reconnect ..." , {
HOME, PATH: `${localBin}:${process.env.PATH || ""}` }, timeout). Update the
tests to use this helper and assert using the returned markerFile and paths
instead of duplicating fs.mkdirSync/fs.writeFileSync code; keep runWithEnv and
the existing expectation checks unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d14cff9c-ed68-4b7d-bd40-44971fc886ad

📥 Commits

Reviewing files that changed from the base of the PR and between 41b383b and 385e921.

📒 Files selected for processing (2)
  • bin/nemoclaw.js
  • test/cli.test.js

@WuKongAI-CMU
Copy link
Contributor Author

Addressing the current reconnect validation review in d0bb41a:

  • reject nemoclaw reconnect with more than one positional sandbox arg and show the intended usage
  • validate an explicit sandbox name against the local registry before trying to reconnect
  • add regression coverage for unknown sandbox names and extra positional args

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 27, 2026

@WuKongAI-CMU Thanks for the update! I'll kick off a fresh review of d0bb41a, covering the extra-argument rejection with usage guidance, the registry-backed validation of explicit sandbox names, and the new regression tests for unknown names and extra positional args.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(cli): add nemoclaw reconnect command for post-reboot recovery

1 participant