
feat(scripts): add upgrade-sandbox.sh and non-interactive inference hydration #902

Open
HagegeR wants to merge 1 commit into NVIDIA:main from HagegeR:feat/upgrade-sandbox

Conversation


@HagegeR HagegeR commented Mar 25, 2026

Summary

Add a sandbox upgrade workflow that backs up workspace files, optionally snapshots full OpenClaw state, and re-runs nemoclaw onboard — restoring the workspace automatically afterward. Also enables non-interactive onboard to reuse provider/model settings from the sandbox registry, so upgrades skip inference prompts.

Changes

  • scripts/upgrade-sandbox.sh (new): orchestrates backup → optional --full-data snapshot → optional --run-onboard → automatic restore
  • scripts/backup-workspace.sh: split file lists into core/optional (quieter output for missing optional files), add sandbox_download() helper, add portable latest_backup_timestamp()
  • bin/lib/onboard.js: add inferRemoteProviderKeyFromStoredName() and hydrateNonInteractiveInferenceFromRegistry() to fill provider/model env vars from registry during non-interactive onboard; use getCredential() for NVIDIA_API_KEY fallback
  • docs/workspace/backup-restore.md: document the upgrade workflow
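
The core/optional split behind `sandbox_download()` might look like the sketch below. This is illustrative only: the helper is simplified to take a source path, a destination, and an optional flag (dropping the sandbox argument), and a local `cp` stands in for the real `openshell sandbox download`.

```shell
#!/bin/sh
# Sketch of a quiet-skip download helper: optional items warn and return,
# required items abort. `cp` is a stand-in for the real download command.
sandbox_download() {
  src="$1"; dest="$2"; optional="$3"
  if cp "$src" "$dest" 2>/dev/null; then    # stand-in for the real download
    return 0
  fi
  if [ "$optional" = "1" ]; then
    echo "[backup] Optional ${src##*/} not found, skipped." >&2
    return 1                                # caller decides whether to count it
  fi
  echo "[backup] Required ${src##*/} missing, aborting." >&2
  exit 1                                    # core artifacts fail the run
}
```

A caller can then treat a non-zero return from an optional item as normal, while a missing core item terminates the backup.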

Type of Change

  • New feature (non-breaking change that adds functionality)

Testing

  • npx prek run --all-files passes
  • Manual testing: ./scripts/upgrade-sandbox.sh my-assistant backs up and prints instructions
  • Manual testing: ./scripts/upgrade-sandbox.sh --run-onboard --yes my-assistant runs full cycle

Checklist

General

  • I have read the CONTRIBUTING doc.
  • I have signed my commits (git log --show-signature).

Code Changes

  • Formatters applied — npx prek run --all-files auto-fixes formatting.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for user-facing behavior changes.

Summary by CodeRabbit

  • New Features

    • Added a CLI helper to safely upgrade sandboxes with backup, optional full-data snapshot, automated onboarding, and conditional workspace restore
    • Onboarding (non-interactive) now auto-populates provider and model from stored sandbox metadata
  • Bug Fixes / Improvements

    • Backup/restore now distinguishes core vs optional items and skips missing optional artifacts without error
    • Improved non-interactive credential validation for NVIDIA endpoints to fail early if missing
  • Documentation

    • Backup/restore guide updated with the new upgrade workflow and command options


coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

Populate non-interactive onboard provider/model from the sandbox registry, tighten NVIDIA credential resolution, stop persisting inference sandbox metadata, distinguish core vs optional workspace artifacts in backup/restore (skip missing optional items), and add a new upgrade-sandbox.sh to backup, optionally snapshot data, run onboard, and restore.

Changes

Cohort / File(s) / Summary

  • Onboard initialization (bin/lib/onboard.js): Added inferRemoteProviderKeyFromStoredName() and hydrateNonInteractiveInferenceFromRegistry() to prefill process.env.NEMOCLAW_PROVIDER/process.env.NEMOCLAW_MODEL in non-interactive runs; require getCredential("NVIDIA_API_KEY") for NVIDIA endpoints and write it to process.env; removed sandbox-registry persistence in setupInference.
  • Backup/restore script (scripts/backup-workspace.sh): Split workspace items into core vs optional lists; added sandbox_download() to suppress errors for missing optional items; added latest_backup_timestamp() and updated backup/restore flows to skip missing optional files/dirs rather than erroring.
  • Upgrade workflow script (scripts/upgrade-sandbox.sh): New CLI that wraps backup-workspace.sh to perform a safe upgrade flow: backup, optional full /sandbox/.openclaw-data/ snapshot, optional nemoclaw onboard (non-interactive, with env exports), and optional restore. Flags: --full-data, --run-onboard, --yes, --no-restore.
  • Docs (docs/workspace/backup-restore.md): Added "Upgrade workflow (image rebuild / onboard)" section describing the upgrade helper script, optional snapshot/restore behavior, example flows, and handling of missing optional artifacts.
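
The flag surface of upgrade-sandbox.sh can be sketched as a small parser. The flag names come from the summary above; the parsing code itself is an assumption, not the script's actual implementation.

```shell
#!/bin/sh
# Illustrative flag parsing for the upgrade flow described above.
parse_upgrade_flags() {
  FULL_DATA=0; RUN_ONBOARD=0; RESTORE=1; ASSUME_YES=0; SANDBOX=""
  for arg in "$@"; do
    case "$arg" in
      --full-data)   FULL_DATA=1 ;;     # also snapshot /sandbox/.openclaw-data/
      --run-onboard) RUN_ONBOARD=1 ;;   # re-run onboard non-interactively
      --no-restore)  RESTORE=0 ;;       # leave a fresh workspace after rebuild
      --yes)         ASSUME_YES=1 ;;    # skip confirmation prompts
      *)             SANDBOX="$arg" ;;  # positional sandbox name
    esac
  done
  printf '%s %s %s %s %s\n' "$SANDBOX" "$FULL_DATA" "$RUN_ONBOARD" "$RESTORE" "$ASSUME_YES"
}
```

For example, `parse_upgrade_flags --run-onboard --yes my-assistant` yields a full-cycle configuration with restore left enabled, matching the manual test in the PR description.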

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Upgrade as "upgrade-sandbox.sh"
    participant Backup as "backup-workspace.sh"
    participant Openshell
    participant Nemoclaw as "nemoclaw CLI"
    participant Registry

    User->>Upgrade: run upgrade-sandbox.sh [--full-data] [--run-onboard] <sandbox>
    Upgrade->>Backup: backup <sandbox>
    Backup->>Openshell: download workspace files
    Openshell-->>Backup: files + timestamp
    Backup-->>Upgrade: backup complete (TS)

    alt --full-data
        Upgrade->>Openshell: download /sandbox/.openclaw-data/
        Openshell-->>Upgrade: data snapshot stored
    end

    alt --run-onboard
        Upgrade->>Nemoclaw: nemoclaw onboard (non-interactive)
        Nemoclaw->>Registry: hydrateNonInteractiveInferenceFromRegistry()
        Registry-->>Nemoclaw: provider & model (if available)
        Nemoclaw->>Openshell: rebuild sandbox image
        Openshell-->>Nemoclaw: rebuild complete
        Nemoclaw-->>Upgrade: onboard complete

        alt restore
            Upgrade->>Backup: restore <sandbox> <TS>
            Backup->>Openshell: upload backed-up files
            Openshell-->>Backup: restore complete
            Backup-->>Upgrade: restore done
        end
    else
        Upgrade-->>User: print next-step restore command
    end

    Upgrade-->>User: upgrade complete

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐇 I hopped through backups, tidy and quick,
Prefilled the sandbox with a clever trick,
Optional crumbs I gently skipped,
Rebuilt the burrow while configs zipped,
Hooray — a safe upgrade, neat and slick! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 43.75%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The PR title accurately captures the two main features added: a new upgrade-sandbox.sh script and non-interactive inference hydration for the onboard process. It is concise, specific, and directly reflects the primary changes across multiple files in the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/lib/onboard.js (1)

1686-1705: ⚠️ Potential issue | 🟠 Major

Reuse saved credentials for all non-interactive remote providers.

This branch now falls back to getCredential("NVIDIA_API_KEY"), but the non-interactive path for openai, anthropic, gemini, and the compatible endpoints still aborts unless the key is already exported in the environment. Users who previously onboarded interactively will usually have those keys only in ~/.nemoclaw/credentials.json, so scripts/upgrade-sandbox.sh --run-onboard still fails for those providers.

💡 Suggested fix
       } else {
         if (isNonInteractive()) {
-          if (!process.env[credentialEnv]) {
+          const key = getCredential(credentialEnv);
+          if (!key) {
             console.error(`  ${credentialEnv} is required for ${remoteConfig.label} in non-interactive mode.`);
             process.exit(1);
           }
+          process.env[credentialEnv] = key;
         } else {
           await ensureNamedCredential(credentialEnv, remoteConfig.label + " API key", remoteConfig.helpUrl);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1686 - 1705, The non-interactive branch for
other remote providers currently aborts if process.env[credentialEnv] is missing
instead of reusing a saved key from getCredential("..."); update the
non-interactive check in the else branch (the block using isNonInteractive(),
credentialEnv, and remoteConfig.label) to first call
getCredential(credentialEnv) and if that returns a key set
process.env[credentialEnv] = key and continue; only call process.exit(1) if
neither the environment nor getCredential provides a key; keep the interactive
path using ensureNamedCredential unchanged.
🧹 Nitpick comments (2)
docs/workspace/backup-restore.md (2)

101-101: Split this into one sentence per source line.

Line 101 packs two sentences onto one line.

As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/workspace/backup-restore.md` at line 101, Split the combined sentence
into two separate source lines so each sentence is on its own line: put "Before
you change the sandbox image (for example after editing `Dockerfile`) or run
`nemoclaw onboard`, back up." on its own line and place "Optional files such as
`MEMORY.md` or `memory/` may not exist yet; the backup script skips them with a
warning." on the following line to satisfy the one-sentence-per-line guideline.

112-112: Drop the routine bolding and break the sentences apart.

**automatically** is not a UI label or warning, and this line currently contains multiple sentences. LLM pattern detected.

As per coding guidelines, "Bold is reserved for UI labels, parameter names, and genuine warnings." and "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/workspace/backup-restore.md` at line 112, Remove the bold formatting
around "automatically" and split the existing single-line paragraph into
separate sentences each on its own line: describe that `--run-onboard` runs
`nemoclaw onboard` from the repo root (or `NEMOCLAW_REPO_ROOT`), state that
after a successful onboard the script runs `backup-workspace.sh restore` for the
backup timestamp created at the start (remove bolding around "automatically"),
and add a separate sentence that `--no-restore` skips that restore so you get a
fresh workspace after the rebuild; ensure each sentence is on its own line.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 2246-2247: The registry entry for new sandboxes only stores {
model, provider } so hydrateNonInteractiveInferenceFromRegistry() cannot restore
NEMOCLAW_ENDPOINT_URL required by setupNim() for
compatible-endpoint/compatible-anthropic-endpoint; update the logic after
createSandbox in onboard.js (where createSandbox is called and
registry.updateSandbox is invoked) to also persist the sandbox's endpoint URL
(e.g., a field like nemoclawEndpoint or endpointUrl) and then ensure
hydrateNonInteractiveInferenceFromRegistry() sets
process.env.NEMOCLAW_ENDPOINT_URL from that stored field so setupNim() can run
non-interactively with compatible endpoints.

In `@scripts/backup-workspace.sh`:
- Around line 74-95: The loop currently treats any non-zero exit from
sandbox_download as a benign skip; change logic so core items (FILES_CORE loop
using sandbox_download ... 0) treat any non-zero as fatal: log the
sandbox_download stderr (use warn or processLogger equivalent) and exit non-zero
immediately (do not increment count) so partial backups fail; for optional items
(FILES_OPTIONAL and DIRS_OPTIONAL using sandbox_download ... 1) call
sandbox_download and inspect its exit code/output: if it explicitly indicates
"not found" keep the current quiet skip message and do not increment count, but
if it fails for any other reason surface a clear error (echo/warn with stderr),
mark the backup as failed (set a failure flag or exit non-zero at end) and do
not increment count; update references to sandbox_download, FILES_CORE,
FILES_OPTIONAL, DIRS_OPTIONAL, count, warn, and the optional message branches
accordingly so only true "not found" remains quiet and all other errors are
propagated.

In `@scripts/upgrade-sandbox.sh`:
- Around line 94-100: The current TS lookup is a TOCTOU race (using ls on
BACKUP_BASE) — instead have the backup command itself emit the created
timestamp/path and consume that output instead of rescanning; modify the code
that runs backup-workspace.sh backup to capture its stdout (or have
backup-workspace.sh write the created backup path to a known file) and assign
that value to TS, then validate [ -n "$TS" ] || fail and set
DEST="${BACKUP_BASE}/${TS}" as before; reference the TS variable and the
backup-workspace.sh invocation so you replace the ls-based discovery with the
backup command’s explicit output.

---

Outside diff comments:
In `@bin/lib/onboard.js`:
- Around line 1686-1705: The non-interactive branch for other remote providers
currently aborts if process.env[credentialEnv] is missing instead of reusing a
saved key from getCredential("..."); update the non-interactive check in the
else branch (the block using isNonInteractive(), credentialEnv, and
remoteConfig.label) to first call getCredential(credentialEnv) and if that
returns a key set process.env[credentialEnv] = key and continue; only call
process.exit(1) if neither the environment nor getCredential provides a key;
keep the interactive path using ensureNamedCredential unchanged.

---

Nitpick comments:
In `@docs/workspace/backup-restore.md`:
- Line 101: Split the combined sentence into two separate source lines so each
sentence is on its own line: put "Before you change the sandbox image (for
example after editing `Dockerfile`) or run `nemoclaw onboard`, back up." on its
own line and place "Optional files such as `MEMORY.md` or `memory/` may not
exist yet; the backup script skips them with a warning." on the following line
to satisfy the one-sentence-per-line guideline.
- Line 112: Remove the bold formatting around "automatically" and split the
existing single-line paragraph into separate sentences each on its own line:
describe that `--run-onboard` runs `nemoclaw onboard` from the repo root (or
`NEMOCLAW_REPO_ROOT`), state that after a successful onboard the script runs
`backup-workspace.sh restore` for the backup timestamp created at the start
(remove bolding around "automatically"), and add a separate sentence that
`--no-restore` skips that restore so you get a fresh workspace after the
rebuild; ensure each sentence is on its own line.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4025feaf-c8d0-4a4b-9040-562d2f2e53a3

📥 Commits

Reviewing files that changed from the base of the PR and between 0ef5dd2 and 11f856d.

📒 Files selected for processing (4)
  • bin/lib/onboard.js
  • docs/workspace/backup-restore.md
  • scripts/backup-workspace.sh
  • scripts/upgrade-sandbox.sh

Comment on lines +2246 to +2247
const sandboxName = await createSandbox(gpu, model, provider, preferredInferenceApi);
registry.updateSandbox(sandboxName, { model, provider });


⚠️ Potential issue | 🟠 Major

Persist compatible endpoint URLs with the sandbox metadata.

hydrateNonInteractiveInferenceFromRegistry() can only reconstruct what gets stored here. Right now this writes { model, provider }, but setupNim() still requires NEMOCLAW_ENDPOINT_URL for compatible-endpoint and compatible-anthropic-endpoint, so non-interactive upgrades of those sandboxes still exit even though provider/model hydration succeeds.

💡 Suggested fix
-  registry.updateSandbox(sandboxName, { model, provider });
+  registry.updateSandbox(sandboxName, {
+    model,
+    provider,
+    endpointUrl:
+      provider === "compatible-endpoint" || provider === "compatible-anthropic-endpoint"
+        ? endpointUrl
+        : undefined,
+  });

You’ll also need to hydrate process.env.NEMOCLAW_ENDPOINT_URL from that stored field.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 2246 - 2247, The registry entry for new
sandboxes only stores { model, provider } so
hydrateNonInteractiveInferenceFromRegistry() cannot restore
NEMOCLAW_ENDPOINT_URL required by setupNim() for
compatible-endpoint/compatible-anthropic-endpoint; update the logic after
createSandbox in onboard.js (where createSandbox is called and
registry.updateSandbox is invoked) to also persist the sandbox's endpoint URL
(e.g., a field like nemoclawEndpoint or endpointUrl) and then ensure
hydrateNonInteractiveInferenceFromRegistry() sets
process.env.NEMOCLAW_ENDPOINT_URL from that stored field so setupNim() can run
non-interactively with compatible endpoints.

Comment on lines +74 to 95
for f in "${FILES_CORE[@]}"; do
  if sandbox_download "$sandbox" "${WORKSPACE_PATH}/${f}" "${dest}/" 0; then
    count=$((count + 1))
  else
    warn "Skipped ${f} (not found or download failed)"
  fi
done

for f in "${FILES_OPTIONAL[@]}"; do
  if sandbox_download "$sandbox" "${WORKSPACE_PATH}/${f}" "${dest}/" 1; then
    count=$((count + 1))
  else
    echo -e "${DIM}[backup]${NC} Optional ${f} not in sandbox — skipped (normal until created)."
  fi
done

for d in "${DIRS_OPTIONAL[@]}"; do
  if sandbox_download "$sandbox" "${WORKSPACE_PATH}/${d}/" "${dest}/${d}/" 1; then
    count=$((count + 1))
  else
    echo -e "${DIM}[backup]${NC} Optional ${d}/ not in sandbox — skipped (normal until created)."
  fi


⚠️ Potential issue | 🟠 Major

Don't report a partial backup as success.

Every non-zero openshell sandbox download is currently treated as a skip. That means a transient gateway/download failure can leave you with a partial backup that still looks successful, which is risky before destroy or upgrade-sandbox.sh. Core artifacts should fail the run, and optional artifacts should only be downgraded when the remote path is actually absent.

💡 Minimum hardening for the core files
   for f in "${FILES_CORE[@]}"; do
     if sandbox_download "$sandbox" "${WORKSPACE_PATH}/${f}" "${dest}/" 0; then
       count=$((count + 1))
     else
-      warn "Skipped ${f} (not found or download failed)"
+      fail "Required workspace file ${f} was not backed up."
     fi
   done

For the optional paths, keep the quiet UX only for the specific "not found yet" case and surface other download errors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/backup-workspace.sh` around lines 74 - 95, The loop currently treats
any non-zero exit from sandbox_download as a benign skip; change logic so core
items (FILES_CORE loop using sandbox_download ... 0) treat any non-zero as
fatal: log the sandbox_download stderr (use warn or processLogger equivalent)
and exit non-zero immediately (do not increment count) so partial backups fail;
for optional items (FILES_OPTIONAL and DIRS_OPTIONAL using sandbox_download ...
1) call sandbox_download and inspect its exit code/output: if it explicitly
indicates "not found" keep the current quiet skip message and do not increment
count, but if it fails for any other reason surface a clear error (echo/warn
with stderr), mark the backup as failed (set a failure flag or exit non-zero at
end) and do not increment count; update references to sandbox_download,
FILES_CORE, FILES_OPTIONAL, DIRS_OPTIONAL, count, warn, and the optional message
branches accordingly so only true "not found" remains quiet and all other errors
are propagated.

Comment on lines +94 to +100
TS=""
if [ -d "$BACKUP_BASE" ]; then
# shellcheck disable=SC2012 # backup dirs are YYYYMMDD-HHMMSS — no special chars
TS="$(ls -1t "$BACKUP_BASE" 2>/dev/null | head -n1 || true)"
fi
[ -n "$TS" ] || fail "No backup directory found under ${BACKUP_BASE}/"
DEST="${BACKUP_BASE}/${TS}"


⚠️ Potential issue | 🟠 Major

Avoid the TOCTOU lookup for the backup timestamp.

This rescans ~/.nemoclaw/backups after the backup command finishes. If another backup lands between Line 92 and Line 97, TS can point at the wrong directory and the post-onboard restore uploads a different sandbox's workspace into this one. Please capture the timestamp/path from backup-workspace.sh backup itself instead of asking "what is latest now?".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/upgrade-sandbox.sh` around lines 94 - 100, The current TS lookup is a
TOCTOU race (using ls on BACKUP_BASE) — instead have the backup command itself
emit the created timestamp/path and consume that output instead of rescanning;
modify the code that runs backup-workspace.sh backup to capture its stdout (or
have backup-workspace.sh write the created backup path to a known file) and
assign that value to TS, then validate [ -n "$TS" ] || fail and set
DEST="${BACKUP_BASE}/${TS}" as before; reference the TS variable and the
backup-workspace.sh invocation so you replace the ls-based discovery with the
backup command’s explicit output.
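
A minimal sketch of this fix, where the backup step prints the timestamp it created and the caller captures that single value instead of re-listing the directory. Here do_backup is a hypothetical stand-in for the backup-workspace.sh backup path; the real script's output format may differ.

```shell
#!/bin/sh
# The backup step creates its directory and emits the timestamp on stdout,
# so the caller never has to ask "what is latest now?".
do_backup() {
  base="$1"
  ts="$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$base/$ts" || return 1   # create the backup directory
  printf '%s\n' "$ts"                # emit the timestamp for the caller
}

# Caller side (sketch):
#   TS="$(do_backup "$BACKUP_BASE")"
#   [ -n "$TS" ] || exit 1
#   DEST="${BACKUP_BASE}/${TS}"
```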

@HagegeR HagegeR force-pushed the feat/upgrade-sandbox branch from 11f856d to 79dd891 on March 25, 2026 at 18:52

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
bin/lib/onboard.js (1)

1229-1239: ⚠️ Potential issue | 🟠 Major

Endpoint URL not hydrated for compatible-endpoint providers.

Even after fixing the model/provider persistence issue, hydrateNonInteractiveInferenceFromRegistry() doesn't restore NEMOCLAW_ENDPOINT_URL. Non-interactive upgrades of sandboxes using compatible-endpoint or compatible-anthropic-endpoint will still fail at lines 1667-1681 where the endpoint URL is required.

The registry entry would need to persist endpointUrl alongside model/provider, and hydration would need to set process.env.NEMOCLAW_ENDPOINT_URL from that stored field.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1229 - 1239,
hydrateNonInteractiveInferenceFromRegistry currently infers and sets
NEMOCLAW_PROVIDER from the registry entry (using
inferRemoteProviderKeyFromStoredName and entry.provider) but does not restore
endpoint information for compatible-endpoint providers; update the registry
handling to read the stored endpoint field (e.g., entry.endpointUrl) and, when a
compatible-endpoint or compatible-anthropic-endpoint provider is inferred, set
process.env.NEMOCLAW_ENDPOINT_URL to that value and log it similarly to the
provider note; also ensure the registry persistence logic saves endpointUrl
alongside model/provider so hydrateNonInteractiveInferenceFromRegistry can read
it.
scripts/upgrade-sandbox.sh (1)

94-101: ⚠️ Potential issue | 🟡 Minor

TOCTOU race when discovering backup timestamp.

This rescans ~/.nemoclaw/backups after backup-workspace.sh backup finishes. If another backup completes between line 92 and line 97, TS could point to the wrong directory.

Consider having backup-workspace.sh backup emit the created timestamp to stdout, then capture it:

-bash "$BACKUP_WORKSPACE" backup "$SANDBOX"
-
-TS=""
-if [ -d "$BACKUP_BASE" ]; then
-  # shellcheck disable=SC2012  # backup dirs are YYYYMMDD-HHMMSS — no special chars
-  TS="$(ls -1t "$BACKUP_BASE" 2>/dev/null | head -n1 || true)"
-fi
+TS="$(bash "$BACKUP_WORKSPACE" backup "$SANDBOX" | tail -n1)"
+# Assuming backup-workspace.sh outputs the timestamp on the last line

Alternatively, source backup-workspace.sh and call latest_backup_timestamp() to avoid duplicating the logic.
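
A portable newest-backup lookup along those lines could look like this sketch (illustrative; the actual helper in backup-workspace.sh may differ). It avoids parsing `ls` output and relies on the YYYYMMDD-HHMMSS directory names sorting lexicographically in chronological order.

```shell
#!/bin/sh
# Print the newest backup timestamp under the given base directory,
# or nothing if no backups exist. Assumes YYYYMMDD-HHMMSS dir names.
latest_backup_timestamp() {
  for d in "$1"/*/; do
    [ -d "$d" ] && basename "$d"   # one timestamp per line
  done | sort | tail -n1           # lexicographic max == newest
}
```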

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/upgrade-sandbox.sh` around lines 94 - 101, The code currently
discovers the latest backup by re-listing BACKUP_BASE into TS which creates a
TOCTOU race; instead have the producer script emit the exact timestamp and
consume that instead (e.g. run backup-workspace.sh backup and capture its
timestamp stdout into TS) or source backup-workspace.sh and call its
latest_backup_timestamp() function to get a deterministic timestamp; update the
block that sets TS and DEST to use the captured/returned timestamp (referencing
TS, BACKUP_BASE, DEST and the backup-workspace.sh backup invocation or
latest_backup_timestamp()) and remove the ls-based discovery.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 1217-1245: hydrateNonInteractiveInferenceFromRegistry fails
because registerSandbox never persisted model/provider and
setupInference/createSandbox never call registry.updateSandbox, so
entry.model/provider remain null; fix by persisting the inference config after
sandbox creation or inside setupInference: call
registry.updateSandbox(sandboxName, { model: model /*or modelName var*/,
provider: providerKey /*or provider var*/ }) (and include nimContainer if
applicable) so hydrateNonInteractiveInferenceFromRegistry can read entry.model
and entry.provider; update the call sites (createSandbox/ setupInference) to
pass the correct sandboxName and values.

---

Duplicate comments:
In `@bin/lib/onboard.js`:
- Around line 1229-1239: hydrateNonInteractiveInferenceFromRegistry currently
infers and sets NEMOCLAW_PROVIDER from the registry entry (using
inferRemoteProviderKeyFromStoredName and entry.provider) but does not restore
endpoint information for compatible-endpoint providers; update the registry
handling to read the stored endpoint field (e.g., entry.endpointUrl) and, when a
compatible-endpoint or compatible-anthropic-endpoint provider is inferred, set
process.env.NEMOCLAW_ENDPOINT_URL to that value and log it similarly to the
provider note; also ensure the registry persistence logic saves endpointUrl
alongside model/provider so hydrateNonInteractiveInferenceFromRegistry can read
it.

In `@scripts/upgrade-sandbox.sh`:
- Around line 94-101: The code currently discovers the latest backup by
re-listing BACKUP_BASE into TS which creates a TOCTOU race; instead have the
producer script emit the exact timestamp and consume that instead (e.g. run
backup-workspace.sh backup and capture its timestamp stdout into TS) or source
backup-workspace.sh and call its latest_backup_timestamp() function to get a
deterministic timestamp; update the block that sets TS and DEST to use the
captured/returned timestamp (referencing TS, BACKUP_BASE, DEST and the
backup-workspace.sh backup invocation or latest_backup_timestamp()) and remove
the ls-based discovery.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 74526e07-c1c6-48fb-ac7e-ebd521ecbb1f

📥 Commits

Reviewing files that changed from the base of the PR and between 11f856d and 79dd891.

📒 Files selected for processing (4)
  • bin/lib/onboard.js
  • docs/workspace/backup-restore.md
  • scripts/backup-workspace.sh
  • scripts/upgrade-sandbox.sh
✅ Files skipped from review due to trivial changes (1)
  • docs/workspace/backup-restore.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/backup-workspace.sh

Comment on lines +1217 to +1245
function hydrateNonInteractiveInferenceFromRegistry() {
  if (!isNonInteractive()) return;
  const name = (process.env.NEMOCLAW_SANDBOX_NAME || "").trim();
  if (!name) return;
  let entry = registry.getSandbox(name);
  if (!entry || !entry.model) {
    entry = registry.getSandbox(GATEWAY_NAME);
  }
  if (!entry || !entry.model) return;

  const providerKey = (process.env.NEMOCLAW_PROVIDER || "").trim();
  const modelEnv = (process.env.NEMOCLAW_MODEL || "").trim();
  if (!providerKey && entry.provider) {
    let inferred = inferRemoteProviderKeyFromStoredName(entry.provider);
    if (entry.provider === "vllm-local" && entry.nimContainer) {
      inferred = "nim-local";
    } else if (entry.provider === "vllm-local" && !entry.nimContainer) {
      inferred = "vllm";
    }
    if (inferred) {
      process.env.NEMOCLAW_PROVIDER = inferred;
      note(`  [non-interactive] NEMOCLAW_PROVIDER (from registry) → ${inferred}`);
    }
  }
  if (!modelEnv && entry.model) {
    process.env.NEMOCLAW_MODEL = entry.model;
    note(`  [non-interactive] NEMOCLAW_MODEL (from registry) → ${entry.model}`);
  }
}


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that model/provider are not being written to registry anywhere

echo "=== Searching for registry.updateSandbox calls ==="
rg -n "registry\.updateSandbox" --type=js

echo ""
echo "=== Searching for registerSandbox calls to see what's passed ==="
rg -n -A5 "registry\.registerSandbox" --type=js

echo ""
echo "=== Check if model/provider appear in any updateSandbox call ==="
rg -n "updateSandbox.*model|updateSandbox.*provider" --type=js

Repository: NVIDIA/NemoClaw

Length of output: 5357


🏁 Script executed:

#!/bin/bash
# 1. Check the exact line 1225 to see what the early return condition is
echo "=== Lines around 1225 in onboard.js to see the exit condition ==="
sed -n '1220,1230p' bin/lib/onboard.js

echo ""
echo "=== Check the full hydrateNonInteractiveInferenceFromRegistry function ==="
sed -n '1217,1245p' bin/lib/onboard.js

echo ""
echo "=== Search for setupInference function ==="
rg -n -A20 "function setupInference|const setupInference" --type=js bin/lib/onboard.js

echo ""
echo "=== Check GATEWAY_NAME and see if it has model/provider ==="
rg -n "GATEWAY_NAME" --type=js bin/lib/onboard.js | head -20

echo ""
echo "=== Check where GATEWAY_NAME is registered/defined ==="
rg -n "registerSandbox.*GATEWAY_NAME|GATEWAY_NAME.*registerSandbox" --type=js

Repository: NVIDIA/NemoClaw

Length of output: 4398


🏁 Script executed:

#!/bin/bash
# Check if GATEWAY_NAME is ever registered with model/provider
echo "=== Search for registerSandbox with GATEWAY_NAME ==="
rg -n "registerSandbox.*nemoclaw|registerSandbox.*GATEWAY" --type=js

echo ""
echo "=== Full setupInference function to see if it calls updateSandbox ==="
sed -n '1975,2050p' bin/lib/onboard.js

echo ""
echo "=== Search for any updateSandbox calls after setupInference ==="
rg -n -B5 -A5 "updateSandbox.*model|updateSandbox.*provider" --type=js bin/lib/onboard.js

Repository: NVIDIA/NemoClaw

Length of output: 3655


🏁 Script executed:

#!/bin/bash
# Check if hydrateNonInteractiveInferenceFromRegistry is actually called
echo "=== Search for calls to hydrateNonInteractiveInferenceFromRegistry ==="
rg -n "hydrateNonInteractiveInferenceFromRegistry" --type=js

echo ""
echo "=== Check the context where it's called ==="
rg -n -B10 -A3 "hydrateNonInteractiveInferenceFromRegistry()" --type=js bin/lib/onboard.js

echo ""
echo "=== Check if model/provider are ever stored in the registry at all ==="
rg -n "\.model\s*=|\.provider\s*=" --type=js bin/lib/registry.js | head -20

echo ""
echo "=== Check registry.js to see the getSandbox implementation ==="
sed -n '1,150p' bin/lib/registry.js

Repository: NVIDIA/NemoClaw

Length of output: 4050


🏁 Script executed:

#!/bin/bash
# Final verification: check if GATEWAY_NAME entry exists and has model/provider
echo "=== Check if there's any special initialization for GATEWAY_NAME in registry ==="
rg -n "nemoclaw|GATEWAY_NAME" --type=js bin/lib/registry.js

echo ""
echo "=== Check the setupNim flow after hydrateNonInteractiveInferenceFromRegistry ==="
sed -n '1570,1620p' bin/lib/onboard.js

echo ""
echo "=== Verify: does setupNim ever call updateSandbox after setupInference? ==="
sed -n '1570,1750p' bin/lib/onboard.js | grep -n "updateSandbox\|setupInference"

Repository: NVIDIA/NemoClaw

Length of output: 2591


🏁 Script executed:

#!/bin/bash
# Check the complete setupNim and related flow to confirm model/provider are never persisted
echo "=== Search for ALL updateSandbox calls in onboard.js ==="
rg -n "updateSandbox" bin/lib/onboard.js

echo ""
echo "=== Search for registerSandbox calls to confirm parameters ==="
rg -n -A4 "registry\.registerSandbox" bin/lib/onboard.js

echo ""
echo "=== Check if model/provider are set on entry object anywhere ==="
rg -n "entry\.model\s*=|entry\.provider\s*=" bin/lib/onboard.js

Repository: NVIDIA/NemoClaw

Length of output: 564


Model and provider are never persisted to the registry, so hydration will always fail.

This function tries to read entry.model and entry.provider from the registry, but these values are never written:

  1. registerSandbox() (lines 1559-1562) only passes { name, gpuEnabled }, leaving model and provider as null
  2. The only updateSandbox() call in onboard.js (line 2248) updates { nimContainer } only; setupInference() never persists model/provider to the registry

The function will always exit early at line 1225 because entry.model will be null. The fallback to GATEWAY_NAME won't help since that's never registered with inference configuration either.

To make hydration work, add a registry.updateSandbox() call in setupInference or immediately after createSandbox:

// After sandbox creation or in setupInference:
registry.updateSandbox(sandboxName, { model, provider });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1217 - 1245,
hydrateNonInteractiveInferenceFromRegistry fails because registerSandbox never
persisted model/provider and setupInference/createSandbox never call
registry.updateSandbox, so entry.model/provider remain null; fix by persisting
the inference config after sandbox creation or inside setupInference: call
registry.updateSandbox(sandboxName, { model: model /*or modelName var*/,
provider: providerKey /*or provider var*/ }) (and include nimContainer if
applicable) so hydrateNonInteractiveInferenceFromRegistry can read entry.model
and entry.provider; update the call sites (createSandbox/ setupInference) to
pass the correct sandboxName and values.
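The fix this prompt describes can be sketched as follows — with a hypothetical in-memory registry standing in for bin/lib/registry.js (the real registerSandbox/updateSandbox/getSandbox signatures may differ), showing why hydration fails before the persist step and succeeds after it:

```javascript
// Hypothetical stand-in for bin/lib/registry.js: entries keyed by sandbox name.
const entries = new Map();

const registry = {
  registerSandbox(name, opts = {}) {
    // Mirrors the current flow: only { name, gpuEnabled } survive,
    // so model/provider start out null.
    entries.set(name, { name, gpuEnabled: !!opts.gpuEnabled, model: null, provider: null });
  },
  updateSandbox(name, patch) {
    const entry = entries.get(name);
    if (entry) Object.assign(entry, patch);
  },
  getSandbox(name) {
    return entries.get(name) || null;
  },
};

// The proposed fix: persist the chosen inference config right after setup.
function persistInferenceConfig(sandboxName, model, provider) {
  registry.updateSandbox(sandboxName, { model, provider });
}

// Simplified hydration: fill env vars from the registry entry, if populated.
function hydrateNonInteractiveInference(sandboxName) {
  const entry = registry.getSandbox(sandboxName);
  if (!entry || !entry.model || !entry.provider) return false; // the early-exit path
  process.env.NEMOCLAW_PROVIDER = entry.provider;
  process.env.NEMOCLAW_MODEL = entry.model;
  return true;
}
```

Without `persistInferenceConfig()`, `hydrateNonInteractiveInference()` always hits the early exit; with it, a later non-interactive onboard can recover the previous selection.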

@HagegeR HagegeR force-pushed the feat/upgrade-sandbox branch from 79dd891 to f6c060d Compare March 26, 2026 09:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (4)
scripts/backup-workspace.sh (1)

45-55: ⚠️ Potential issue | 🟠 Major

Don't downgrade real backup failures to "skipped".

Because sandbox_download() only returns success/failure, Lines 75-95 still treat every non-zero openshell sandbox download as a benign skip. That means a transient failure on a core file can still yield a "successful" backup, while optional paths mislabel real gateway/download errors as "not in sandbox." FILES_CORE should fail the run, and the quiet path should only apply when the remote path is definitively missing.

Also applies to: 74-95
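One way to implement the severity split this comment asks for — `openshell_download` below is a stub standing in for the real `openshell sandbox download`, and the error text used to distinguish "missing" from "failed" is an assumption about the gateway's output:

```shell
#!/bin/bash
# Stub for demonstration; the real command is `openshell sandbox download`.
openshell_download() {
  case "$2" in
    /missing/*) echo "no such file or directory" >&2; return 1 ;;
    /flaky/*)   echo "gateway timeout" >&2; return 1 ;;
    *)          return 0 ;;
  esac
}

# Exit codes: 0 = downloaded, 2 = definitively missing, 1 = real error.
download_with_status() {
  local sandbox="$1" remote="$2" dest="$3" out
  if out=$(openshell_download "$sandbox" "$remote" "$dest" 2>&1); then
    return 0
  elif grep -qi "no such file" <<<"$out"; then
    return 2
  else
    echo "$out" >&2
    return 1
  fi
}

# Core files fail the backup on any problem; optional files are skipped
# quietly only when the remote path is definitively absent.
backup_file() {
  local sandbox="$1" file="$2" dest="$3" kind="$4" status
  download_with_status "$sandbox" "$file" "$dest"
  status=$?
  case "${status}/${kind}" in
    0/*)        return 0 ;;
    2/optional) return 0 ;;   # quiet skip
    *)          echo "backup failed for ${file}" >&2; return 1 ;;
  esac
}
```

This keeps the quiet path for genuinely absent optional files while a transient gateway error can no longer be mislabeled as "not in sandbox."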

scripts/upgrade-sandbox.sh (1)

91-100: ⚠️ Potential issue | 🟠 Major

Capture the created backup timestamp instead of rescanning ~/.nemoclaw/backups.

Line 92 completes the backup, then Lines 95-99 ask “what is latest now?”. If another backup lands in between, TS can point at the wrong directory and the later restore uploads a different workspace into this sandbox. Have backup-workspace.sh backup emit the timestamp/path it just created and consume that directly here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/upgrade-sandbox.sh` around lines 91 - 100, The current flow runs bash
"$BACKUP_WORKSPACE" backup "$SANDBOX" then rescans BACKUP_BASE to set TS/DEST
which can pick up a race; modify the script so the backup command
(BACKUP_WORKSPACE) emits the exact timestamp or path it created (e.g., print the
created directory on stdout), capture that output into TS (replace the ls/rescan
logic) and validate it before setting DEST, and keep using TS/DEST downstream;
update variables and error handling around the call to BACKUP_WORKSPACE, TS, and
DEST to rely on the captured value instead of rescanning.
bin/lib/onboard.js (2)

1239-1247: ⚠️ Potential issue | 🟠 Major

Persist model and provider before relying on registry hydration.

Line 1244 assumes the sandbox entry already carries inference metadata, but sandboxes created by the current flow are still registered with only { name, gpuEnabled }, and bin/lib/registry.js:41-56 initializes model/provider to null. For those entries, this helper falls through and non-interactive upgrades revert to the default build/model instead of the previous selection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1239 - 1247,
hydrateNonInteractiveInferenceFromRegistry fails when registry entries are
created without inference metadata because it assumes entry.model/provider are
already set; update the sandbox registration flow to persist chosen inference
metadata (model and provider) into the registry entry at creation time (the code
path that calls registry.getSandbox / registry.add or equivalent), and ensure
hydrateNonInteractiveInferenceFromRegistry reads and uses entry.model and
entry.provider when present (or sets them on the registry entry) instead of
falling back to defaults; reference the
hydrateNonInteractiveInferenceFromRegistry helper and the
registry.getSandbox/registry.add (or the sandbox registration function) to
locate the changes.

1251-1265: ⚠️ Potential issue | 🟠 Major

Hydrate compatible endpoint URLs too.

This restores NEMOCLAW_PROVIDER and NEMOCLAW_MODEL, but setupNim() still requires NEMOCLAW_ENDPOINT_URL for compatible-endpoint and compatible-anthropic-endpoint. Non-interactive upgrades of those sandboxes will still fail here unless the endpoint URL is persisted and rehydrated alongside the provider metadata.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1251 - 1265, The current non-interactive
rehydration sets NEMOCLAW_PROVIDER and NEMOCLAW_MODEL from entry but misses
restoring endpoint URLs, causing setupNim() to fail for providers like
"compatible-endpoint" and "compatible-anthropic-endpoint"; update the hydration
logic (near inferRemoteProviderKeyFromStoredName and the block that sets
NEMOCLAW_MODEL) to also check entry for persisted endpoint fields (e.g.,
entry.endpointUrl, entry.endpoint_url, or whichever key your registry uses) and,
when present, set process.env.NEMOCLAW_ENDPOINT_URL (and any provider-specific
variants) and emit a note() similar to the other rehydrated vars so setupNim()
can run non-interactively.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 1761-1767: The non-interactive onboarding path only falls back to
~/.nemoclaw/credentials.json for NVIDIA via getCredential("NVIDIA_API_KEY");
mirror that behavior for other remote providers by attempting to read
getCredential(credentialEnv) before exiting when !process.env[credentialEnv]; if
a key is returned, set process.env[credentialEnv] = key so the later checks
succeed. Update the code paths that currently do if
(!process.env[credentialEnv]) { ... process.exit(1) } to first call
getCredential(credentialEnv) and assign it into process.env when present,
referencing the getCredential function and the credentialEnv variable used in
the surrounding logic.

In `@scripts/backup-workspace.sh`:
- Around line 132-139: The loop handling FILES_CORE currently logs a warning on
an openshell sandbox upload failure and continues; change this to fail fast by
exiting with a non-zero status so the restore aborts on any core file upload
error. Specifically, inside the for loop that iterates FILES_CORE and calls
openshell sandbox upload "$sandbox" "${src}/${f}" "${WORKSPACE_PATH}/", replace
the warn "Failed to restore ${f}" behavior with an immediate exit (e.g., call
exit 1 or a dedicated fatal/error function) so count isn't incremented and
upgrade-sandbox.sh cannot complete with partial core restores.

In `@scripts/upgrade-sandbox.sh`:
- Around line 103-109: When FULL_DATA is requested, the script must not proceed
if the openshell sandbox download of /sandbox/.openclaw-data/ into
"${DEST}/openclaw-data/" fails; change the logic around the openshell sandbox
download (the block using FULL_DATA, mkdir -p "${DEST}/openclaw-data", and the
openshell command) so you only create the destination directory after a
successful download or save to a temporary location and move it on success, and
on download failure exit non‑zero (do not just warn) to stop subsequent steps
like the "nemoclaw onboard" sequence; also ensure the later restore hint
printing (around the existing restore-hint lines) is suppressed when the
download failed so you don't show misleading restore instructions for an empty
or absent snapshot.

---

Duplicate comments:
In `@bin/lib/onboard.js`:
- Around line 1239-1247: hydrateNonInteractiveInferenceFromRegistry fails when
registry entries are created without inference metadata because it assumes
entry.model/provider are already set; update the sandbox registration flow to
persist chosen inference metadata (model and provider) into the registry entry
at creation time (the code path that calls registry.getSandbox / registry.add or
equivalent), and ensure hydrateNonInteractiveInferenceFromRegistry reads and
uses entry.model and entry.provider when present (or sets them on the registry
entry) instead of falling back to defaults; reference the
hydrateNonInteractiveInferenceFromRegistry helper and the
registry.getSandbox/registry.add (or the sandbox registration function) to
locate the changes.
- Around line 1251-1265: The current non-interactive rehydration sets
NEMOCLAW_PROVIDER and NEMOCLAW_MODEL from entry but misses restoring endpoint
URLs, causing setupNim() to fail for providers like "compatible-endpoint" and
"compatible-anthropic-endpoint"; update the hydration logic (near
inferRemoteProviderKeyFromStoredName and the block that sets NEMOCLAW_MODEL) to
also check entry for persisted endpoint fields (e.g., entry.endpointUrl,
entry.endpoint_url, or whichever key your registry uses) and, when present, set
process.env.NEMOCLAW_ENDPOINT_URL (and any provider-specific variants) and emit
a note() similar to the other rehydrated vars so setupNim() can run
non-interactively.

In `@scripts/upgrade-sandbox.sh`:
- Around line 91-100: The current flow runs bash "$BACKUP_WORKSPACE" backup
"$SANDBOX" then rescans BACKUP_BASE to set TS/DEST which can pick up a race;
modify the script so the backup command (BACKUP_WORKSPACE) emits the exact
timestamp or path it created (e.g., print the created directory on stdout),
capture that output into TS (replace the ls/rescan logic) and validate it before
setting DEST, and keep using TS/DEST downstream; update variables and error
handling around the call to BACKUP_WORKSPACE, TS, and DEST to rely on the
captured value instead of rescanning.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c11f4466-0a2f-45ae-bc93-2cbe616e4ce6

📥 Commits

Reviewing files that changed from the base of the PR and between 79dd891 and f6c060d.

📒 Files selected for processing (4)
  • bin/lib/onboard.js
  • docs/workspace/backup-restore.md
  • scripts/backup-workspace.sh
  • scripts/upgrade-sandbox.sh
✅ Files skipped from review due to trivial changes (1)
  • docs/workspace/backup-restore.md

Comment on lines +1761 to +1767
const key = getCredential("NVIDIA_API_KEY");
if (!key) {
  console.error("  NVIDIA_API_KEY is required for NVIDIA Endpoints in non-interactive mode.");
  console.error("  Set it in the environment or ~/.nemoclaw/credentials.json, or run interactive onboard.");
  process.exit(1);
}
process.env.NVIDIA_API_KEY = key;

⚠️ Potential issue | 🟠 Major

Mirror the credential fallback for the other remote providers.

This fixes NVIDIA_API_KEY, but the adjacent non-build path still exits on !process.env[credentialEnv]. Saved OPENAI_API_KEY/ANTHROPIC_API_KEY/GEMINI_API_KEY/compatible-endpoint credentials in ~/.nemoclaw/credentials.json are still ignored in non-interactive mode, so those upgrades keep failing even when the credential already exists.

Minimal follow-up
-        if (isNonInteractive()) {
-          if (!process.env[credentialEnv]) {
+        if (isNonInteractive()) {
+          const key = getCredential(credentialEnv);
+          if (!key) {
             console.error(`  ${credentialEnv} is required for ${remoteConfig.label} in non-interactive mode.`);
             process.exit(1);
           }
+          process.env[credentialEnv] = key;
         } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 1761 - 1767, The non-interactive onboarding
path only falls back to ~/.nemoclaw/credentials.json for NVIDIA via
getCredential("NVIDIA_API_KEY"); mirror that behavior for other remote providers
by attempting to read getCredential(credentialEnv) before exiting when
!process.env[credentialEnv]; if a key is returned, set
process.env[credentialEnv] = key so the later checks succeed. Update the code
paths that currently do if (!process.env[credentialEnv]) { ... process.exit(1) }
to first call getCredential(credentialEnv) and assign it into process.env when
present, referencing the getCredential function and the credentialEnv variable
used in the surrounding logic.

Comment on lines +132 to +139
for f in "${FILES_CORE[@]}"; do
  if [ -f "${src}/${f}" ]; then
    if openshell sandbox upload "$sandbox" "${src}/${f}" "${WORKSPACE_PATH}/"; then
      count=$((count + 1))
    else
      warn "Failed to restore ${f}"
    fi
  fi

⚠️ Potential issue | 🟠 Major

Fail fast on core restore errors.

Line 137 only warns and continues, so upgrade-sandbox.sh can finish with an incomplete persona if one core upload flakes. For FILES_CORE, this should abort the restore instead of counting a partial result as success.

Minimal hardening
   for f in "${FILES_CORE[@]}"; do
     if [ -f "${src}/${f}" ]; then
       if openshell sandbox upload "$sandbox" "${src}/${f}" "${WORKSPACE_PATH}/"; then
         count=$((count + 1))
       else
-        warn "Failed to restore ${f}"
+        fail "Failed to restore required workspace file ${f}"
       fi
     fi
   done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
   for f in "${FILES_CORE[@]}"; do
     if [ -f "${src}/${f}" ]; then
       if openshell sandbox upload "$sandbox" "${src}/${f}" "${WORKSPACE_PATH}/"; then
         count=$((count + 1))
       else
-        warn "Failed to restore ${f}"
+        fail "Failed to restore required workspace file ${f}"
       fi
     fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/backup-workspace.sh` around lines 132 - 139, The loop handling
FILES_CORE currently logs a warning on an openshell sandbox upload failure and
continues; change this to fail fast by exiting with a non-zero status so the
restore aborts on any core file upload error. Specifically, inside the for loop
that iterates FILES_CORE and calls openshell sandbox upload "$sandbox"
"${src}/${f}" "${WORKSPACE_PATH}/", replace the warn "Failed to restore ${f}"
behavior with an immediate exit (e.g., call exit 1 or a dedicated fatal/error
function) so count isn't incremented and upgrade-sandbox.sh cannot complete with
partial core restores.
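The suggested diff calls a `fail` helper that backup-workspace.sh may not define yet; a minimal sketch of one, assuming the script's existing warn/info logging style:

```shell
#!/bin/bash
# Minimal abort helper: prints to stderr and exits non-zero, so a partial
# core restore can never be reported as success.
fail() {
  echo "ERROR: $*" >&2
  exit 1
}
```

Because it exits the whole script, any caller (including upgrade-sandbox.sh invoking the restore) observes a non-zero status and stops.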

Comment on lines +103 to +109
if [ "$FULL_DATA" -eq 1 ]; then
  info "Step 2/3: Full OpenClaw snapshot (/sandbox/.openclaw-data/)..."
  mkdir -p "${DEST}/openclaw-data"
  if openshell sandbox download "$SANDBOX" /sandbox/.openclaw-data/ "${DEST}/openclaw-data/"; then
    info "Saved openclaw-data under ${DEST}/openclaw-data/"
  else
    warn "openclaw-data download failed or was empty; workspace files in ${DEST}/ are still valid."

⚠️ Potential issue | 🟠 Major

Don't continue a --full-data upgrade after the snapshot fails.

If the user explicitly asked for --full-data, Lines 106-109 should not just warn and keep going into nemoclaw onboard. A transient failure here can destroy agents/sessions/plugins during the recreate step, and Lines 151-152 can still print a restore hint because the empty destination directory was already created.

Also applies to: 151-152

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/upgrade-sandbox.sh` around lines 103 - 109, When FULL_DATA is
requested, the script must not proceed if the openshell sandbox download of
/sandbox/.openclaw-data/ into "${DEST}/openclaw-data/" fails; change the logic
around the openshell sandbox download (the block using FULL_DATA, mkdir -p
"${DEST}/openclaw-data", and the openshell command) so you only create the
destination directory after a successful download or save to a temporary
location and move it on success, and on download failure exit non‑zero (do not
just warn) to stop subsequent steps like the "nemoclaw onboard" sequence; also
ensure the later restore hint printing (around the existing restore-hint lines)
is suppressed when the download failed so you don't show misleading restore
instructions for an empty or absent snapshot.
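The download-to-temp-then-move hardening can be sketched as below — `download_snapshot` is a stub standing in for `openshell sandbox download` so the sketch runs on its own:

```shell
#!/bin/bash

# Stub: succeeds iff SNAPSHOT_OK=1; the real call is `openshell sandbox download`.
download_snapshot() {
  [ "${SNAPSHOT_OK:-0}" -eq 1 ] && mkdir -p "$2/agents"
}

snapshot_full_data() {
  local dest="$1" tmp
  tmp="$(mktemp -d)"
  if download_snapshot "/sandbox/.openclaw-data/" "$tmp"; then
    mkdir -p "$dest"
    mv "$tmp" "${dest}/openclaw-data"   # only materialize on success
    echo "Saved openclaw-data under ${dest}/openclaw-data/"
  else
    rm -rf "$tmp"
    echo "openclaw-data snapshot failed; aborting before onboard re-runs." >&2
    return 1                            # caller must stop, not warn
  fi
}
```

Since `${dest}/openclaw-data` only exists after a successful download, the later restore-hint printing can simply check for the directory and stays silent on failure.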
