
fix(onboard): run inference curl probes without shell expansion#890

Open
OffbeatOps wants to merge 1 commit into NVIDIA:main from OffbeatOps:fix/onboard-inference-curl-probes

Conversation

@OffbeatOps

@OffbeatOps OffbeatOps commented Mar 25, 2026

Summary

Onboarding inference validation builds curl commands under `bash -c` and passes the API token via `$NEMOCLAW_PROBE_API_KEY`. That approach is fragile: corrupted or pasted keys (concatenated values, stray CR/LF, shell-sensitive characters) can make curl fail before any HTTP response is received. Users then saw messages like `HTTP 43` even though 43 was curl's exit code, not an HTTP status.

Changes

  • Run the same probes and /models fetches with `spawnSync("curl", argv, …)` so headers and bodies are passed as argv (no shell re-parsing of secrets).
  • Add `--http1.1` on those requests to reduce occasional HTTP/2-related failures on some networks.
  • When curl exits non-zero, report `curl failed (exit N)` and include a short stderr snippet instead of labeling the code as HTTP.
  • Normalize credentials: trim and strip `\r` for values read from the environment and `~/.nemoclaw/credentials.json`.

Testing

  • node --check on the modified files.
  • Manually exercised NVIDIA Endpoints onboarding after these changes (chat completions probe succeeds; Responses probe still 404 as expected for integrate.api.nvidia.com).

Happy to adjust if you prefer keeping HTTP/2 for probes or want the credential normalization scoped to specific keys only.

Summary by CodeRabbit

  • Bug Fixes

    • Improved credential normalization to consistently handle whitespace and carriage returns across all secret sources.
    • Enhanced error messages when probing network endpoints with additional diagnostic details.
  • Improvements

    • Better HTTP status code detection and error reporting during network configuration steps.

- Use spawnSync("curl", argv) for OpenAI-like probes, Anthropic probes, and /models fetches so credentials are not reinterpreted by bash.
- Add --http1.1 on those requests to reduce HTTP/2-related failures on some networks.
- Report curl exit codes as curl failures (not HTTP status) and include trimmed stderr.
- Trim secrets loaded from the environment and credentials.json and strip CR characters.
@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

Added a normalizeSecret helper function to sanitize credentials by removing carriage returns and whitespace. Refactored network probing functions in onboard.js to replace shell-based curl invocations with direct spawnSync("curl", ...) calls, eliminating environment variable injection in shell commands and improving error reporting with stderr output.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Credential Normalization**<br>`bin/lib/credentials.js` | Added `normalizeSecret()` helper to standardize credential handling by removing `\r` characters and trimming whitespace. Updated `getCredential()` to apply normalization consistently to both environment and file-based credentials, returning `null` for absent/empty values. |
| **Network Probe Refactoring**<br>`bin/lib/onboard.js` | Replaced shell-constructed bash commands with direct `spawnSync("curl", ...)` invocations across five probe/fetch functions. Added `getCurlSpawnArgs()` and `probeDisplayCode()` helpers. Enhanced `summarizeProbeError()` to include truncated stderr output for curl exit codes (1–99), improving diagnostic information. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Secrets now trimmed and all shiny and clean,
No carriage returns mucking up the scene,
With curl now spawned, no shell tricks in sight,
Our probes run direct—secure and tight! 🔐

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.67%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'fix(onboard): run inference curl probes without shell expansion' directly and accurately summarizes the main change: replacing bash-spawned curl commands with direct spawnSync curl calls to avoid shell expansion issues. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/credentials.js`:
- Around line 28-38: getCredential currently returns a normalized secret but
does not propagate that cleaned value back into process.env, which lets
onboarding's setupInference -> upsertProvider still read the original env var
(resolvedCredentialEnv) with CR/LF; update getCredential to normalize and then
assign the cleaned value back into process.env[key] before returning (use
normalizeSecret and loadCredentials as needed), or alternatively change the
provider-setup path (setupInference/upsertProvider) to call
getCredential(resolvedCredentialEnv) instead of reading process.env directly so
the cleaned value is consumed when exporting credentials.

In `@bin/lib/onboard.js`:
- Around line 365-369: probeDisplayCode currently only checks result.status and
returns null for spawnSync launch failures; change it to first check
result.error and result.signal and raise/return a clear failure instead of null:
if result.error exists, throw (or return an error object) with a message like
"curl failed to start: <result.error.code|message>", and if result.signal
exists, throw/return "curl terminated by signal <result.signal>". Apply the same
change to the other identical probe block referenced (the logic around the
second probe at the other occurrence) so callers receive actionable error
information rather than HTTP null.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb779e01-5d57-413c-a5a6-c1868fe72e44

📥 Commits

Reviewing files that changed from the base of the PR and between b2164e7 and c5658ad.

📒 Files selected for processing (2)
  • bin/lib/credentials.js
  • bin/lib/onboard.js

Comment on lines +28 to +38
```diff
 function normalizeSecret(value) {
   if (value == null) return null;
   return String(value).replace(/\r/g, "").trim();
 }

 function getCredential(key) {
-  if (process.env[key]) return process.env[key];
+  if (process.env[key]) return normalizeSecret(process.env[key]);
   const creds = loadCredentials();
-  return creds[key] || null;
+  const raw = creds[key];
+  if (raw == null) return null;
+  return normalizeSecret(raw);
```

⚠️ Potential issue | 🟠 Major

Propagate the cleaned secret into the environment.

`getCredential()` only returns the normalized value. In non-interactive onboarding, `setupInference()` still forwards `process.env[resolvedCredentialEnv]` unchanged to `upsertProvider()` (bin/lib/onboard.js, lines 1957–1960), so a key with a trailing CR/LF can pass the probe here and then still be written to OpenShell with the bad bytes. Please normalize before exporting, or have the provider-setup path consume `getCredential()` instead of the raw env var.
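
One possible shape for the suggested fix, sketched below. The extra parameters are hypothetical injection points for illustration; the real `getCredential` takes only `key`:

```javascript
// Sketch: write the normalized secret back into the environment so later
// readers of process.env see the cleaned value. normalizeSecret mirrors
// the PR's helper; the env/loadCredentials parameters are hypothetical.
function normalizeSecret(value) {
  if (value == null) return null;
  return String(value).replace(/\r/g, "").trim();
}

function getCredential(key, env = process.env, loadCredentials = () => ({})) {
  let value = normalizeSecret(env[key]);
  if (!value) value = normalizeSecret(loadCredentials()[key]);
  if (!value) return null;          // absent or empty after normalization
  env[key] = value;                 // propagate so raw env readers agree
  return value;
}
```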


Comment on lines +365 to +369
```javascript
function probeDisplayCode(result) {
  const httpCode = Number(String(result.stdout || "").trim());
  if (result.status !== 0) return result.status;
  return Number.isFinite(httpCode) ? httpCode : 0;
}
```
The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Node.js child_process.spawnSync, what values do result.status, result.signal, and result.error have when the executable cannot be found or when the child process is terminated by a signal?

💡 Result:

When the executable cannot be found (ENOENT error):
  • result.status: null
  • result.signal: null
  • result.error: Error object with code 'ENOENT' (e.g., Error: spawnSync not_found ENOENT)

When the child process is terminated by a signal:
  • result.status: null
  • result.signal: the signal name (e.g., 'SIGTERM', 'SIGKILL')
  • result.error: undefined


🏁 Script executed:

```shell
cat -n bin/lib/onboard.js | sed -n '360,375p'
```

Repository: NVIDIA/NemoClaw


🏁 Script executed:

```shell
cat -n bin/lib/onboard.js | sed -n '525,545p'
```


🏁 Script executed:

# Search for callers of probeDisplayCode to understand the impact
rg "probeDisplayCode" bin/lib/onboard.js -B 3 -A 3

Repository: NVIDIA/NemoClaw

Length of output: 2136


Handle spawnSync() launch failures before formatting them as HTTP errors.

If curl cannot be started or the child is terminated by a signal, `spawnSync()` sets `result.status` to `null` and reports the failure via `result.error` (code `ENOENT` for a missing executable) or `result.signal` (for signal termination). The `probeDisplayCode()` helper currently returns `null` in these cases, so callers end up showing "HTTP null" with no response body instead of an actionable "curl failed to start" or "curl terminated" message.

Check result.error and result.signal in addition to result.status to provide meaningful error messages to users.

Also applies to: 532-536
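
A sketch of what the hardened helper could look like, checking `result.error` and `result.signal` before `result.status`. This is one possible shape, not the committed fix:

```javascript
// Sketch of a probeDisplayCode that distinguishes launch failures and
// signal terminations from real curl exit codes (one possible shape,
// not the merged implementation).
function probeDisplayCode(result) {
  if (result.error) {
    // spawnSync could not start curl at all (e.g. ENOENT); status is null here.
    throw new Error(`curl failed to start: ${result.error.code || result.error.message}`);
  }
  if (result.signal) {
    // Child was killed by a signal; status is also null in this case.
    throw new Error(`curl terminated by signal ${result.signal}`);
  }
  if (result.status !== 0) return result.status; // genuine curl exit code (1-99)
  const httpCode = Number(String(result.stdout || "").trim());
  return Number.isFinite(httpCode) ? httpCode : 0;
}
```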


kjw3 added a commit that referenced this pull request Mar 31, 2026
## Summary

Smooth out inference configuration during `install.sh` / `nemoclaw
onboard`, especially when provider authorization, credential formatting,
endpoint probing, or final inference application fail.

This PR makes the hosted-provider onboarding path recoverable instead of
brittle:
- normalize and safely handle credential input
- classify validation failures more accurately
- let users re-enter credentials in place
- make final `openshell inference set` failures recoverable
- normalize over-specified custom base URLs
- add lower-level `back` / `exit` navigation so users can move up a
level without restarting the whole install
- clarify recovery prompts with explicit commands (`retry`, `back`,
`exit`)

## What Changed

- refactored provider probe execution to use direct `curl` argv
invocation instead of `bash -c`
- normalized credential values before use/persistence
- added structured auth / transport / model / endpoint failure
classification
- added in-place credential re-entry for hosted providers:
  - NVIDIA Endpoints
  - OpenAI
  - Anthropic
  - Google Gemini
  - custom OpenAI-compatible endpoints
  - custom Anthropic-compatible endpoints
- wrapped final provider/apply failures in interactive recovery instead
of hard abort
- added command-style recovery prompts:
  - `retry`
  - `back`
  - `exit`
- allowed `back` from lower-level inference prompts (model entry, base
URL entry, recovery prompts)
- normalized custom endpoint inputs to the minimum usable base URL
- removed stale `NVIDIA Endpoints (recommended)` wording
- secret prompts now show masked `*` feedback while typing/pasting

## Validation

```bash
npx vitest run test/credentials.test.js test/onboard-selection.test.js test/onboard.test.js
npx vitest run test/cli.test.js
npx eslint bin/lib/credentials.js bin/lib/onboard.js test/credentials.test.js test/onboard-selection.test.js test/onboard.test.js
npx tsc -p jsconfig.json --noEmit
```

## Issue Mapping

Fully addressed in this PR:
- Fixes #1099
- Fixes #1101
- Fixes #1130

Substantially addressed / partially addressed:
- #987
- improves NVIDIA validation behavior and failure classification so
false/misleading connectivity failures are much less likely, but this PR
is framed as onboarding recovery hardening rather than a WSL-specific
networking fix
- #301
- improves graceful handling when validation/apply fails, especially for
transport/upstream problems, but does not add provider auto-fallback or
a broader cloud-outage fallback strategy
- #446
- improves recovery specifically for the inference-configuration step,
but does not fully solve general resumability across all onboarding
steps

Related implementation direction:
- #890
- this PR aligns with the intent of safer/non-shell probe execution and
clearer validation reporting
- #380
- not implemented here; no automatic provider fallback was added in this
branch

## Notes

- This PR intentionally does not weaken validation or reopen old
onboarding shortcuts.
- Unrelated local `tmp/` noise was left out of the branch.

Signed-off-by: Kevin Jones <kejones@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Interactive onboarding navigation (`back`/`exit`/`quit`) with
credential re-prompting and retry flows.
* Improved probe/validation flow with clearer recovery options and more
robust sandbox build progress messages.
  * Secret input masks with reliable backspace behavior.

* **Bug Fixes**
* Credential sanitization (trim/line-ending normalization) and API key
validation now normalize and retry instead of exiting.
* Better classification and messaging for authorization/validation
failures; retries where appropriate.

* **Tests**
* Expanded tests for credential prompts, masking, retry flows,
validation classification, and onboarding navigation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added the fix label Mar 31, 2026
@wscurran
Contributor

✨ Thanks for submitting this PR with a detailed summary; it proposes a fix that improves the onboarding experience and prevents issues with curl probes.

@wscurran wscurran added the Getting Started Use this label to identify setup, installation, or onboarding issues. label Mar 31, 2026
cv added a commit that referenced this pull request Apr 1, 2026
…ript modules (#1240)

## Summary

- Extract ~210 lines of pure, side-effect-free functions from the
3,800-line `onboard.js` into **5 typed TypeScript modules** under
`src/lib/`:
- `gateway-state.ts` — gateway/sandbox state classification from
openshell output
- `validation.ts` — failure classification, API key validation, model ID
checks
  - `url-utils.ts` — URL normalization, text compaction, env formatting
  - `build-context.ts` — Docker build context filtering, recovery hints
  - `dashboard.ts` — dashboard URL resolution and construction
- Add **56 co-located unit tests** (`src/lib/*.test.ts`) for the
extracted modules
- Set up CLI TypeScript compilation: `tsconfig.src.json` compiles `src/`
→ `dist/` as CJS
- `onboard.js` imports from compiled `dist/lib/` output — transparent to
callers
- Pre-commit hook updated to build TS and include `dist/lib/` in
coverage

These functions are **not touched by any #924 blocker PR** (#781, #782,
#819, #672, #634, #890), so this extraction is safe to land immediately.

## Test plan

- [x] 598 CLI tests pass (542 existing + 56 new)
- [x] `tsc -p tsconfig.src.json` compiles cleanly
- [x] `tsc -p tsconfig.cli.json` type-checks cleanly
- [x] `tsc -p jsconfig.json` type-checks cleanly
- [x] Coverage ratchet passes with `dist/lib/` included

Closes #1237. Relates to #924 (shell consolidation).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Improved sandbox-creation recovery hints and targeted remediation
commands.
  * Smarter dashboard URL resolution and control-UI URL construction.

* **Bug Fixes**
  * More accurate gateway and sandbox state detection.
* Enhanced classification of validation/apply failures and safer
model/key validation.
  * Better provider URL normalization and loopback handling.

* **Tests**
  * Added comprehensive tests covering new utilities.

* **Chores**
* CLI now builds before CLI tests; CI/commit hooks updated to run the
CLI build.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026
laitingsheng pushed a commit that referenced this pull request Apr 2, 2026

Labels

fix Getting Started Use this label to identify setup, installation, or onboarding issues.
