Summary
On a fresh local OpenShell gateway, inference.local inside a sandbox consistently returns 404 page not found for both:
- POST /v1/chat/completions (OpenAI-style)
- POST /v1/responses (per the docs’ “Verify from sandbox” example)
This happens even though:
- Gateway inference is configured with a valid NVIDIA provider and Nemotron 3 model.
- The sandbox proxy does intercept these calls and routes them through navigator_router to https://integrate.api.nvidia.com/v1 with the expected paths.
This effectively breaks the documented https://inference.local inference routing path.
Environment
- Host: Windows 11 + WSL2 (Ubuntu, Docker Engine in WSL)
- OpenShell CLI: installed via uv pip install openshell --pre from internal nv-shared-pypi
- Docker: logged in to ghcr.io with PAT (including SSO) and able to pull ghcr.io/nvidia/openshell/* images
- Gateway: started via openshell gateway start on WSL host
- Inference backend: NVIDIA Inference API, Nemotron 3 Nano 30B (works directly from WSL with my key)
Steps to Reproduce
1. Start gateway (host / WSL)
# In WSL
uv venv .venv
source .venv/bin/activate
uv pip install openshell --upgrade --pre \
  --index-url https://urm.nvidia.com/artifactory/api/pypi/nv-shared-pypi/simple
openshell gateway start
→ Gateway ready, e.g. Endpoint: https://127.0.0.1:8080
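As a sanity check before involving the sandbox, I also confirm the gateway port is reachable from WSL. This is just a TCP connectivity probe against the host/port printed above; it assumes no gateway API paths:

```python
# Quick TCP reachability probe for the gateway endpoint.
# Only checks that something is listening; no OpenShell API is assumed.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("gateway reachable:", port_open("127.0.0.1", 8080))
```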
2. Configure NVIDIA provider + Nemotron 3 inference (host / WSL)
export NVIDIA_API_KEY="YOUR_INFERENCE_API_KEY" # same key that works directly against inference-api.nvidia.com
openshell provider create \
  --name nvidia-prod \
  --type nvidia \
  --from-existing
openshell inference set \
  --provider nvidia-prod \
  --model nvidia/nvidia/Nemotron-3-Nano-30B-A3B
openshell inference get
Output:
Gateway inference:
Provider: nvidia-prod
Model: nvidia/nvidia/Nemotron-3-Nano-30B-A3B
Version: 1
System inference:
Not configured
3. Create and connect to sandbox
openshell sandbox create --name test
openshell sandbox list # wait until Ready
openshell sandbox connect test
prompt: sandbox@test:~$
4. Test /v1/chat/completions from sandbox
pip install openai
python - << 'EOF'
from openai import OpenAI
client = OpenAI(
    base_url="https://inference.local/v1",
    api_key="dummy",  # ignored by OpenShell; routing uses configured provider
)
resp = client.chat.completions.create(
    model="anything",  # should be rewritten to configured model
    messages=[{"role": "user", "content": "Hello from OpenShell sandbox!"}],
    temperature=0.7,
    max_tokens=128,
)
print(resp.choices[0].message.content)
EOF
Actual result:
openai.NotFoundError: 404 page not found
5. Test /v1/responses from sandbox (per docs)
pip install requests
python - << 'EOF'
import requests, json
url = "https://inference.local/v1/responses"
payload = {
    "instructions": "You are a helpful assistant.",
    "input": "Hello from OpenShell sandbox!",
}
resp = requests.post(url, json=payload, timeout=60)
print("Status:", resp.status_code)
print("Body:", resp.text[:500])
EOF
Actual result:
Status: 404
Body: 404 page not found
What I Expected
Given:
- openshell inference get shows a configured NVIDIA provider + Nemotron model.
- Docs state that /v1/chat/completions and /v1/responses are recognized inference patterns for inference.local.
- The “Verify the Endpoint from a Sandbox” example uses POST /v1/responses.
I expected:
POST https://inference.local/v1/chat/completions and POST https://inference.local/v1/responses to return a normal model response (HTTP 200 + JSON) from inside the sandbox.
What Actually Happens
- Both endpoints return a plain 404 page not found from inside the sandbox.
- There is no obvious configuration error on the host/sandbox side (gateway, provider, and inference are all reported as healthy).
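To help localize which hop produces the 404, I can also run the following from inside the sandbox. It is a sketch that just dumps status, headers, and body, without assuming any particular proxy header is present:

```python
# Extra diagnostic: capture status, headers, and body of the 404 response
# so it is visible whether the error comes from the local proxy or upstream.
def summarize_response(status: int, headers: dict, body: str) -> str:
    """Condense a response into a one-line summary for the bug report."""
    server = headers.get("Server", "<none>")
    ctype = headers.get("Content-Type", "<none>")
    return f"status={status} server={server} content-type={ctype} body={body[:80]!r}"

if __name__ == "__main__":
    import requests  # kept here so the helper above has no dependencies

    resp = requests.post(
        "https://inference.local/v1/responses",
        json={"input": "ping"},
        timeout=60,
    )
    print(summarize_response(resp.status_code, dict(resp.headers), resp.text))
```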
Relevant Logs (openshell logs -g openshell)
1773260787.772 INFO Fetching inference route bundle from gateway endpoint=https://openshell.openshell.svc.cluster.local:8080
1773260787.822 INFO Loaded inference route bundle revision=6ce65bfa03d7bff0 route_count=1
1773260787.822 INFO Inference routing enabled with local execution route_count=1
1773260787.823 INFO Proxy listening (tcp) addr=10.200.0.1:3128
... sandbox [navigator_sandbox::proxy] Intercepted inference request, routing locally kind=chat_completion method=POST path=/v1/chat/completions protocol=openai_chat_completions
1773260870.962 INFO routing proxy inference request endpoint=https://integrate.api.nvidia.com/v1 method=POST path=/v1/chat/completions protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
... sandbox [navigator_sandbox::proxy] Intercepted inference request, routing locally kind=responses method=POST path=/v1/responses protocol=openai_responses
1773261095.914 INFO routing proxy inference request endpoint=https://integrate.api.nvidia.com/v1 method=POST path=/v1/responses protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
Notes:
- The proxy does intercept inference.local and classifies both /v1/chat/completions and /v1/responses as inference requests.
- navigator_router is invoked with endpoint=https://integrate.api.nvidia.com/v1 and path=/v1/....
- Despite this, the sandbox receives 404 page not found for both URLs.
Separately, I’ve confirmed that my NVIDIA Inference API key + Nemotron 3 model work fine directly from WSL against https://inference-api.nvidia.com/v1/chat/completions with the same model ID.
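For reference, that direct check was roughly the following sketch. It assumes NVIDIA_API_KEY is exported and uses the same model ID as configured above:

```python
# Direct check of the NVIDIA Inference API from WSL, bypassing the
# OpenShell gateway entirely. Endpoint and model ID are the ones used
# elsewhere in this report.
import os

BASE_URL = "https://inference-api.nvidia.com/v1"
MODEL = "nvidia/nvidia/Nemotron-3-Nano-30B-A3B"

def build_request(base_url: str, model: str, api_key: str) -> tuple[str, dict, dict]:
    """Assemble the URL, headers, and payload for a chat completion call."""
    url = f"{base_url}/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}"}
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Hello from WSL!"}],
        "max_tokens": 32,
    }
    return url, headers, payload

if __name__ == "__main__":
    import requests

    url, headers, payload = build_request(BASE_URL, MODEL, os.environ["NVIDIA_API_KEY"])
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    print(resp.status_code)  # 200 when run directly from WSL
```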
Questions
- Is integrate.api.nvidia.com/v1 the intended upstream endpoint for the nvidia provider in this build?
- Should the router be constructing /v1/chat/completions and /v1/responses against that base as-is, or is there a known issue with the current OpenShell server image’s inference routing?
- Is there a different path or configuration I should be using to exercise inference.local from inside a sandbox on the current version?
Happy to provide more logs or try a specific build/tag if that helps narrow it down.