Commit 4faa35d

update k8s install docs
Signed-off-by: JaredforReal <[email protected]>
1 parent d3c29ad commit 4faa35d

1 file changed: +22 -6 lines changed

website/docs/installation/kubernetes.md

Lines changed: 22 additions & 6 deletions
@@ -35,27 +35,43 @@ kubectl wait --for=condition=Ready nodes --all --timeout=300s

 ## Step 2: Deploy vLLM Semantic Router

-Configure the semantic router by editing `deploy/kubernetes/config.yaml`. This file contains the vLLM configuration, including model config, endpoints, and policies.
+Configure the semantic router by editing `deploy/kubernetes/config.yaml`. This file contains the vLLM configuration, including model config, endpoints, and policies. The repository provides two Kustomize overlays similar to docker-compose profiles:
+
+- core (default): only the semantic-router
+  - Path: `deploy/kubernetes/overlays/core` (root `deploy/kubernetes/` points here by default)
+- llm-katan: semantic-router + an llm-katan sidecar listening on 8002 and serving model name `qwen3`
+  - Path: `deploy/kubernetes/overlays/llm-katan`

 Important notes before you apply manifests:

 - `vllm_endpoints.address` must be an IP address (not hostname) reachable from inside the cluster. If your LLM backends run as K8s Services, use the ClusterIP (for example `10.96.0.10`) and set `port` accordingly. Do not include protocol or path.
 - The PVC in `deploy/kubernetes/pvc.yaml` uses `storageClassName: standard`. On some clouds or local clusters, the default StorageClass name may differ (e.g., `standard-rwo`, `gp2`, or a provisioner like local-path). Adjust as needed.
 - Default PVC size is 30Gi. Size it to at least 2–3x of your total model footprint to leave room for indexes and updates.
 - The initContainer downloads several models from Hugging Face on first run and writes them into the PVC. Ensure outbound egress to Hugging Face is allowed and there is at least ~6–8 GiB free space for the models specified.
+- Per mode, the init container downloads differ:
+  - core: classifiers + the embedding model `sentence-transformers/all-MiniLM-L12-v2` into `/app/models/all-MiniLM-L12-v2`.
+  - llm-katan: everything in core, plus `Qwen/Qwen3-0.6B` into `/app/models/Qwen/Qwen3-0.6B`.
+- The default `config.yaml` points to `qwen3` at `127.0.0.1:8002`, which matches the llm-katan overlay. If you use core (no sidecar), either change `vllm_endpoints` to your actual backend Service IP:Port, or deploy the llm-katan overlay.

-Deploy the semantic router service with all required components:
+Deploy the semantic router service with all required components (core mode by default):

-```bash
-# Deploy semantic router using Kustomize
+````bash
+# Deploy semantic router (core mode)
 kubectl apply -k deploy/kubernetes/

 # Wait for deployment to be ready (this may take several minutes for model downloads)
 kubectl wait --for=condition=Available deployment/semantic-router -n vllm-semantic-router-system --timeout=600s

 # Verify deployment status
 kubectl get pods -n vllm-semantic-router-system
-```
+
+To run with the llm-katan overlay instead:
+
+```bash
+kubectl apply -k deploy/kubernetes/overlays/llm-katan
+````
+
+````

 ## Step 3: Install Envoy Gateway

@@ -70,7 +86,7 @@ helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \

 # Wait for Envoy Gateway to be ready
 kubectl wait --timeout=300s -n envoy-gateway-system deployment/envoy-gateway --for=condition=Available
-```
+````

 ## Step 4: Install Envoy AI Gateway

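
The notes in this change on `vllm_endpoints.address` (a bare IP, no protocol or path) and on PVC sizing (2–3x the model footprint) can be checked before applying manifests. The following is a minimal sketch, assuming bash. The helper names `is_bare_ipv4` and `pvc_size_range_gib`, and the Service name and namespace in the comment, are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; nothing here ships with the repository.

# Succeeds only if the argument is a bare IPv4 address: no protocol,
# no port, no path, no hostname.
is_bare_ipv4() {
  local addr="$1" octet
  local -a octets
  # Four dot-separated groups of 1-3 digits and nothing else.
  [[ "$addr" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
  IFS='.' read -r -a octets <<< "$addr"
  for octet in "${octets[@]}"; do
    # 10# forces base 10 so a group such as "08" is not read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}

# Prints the 2x and 3x bounds (in GiB) for a model footprint given as a
# whole number of GiB, following the "2-3x of your total model footprint" note.
pvc_size_range_gib() {
  local footprint_gib="$1"
  echo "$(( footprint_gib * 2 )) $(( footprint_gib * 3 ))"
}

# Looking up a backend Service's ClusterIP (placeholder name and namespace):
#   kubectl get svc my-llm-backend -n my-namespace -o jsonpath='{.spec.clusterIP}'
```

For example, `is_bare_ipv4 "10.96.0.10"` succeeds while `is_bare_ipv4 "http://10.96.0.10"` fails, and `pvc_size_range_gib 8` prints `16 24`, which fits within the default 30Gi PVC.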