deploy/kubernetes/README.md
Lines changed: 28 additions & 16 deletions
@@ -1,8 +1,9 @@
 # Semantic Router Kubernetes Deployment

-This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize.
+This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize. It provides two modes similar to docker-compose profiles:

-By default, the base kustomization deploys a Pod with an `llm-katan` sidecar so that the default config (qwen3 on 127.0.0.1:8002) works out-of-the-box. If you prefer to run without the sidecar, replace `deployment.with-llm-katan.yaml` with `deployment.yaml` in `kustomization.yaml`.
+- core: only the semantic-router (no llm-katan)
+- llm-katan: semantic-router plus an llm-katan sidecar listening on 8002 (served model name `qwen3`)

 ## Architecture

@@ -319,31 +320,42 @@ Edit the `resources` section in `deployment.yaml` accordingly.

 ### Kubernetes Manifests (`deploy/kubernetes/`)

-- `deployment.yaml` - Main application deployment with optimized resource settings
-- `deployment.with-llm-katan.yaml` - Optional variant including an llm-katan sidecar listening on 8002 (works with default config pointing to qwen3 at 127.0.0.1:8002)
-- `service.yaml` - Services for gRPC, HTTP API, and metrics
+- `overlays/core/` - Core deployment (no llm-katan)
+- `overlays/llm-katan/` - Deployment with llm-katan sidecar
+- `deployment.yaml` - Plain deployment (used by core overlay)
+- `deployment.katan.yaml` - Sidecar deployment (used by llm-katan overlay)
+- `service.yaml` - gRPC, HTTP API, and metrics services
 - `pvc.yaml` - Persistent volume claim for model storage
 - `namespace.yaml` - Dedicated namespace for the application
-- `config.yaml` - Application configuration
+- `config.yaml` - Application configuration (defaults to qwen3 @ 127.0.0.1:8002)
 - `tools_db.json` - Tools database for semantic routing
-- `kustomization.yaml` - Kustomize configuration for easy deployment
+- `kustomization.yaml` - Root entry (defaults to core overlay)

 ### Development Tools

-## Optional: run with llm-katan sidecar
+## Choose a mode: core or llm-katan

-To mimic the docker-compose default setup, you can deploy a variant that runs an `llm-katan` sidecar inside the same Pod. The provided `deployment.with-llm-katan.yaml` exposes llm-katan on `0.0.0.0:8002` and serves the model name `qwen3`.
+- Core mode (default root points here):

-Notes:
+```bash
+kubectl apply -k deploy/kubernetes
+# or explicitly
+kubectl apply -k deploy/kubernetes/overlays/core
+```

-- Ensure the Qwen model content is available at `/app/models/Qwen/Qwen3-0.6B` in the PVC. You can pre-populate the PV or customize the init container to fetch from an internal source.
-- The default Kubernetes `config.yaml` has been aligned to use `qwen3` and endpoint `127.0.0.1:8002`, so it will work out-of-the-box with this sidecar.
+- llm-katan mode:

-Apply the sidecar variant instead of the default deployment:
+- The init container will attempt to download `Qwen/Qwen3-0.6B` into `/app/models/Qwen/Qwen3-0.6B` and the embedding model `sentence-transformers/all-MiniLM-L12-v2` into `/app/models/all-MiniLM-L12-v2`. In restricted networks, these downloads may fail—pre-populate the PV or point the init script to your internal artifact store as needed.
+- The default Kubernetes `config.yaml` has been aligned to use `qwen3` and endpoint `127.0.0.1:8002`.

 - `tools/kind/kind-config.yaml` - Kind cluster configuration for local development
 - `tools/make/kube.mk` - Make targets for Kubernetes operations