Configure the semantic router by editing `deploy/kubernetes/config.yaml`. This file contains the vLLM configuration, including model config, endpoints, and policies. The repository provides two Kustomize overlays similar to docker-compose profiles:

- core (default): only the semantic-router
  - Path: `deploy/kubernetes/overlays/core` (root `deploy/kubernetes/` points here by default)
- llm-katan: semantic-router + an llm-katan sidecar listening on 8002 and serving model name `qwen3`
  - Path: `deploy/kubernetes/overlays/llm-katan`
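
Either overlay can be applied directly by pointing Kustomize at its path; for example:

```bash
# core mode (equivalent to applying the repo root deploy/kubernetes/)
kubectl apply -k deploy/kubernetes/overlays/core

# llm-katan mode: adds the sidecar on port 8002 serving model name `qwen3`
kubectl apply -k deploy/kubernetes/overlays/llm-katan
```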

Important notes before you apply manifests:

- `vllm_endpoints.address` must be an IP address (not a hostname) reachable from inside the cluster. If your LLM backends run as K8s Services, use the ClusterIP (for example `10.96.0.10`) and set `port` accordingly. Do not include a protocol or path.
- The PVC in `deploy/kubernetes/pvc.yaml` uses `storageClassName: standard`. On some clouds or local clusters, the default StorageClass name may differ (e.g., `standard-rwo`, `gp2`, or a provisioner like local-path). Adjust as needed.
- Default PVC size is 30Gi. Size it to at least 2–3x of your total model footprint to leave room for indexes and updates.
- The initContainer downloads several models from Hugging Face on first run and writes them into the PVC. Ensure outbound egress to Hugging Face is allowed and there is at least ~6–8 GiB free space for the models specified.
- What the init container downloads differs by mode:
  - core: classifiers + the embedding model `sentence-transformers/all-MiniLM-L12-v2` into `/app/models/all-MiniLM-L12-v2`.
  - llm-katan: everything in core, plus `Qwen/Qwen3-0.6B` into `/app/models/Qwen/Qwen3-0.6B`.
- The default `config.yaml` points to `qwen3` at `127.0.0.1:8002`, which matches the llm-katan overlay. If you use core (no sidecar), either change `vllm_endpoints` to your actual backend Service IP:Port, or deploy the llm-katan overlay.
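
For core mode, the `vllm_endpoints` edit described above might look like the following sketch (the endpoint `name` and the ClusterIP/port values are placeholders, not values shipped in the repository):

```yaml
vllm_endpoints:
  - name: "my-backend"      # placeholder endpoint name
    address: "10.96.0.10"   # ClusterIP of your backend Service (IP only, no protocol or path)
    port: 8000              # the Service port your backend listens on
```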

Deploy the semantic router service with all required components (core mode by default):

```bash
# Deploy semantic router (core mode)
kubectl apply -k deploy/kubernetes/
# Wait for deployment to be ready (this may take several minutes for model downloads)
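# One way to wait is shown below (a sketch: the Deployment name `semantic-router`
# is an assumption -- match it to the name used in your manifests)
kubectl wait --for=condition=Available deployment/semantic-router --timeout=600s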