Commit d3c29ad

separate core and llm-katan

Signed-off-by: JaredforReal <[email protected]>

1 parent a136572

File tree

7 files changed: +192 −147 lines


deploy/kubernetes/README.md

Lines changed: 28 additions & 16 deletions

@@ -1,8 +1,9 @@
 # Semantic Router Kubernetes Deployment
 
-This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize.
+This directory contains Kubernetes manifests for deploying the Semantic Router using Kustomize. It provides two modes similar to docker-compose profiles:
 
-By default, the base kustomization deploys a Pod with an `llm-katan` sidecar so that the default config (qwen3 on 127.0.0.1:8002) works out-of-the-box. If you prefer to run without the sidecar, replace `deployment.with-llm-katan.yaml` with `deployment.yaml` in `kustomization.yaml`.
+- core: only the semantic-router (no llm-katan)
+- llm-katan: semantic-router plus an llm-katan sidecar listening on 8002 (served model name `qwen3`)
 
 ## Architecture
 
@@ -319,31 +320,42 @@ Edit the `resources` section in `deployment.yaml` accordingly.
 
 ### Kubernetes Manifests (`deploy/kubernetes/`)
 
-- `deployment.yaml` - Main application deployment with optimized resource settings
-- `deployment.with-llm-katan.yaml` - Optional variant including an llm-katan sidecar listening on 8002 (works with default config pointing to qwen3 at 127.0.0.1:8002)
-- `service.yaml` - Services for gRPC, HTTP API, and metrics
+- `base/` - Shared resources (Namespace, PVC, Service, ConfigMap)
+- `overlays/core/` - Core deployment (no llm-katan)
+- `overlays/llm-katan/` - Deployment with llm-katan sidecar
+- `deployment.yaml` - Plain deployment (used by core overlay)
+- `deployment.katan.yaml` - Sidecar deployment (used by llm-katan overlay)
+- `service.yaml` - gRPC, HTTP API, and metrics services
 - `pvc.yaml` - Persistent volume claim for model storage
 - `namespace.yaml` - Dedicated namespace for the application
-- `config.yaml` - Application configuration
+- `config.yaml` - Application configuration (defaults to qwen3 @ 127.0.0.1:8002)
 - `tools_db.json` - Tools database for semantic routing
-- `kustomization.yaml` - Kustomize configuration for easy deployment
+- `kustomization.yaml` - Root entry (defaults to core overlay)
 
 ### Development Tools
 
-## Optional: run with llm-katan sidecar
+## Choose a mode: core or llm-katan
 
-To mimic the docker-compose default setup, you can deploy a variant that runs an `llm-katan` sidecar inside the same Pod. The provided `deployment.with-llm-katan.yaml` exposes llm-katan on `0.0.0.0:8002` and serves the model name `qwen3`.
+- Core mode (default root points here):
 
-Notes:
+  ```bash
+  kubectl apply -k deploy/kubernetes
+  # or explicitly
+  kubectl apply -k deploy/kubernetes/overlays/core
+  ```
 
-- Ensure the Qwen model content is available at `/app/models/Qwen/Qwen3-0.6B` in the PVC. You can pre-populate the PV or customize the init container to fetch from an internal source.
-- The default Kubernetes `config.yaml` has been aligned to use `qwen3` and endpoint `127.0.0.1:8002`, so it will work out-of-the-box with this sidecar.
+- llm-katan mode:
 
-Apply the sidecar variant instead of the default deployment:
+  ```bash
+  kubectl apply -k deploy/kubernetes/overlays/llm-katan
+  ```
 
-```bash
-kubectl apply -n vllm-semantic-router-system -f deploy/kubernetes/deployment.with-llm-katan.yaml
-```
+Notes for llm-katan:
+
+- The init container will attempt to download `Qwen/Qwen3-0.6B` into `/app/models/Qwen/Qwen3-0.6B` and the embedding model `sentence-transformers/all-MiniLM-L12-v2` into `/app/models/all-MiniLM-L12-v2`. In restricted networks, these downloads may fail—pre-populate the PV or point the init script to your internal artifact store as needed.
+- The default Kubernetes `config.yaml` has been aligned to use `qwen3` and endpoint `127.0.0.1:8002`.
 
 - `tools/kind/kind-config.yaml` - Kind cluster configuration for local development
 - `tools/make/kube.mk` - Make targets for Kubernetes operations
Lines changed: 19 additions & 0 deletions

@@ -0,0 +1,19 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+resources:
+  - ../namespace.yaml
+  - ../pvc.yaml
+  - ../service.yaml
+
+configMapGenerator:
+  - name: semantic-router-config
+    files:
+      - ../config.yaml
+      - ../tools_db.json
+
+namespace: vllm-semantic-router-system
+
+images:
+  - name: ghcr.io/vllm-project/semantic-router/extproc
+    newTag: latest
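One behavior worth noting about the `configMapGenerator` in this new kustomization: by default kustomize appends a content hash to the generated ConfigMap's name and rewrites references inside the same build, so a config change also rolls the Deployment. If anything outside the kustomize build references `semantic-router-config` by its literal name, the suffix can be disabled with a `generatorOptions` stanza (a sketch, not part of this commit):

```yaml
# Sketch: keep the generated ConfigMap name stable (no content-hash suffix).
generatorOptions:
  disableNameSuffixHash: true
```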

deploy/kubernetes/deployment.with-llm-katan.yaml renamed to deploy/kubernetes/deployment.katan.yaml

Lines changed: 13 additions & 1 deletion

@@ -63,11 +63,23 @@ spec:
               echo "PII token classifier model already exists, skipping..."
             fi
 
+            # Download embedding model all-MiniLM-L12-v2
+            if [ ! -d "all-MiniLM-L12-v2" ]; then
+              echo "Downloading all-MiniLM-L12-v2 embedding model..."
+              huggingface-cli download sentence-transformers/all-MiniLM-L12-v2 --local-dir all-MiniLM-L12-v2
+            else
+              echo "all-MiniLM-L12-v2 already exists, skipping..."
+            fi
+
             # Optional: Prepare Qwen model directory for llm-katan sidecar
             # NOTE: Provide the model content under /app/models/Qwen/Qwen3-0.6B via pre-populated PV
             # or customize the following block to fetch from your internal artifact store.
             if [ ! -d "Qwen/Qwen3-0.6B" ]; then
-              echo "Qwen3-0.6B directory not found. Please pre-populate /app/models/Qwen/Qwen3-0.6B in the PVC or customize init script to download it."
+              echo "Downloading Qwen/Qwen3-0.6B for llm-katan..."
+              mkdir -p Qwen
+              huggingface-cli download Qwen/Qwen3-0.6B --local-dir Qwen/Qwen3-0.6B || echo "Warning: Qwen3-0.6B download failed; ensure offline pre-population if needed."
+            else
+              echo "Qwen/Qwen3-0.6B already exists, skipping..."
             fi
 
             echo "Model directory listing:" && ls -la /app/models/
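The guard-and-download pattern this init script repeats for every model can be sketched outside the cluster. In the sketch below, `fetch_model` and `ensure_model` are hypothetical helpers standing in for `huggingface-cli download`, so the skip-if-present logic can be exercised without network access:

```shell
#!/bin/sh
# Sketch of the init container's "download only if missing" pattern.
set -e

MODELS_DIR="$(mktemp -d)"
cd "$MODELS_DIR"

fetch_model() {
  # The real script runs: huggingface-cli download "$1" --local-dir "$2"
  mkdir -p "$2"
}

ensure_model() {
  repo="$1"
  dir="$2"
  if [ ! -d "$dir" ]; then
    echo "Downloading $repo..."
    fetch_model "$repo" "$dir"
  else
    echo "$dir already exists, skipping..."
  fi
}

# First call downloads; second is a no-op, mirroring a pod restart
# against an already-populated PVC.
ensure_model "sentence-transformers/all-MiniLM-L12-v2" "all-MiniLM-L12-v2"
ensure_model "sentence-transformers/all-MiniLM-L12-v2" "all-MiniLM-L12-v2"
```

Because the check is on the target directory rather than a lock file, the pattern is idempotent across restarts but will not notice a partially downloaded directory; the `|| echo "Warning: ..."` fallback in the Qwen block above exists for exactly that restricted-network case.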

deploy/kubernetes/deployment.yaml

Lines changed: 118 additions & 109 deletions

@@ -16,121 +16,130 @@ spec:
         app: semantic-router
     spec:
       initContainers:
-      - name: model-downloader
-        image: python:3.11-slim
-        securityContext:
-          runAsNonRoot: false
-          allowPrivilegeEscalation: false
-        command: ["/bin/bash", "-c"]
-        args:
-          - |
-            set -e
-            echo "Installing Hugging Face CLI..."
-            pip install --no-cache-dir huggingface_hub[cli]
+        - name: model-downloader
+          image: python:3.11-slim
+          securityContext:
+            runAsNonRoot: false
+            allowPrivilegeEscalation: false
+          command: ["/bin/bash", "-c"]
+          args:
+            - |
+              set -e
+              echo "Installing Hugging Face CLI..."
+              pip install --no-cache-dir huggingface_hub[cli]
 
-            echo "Downloading models to persistent volume..."
-            cd /app/models
+              echo "Downloading models to persistent volume..."
+              cd /app/models
 
-            # Download category classifier model
-            if [ ! -d "category_classifier_modernbert-base_model" ]; then
-              echo "Downloading category classifier model..."
-              huggingface-cli download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir category_classifier_modernbert-base_model
-            else
-              echo "Category classifier model already exists, skipping..."
-            fi
+              # Download category classifier model
+              if [ ! -d "category_classifier_modernbert-base_model" ]; then
+                echo "Downloading category classifier model..."
+                huggingface-cli download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir category_classifier_modernbert-base_model
+              else
+                echo "Category classifier model already exists, skipping..."
+              fi
 
-            # Download PII classifier model
-            if [ ! -d "pii_classifier_modernbert-base_model" ]; then
-              echo "Downloading PII classifier model..."
-              huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_model --local-dir pii_classifier_modernbert-base_model
-            else
-              echo "PII classifier model already exists, skipping..."
-            fi
+              # Download PII classifier model
+              if [ ! -d "pii_classifier_modernbert-base_model" ]; then
+                echo "Downloading PII classifier model..."
+                huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_model --local-dir pii_classifier_modernbert-base_model
+              else
+                echo "PII classifier model already exists, skipping..."
+              fi
 
-            # Download jailbreak classifier model
-            if [ ! -d "jailbreak_classifier_modernbert-base_model" ]; then
-              echo "Downloading jailbreak classifier model..."
-              huggingface-cli download LLM-Semantic-Router/jailbreak_classifier_modernbert-base_model --local-dir jailbreak_classifier_modernbert-base_model
-            else
-              echo "Jailbreak classifier model already exists, skipping..."
-            fi
+              # Download jailbreak classifier model
+              if [ ! -d "jailbreak_classifier_modernbert-base_model" ]; then
+                echo "Downloading jailbreak classifier model..."
+                huggingface-cli download LLM-Semantic-Router/jailbreak_classifier_modernbert-base_model --local-dir jailbreak_classifier_modernbert-base_model
+              else
+                echo "Jailbreak classifier model already exists, skipping..."
+              fi
 
-            # Download PII token classifier model
-            if [ ! -d "pii_classifier_modernbert-base_presidio_token_model" ]; then
-              echo "Downloading PII token classifier model..."
-              huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_presidio_token_model --local-dir pii_classifier_modernbert-base_presidio_token_model
-            else
-              echo "PII token classifier model already exists, skipping..."
-            fi
+              # Download PII token classifier model
+              if [ ! -d "pii_classifier_modernbert-base_presidio_token_model" ]; then
+                echo "Downloading PII token classifier model..."
+                huggingface-cli download LLM-Semantic-Router/pii_classifier_modernbert-base_presidio_token_model --local-dir pii_classifier_modernbert-base_presidio_token_model
+              else
+                echo "PII token classifier model already exists, skipping..."
+              fi
 
-            echo "All models downloaded successfully!"
-            ls -la /app/models/
-        env:
-          - name: HF_HUB_CACHE
-            value: /tmp/hf_cache
-        # Reduced resource requirements for init container
-        resources:
-          requests:
-            memory: "512Mi"
-            cpu: "250m"
-          limits:
-            memory: "1Gi"
-            cpu: "500m"
-        volumeMounts:
-          - name: models-volume
-            mountPath: /app/models
+              # Download embedding model all-MiniLM-L12-v2
+              if [ ! -d "all-MiniLM-L12-v2" ]; then
+                echo "Downloading all-MiniLM-L12-v2 embedding model..."
+                huggingface-cli download sentence-transformers/all-MiniLM-L12-v2 --local-dir all-MiniLM-L12-v2
+              else
+                echo "all-MiniLM-L12-v2 already exists, skipping..."
+              fi
+
+              echo "Model setup complete."
+              ls -la /app/models/
+          env:
+            - name: HF_HUB_CACHE
+              value: /tmp/hf_cache
+          # Reduced resource requirements for init container
+          resources:
+            requests:
+              memory: "512Mi"
+              cpu: "250m"
+            limits:
+              memory: "1Gi"
+              cpu: "500m"
+          volumeMounts:
+            - name: models-volume
+              mountPath: /app/models
       containers:
-      - name: semantic-router
-        image: ghcr.io/vllm-project/semantic-router/extproc:latest
-        args: ["--secure=true"]
-        securityContext:
-          runAsNonRoot: false
-          allowPrivilegeEscalation: false
-        ports:
-          - containerPort: 50051
-            name: grpc
-            protocol: TCP
-          - containerPort: 9190
-            name: metrics
-            protocol: TCP
-          - containerPort: 8080
-            name: classify-api
-            protocol: TCP
-        env:
-          - name: LD_LIBRARY_PATH
-            value: "/app/lib"
-        volumeMounts:
-          - name: config-volume
-            mountPath: /app/config
-            readOnly: true
-          - name: models-volume
-            mountPath: /app/models
-        livenessProbe:
-          tcpSocket:
-            port: 50051
-          initialDelaySeconds: 60
-          periodSeconds: 30
-          timeoutSeconds: 10
-          failureThreshold: 3
-        readinessProbe:
-          tcpSocket:
-            port: 50051
-          initialDelaySeconds: 90
-          periodSeconds: 30
-          timeoutSeconds: 10
-          failureThreshold: 3
-        # Significantly reduced resource requirements for kind cluster
-        resources:
-          requests:
-            memory: "3Gi" # Reduced from 8Gi
-            cpu: "1" # Reduced from 2
-          limits:
-            memory: "6Gi" # Reduced from 12Gi
-            cpu: "2" # Reduced from 4
-      volumes:
-        - name: config-volume
-          configMap:
-            name: semantic-router-config
-        - name: models-volume
-          persistentVolumeClaim:
-            claimName: semantic-router-models
+        - name: semantic-router
+          image: ghcr.io/vllm-project/semantic-router/extproc:latest
+          args: ["--secure=true"]
+          securityContext:
+            runAsNonRoot: false
+            allowPrivilegeEscalation: false
+          ports:
+            - containerPort: 50051
+              name: grpc
+              protocol: TCP
+            - containerPort: 9190
+              name: metrics
+              protocol: TCP
+            - containerPort: 8080
+              name: classify-api
+              protocol: TCP
+          env:
+            - name: LD_LIBRARY_PATH
+              value: "/app/lib"
+          volumeMounts:
+            - name: config-volume
+              mountPath: /app/config
+              readOnly: true
+            - name: models-volume
+              mountPath: /app/models
+          livenessProbe:
+            tcpSocket:
+              port: 50051
+            initialDelaySeconds: 60
+            periodSeconds: 30
+            timeoutSeconds: 10
+            failureThreshold: 3
+          readinessProbe:
+            tcpSocket:
+              port: 50051
+            initialDelaySeconds: 90
+            periodSeconds: 30
+            timeoutSeconds: 10
+            failureThreshold: 3
+          # Significantly reduced resource requirements for kind cluster
+          resources:
+            requests:
+              memory: "3Gi" # Reduced from 8Gi
+              cpu: "1" # Reduced from 2
+            limits:
+              memory: "6Gi" # Reduced from 12Gi
+              cpu: "2" # Reduced from 4
+      volumes:
+        - name: config-volume
+          configMap:
+            name: semantic-router-config
+        - name: models-volume
+          persistentVolumeClaim:
+            claimName: semantic-router-models
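The reduced requests/limits in this deployment target a local kind cluster. A production overlay could restore larger values without editing the base file, using an ordinary kustomize patch. A sketch (the Deployment name `semantic-router` and the values are assumptions, not part of this commit):

```yaml
# Hypothetical overlay patch: restore larger memory for production clusters.
patches:
  - target:
      kind: Deployment
      name: semantic-router
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/requests/memory
        value: 8Gi
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 12Gi
```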
Lines changed: 2 additions & 21 deletions

@@ -1,25 +1,6 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 
-metadata:
-  name: semantic-router
-
+# This root points to the 'core' overlay by default for clarity.
 resources:
-  - namespace.yaml
-  - pvc.yaml
-  - deployment.with-llm-katan.yaml
-  - service.yaml
-
-# Generate ConfigMap
-configMapGenerator:
-  - name: semantic-router-config
-    files:
-      - config.yaml
-      - tools_db.json
-
-# Namespace for all resources
-namespace: vllm-semantic-router-system
-
-images:
-  - name: ghcr.io/vllm-project/semantic-router/extproc
-    newTag: latest
+  - overlays/core
Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+resources:
+  - ../../base
+  - ../../deployment.yaml
Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+
+resources:
+  - ../../base
+  - ../../deployment.katan.yaml
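Because each overlay is an ordinary kustomization, either one can carry overlay-local settings on top of the shared base. As a sketch (the tag `v0.1.0` is hypothetical), the llm-katan overlay could pin a specific extproc image instead of inheriting `latest` from the base:

```yaml
# Sketch: overlay pinning a specific image tag; the overlay's images
# transformer runs after the base's and retags the matched image.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base
  - ../../deployment.katan.yaml

images:
  - name: ghcr.io/vllm-project/semantic-router/extproc
    newTag: v0.1.0 # hypothetical pinned tag
```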
