Async image-classification API for Scanning Electron Microscopy (SEM) images. The default service serves a Hugging Face ViT image classifier through BentoML, queues work in Redis, exposes stable REST endpoints through KrakenD, and records authenticated API usage in PostgreSQL.
The repository is intentionally structured so another image or non-LLM model can reuse the same Kubernetes, gateway, queue, and reporting layers with minimal changes.
```text
Client
|
| JWT + REST
v
KrakenD gateway
|-- validates JWTs
|-- maps public /api/v1/* routes to BentoML
|-- records request usage in PostgreSQL
v
BentoML service
|-- accepts image upload or image_url
|-- stores async jobs in Redis
|-- runs SEM classifier inference in a worker thread
v
Redis queue/results
```
PostgreSQL stores request-level usage rows for admin reports.
mock-oidc is included for local/dev JWT testing.
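The submit/poll/fetch flow above can be illustrated without Redis. The following is a hypothetical in-memory stand-in for the queue layer, not the code in `src/redis_queue.py`. A dictionary plays the role of Redis, a `queue.Queue` feeds one worker thread that runs a model-specific `infer` callable, and the status names (`queued`, `running`, `completed`, `failed`) are assumptions. Job TTLs and cross-process persistence, which Redis provides, are left out.

```python
import queue
import threading
import uuid
from typing import Any, Callable


class InMemoryJobQueue:
    """Hypothetical stand-in for the Redis job queue: submit, poll status, fetch result."""

    def __init__(self, infer: Callable[[Any], Any]) -> None:
        self._infer = infer  # model-specific inference; the queue never inspects payloads
        self._jobs: dict[str, dict[str, Any]] = {}  # job_id -> status/result/error
        self._lock = threading.Lock()
        self._pending: "queue.Queue[tuple[str, Any] | None]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, payload: Any) -> str:
        """Record a queued job and hand it to the worker thread."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "queued", "result": None, "error": None}
        self._pending.put((job_id, payload))
        return job_id

    def status(self, job_id: str) -> str:
        with self._lock:
            job = self._jobs.get(job_id)
            return job["status"] if job else "unknown"

    def result(self, job_id: str) -> Any:
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None or job["status"] != "completed":
                raise KeyError(f"no completed result for job {job_id}")
            return job["result"]

    def close(self) -> None:
        """Let the worker drain already-queued jobs, then stop it."""
        self._pending.put(None)  # sentinel
        self._worker.join()

    def _run(self) -> None:
        while True:
            item = self._pending.get()
            if item is None:
                break
            job_id, payload = item
            with self._lock:
                self._jobs[job_id]["status"] = "running"
            try:
                output = self._infer(payload)
            except Exception as exc:  # a failing job must not kill the worker
                with self._lock:
                    self._jobs[job_id].update(status="failed", error=str(exc))
            else:
                with self._lock:
                    self._jobs[job_id].update(status="completed", result=output)
```

Because the queue only stores opaque payloads and results, serving a different model means swapping the `infer` callable, which is the same separation this README describes between the reusable queue layer and the model-specific service.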
All public routes are served through KrakenD on port 8080.
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | `/__health` | No | KrakenD gateway health |
| GET | `/health` | No | BentoML model and Redis health |
| POST | `/api/v1/inference` | Yes | Submit an image inference job |
| POST | `/api/v1/jobs/status` | Yes | Poll job status |
| POST | `/api/v1/jobs/results` | Yes | Fetch completed job result |
| GET | `/api/v1/version` | No | Static API version/discovery response |
Status and result routes use POST because BentoML API methods receive JSON bodies. The public API stays stable even if the internal BentoML service class changes.
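The POST-based status and result routes can be driven from Python with only the standard library. This sketch assumes the gateway at `http://localhost:8080`, as in the curl examples in this README. It also assumes the status response carries a `status` field whose terminal values are `completed` and `failed`, so check the actual response schema before relying on those names. The `fetch` parameter is injectable, which lets the polling logic be exercised without a running cluster.

```python
import json
import time
import urllib.request
from typing import Any, Callable

BASE_URL = "http://localhost:8080"  # KrakenD gateway port used throughout this README


def build_post(path: str, token: str, payload: dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an authenticated JSON POST such as /api/v1/jobs/status or /api/v1/jobs/results."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def wait_for_result(
    job_id: str,
    token: str,
    fetch: Callable[[urllib.request.Request], dict[str, Any]] = send,
    interval: float = 1.0,
    timeout: float = 120.0,
) -> dict[str, Any]:
    """Poll the status route until the job finishes, then fetch its result."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(build_post("/api/v1/jobs/status", token, {"job_id": job_id}))
        state = status.get("status")  # ASSUMPTION: field name and terminal values
        if state in ("completed", "failed"):
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout}s")
        time.sleep(interval)
    if state == "failed":
        raise RuntimeError(f"job {job_id} failed: {status}")
    return fetch(build_post("/api/v1/jobs/results", token, {"job_id": job_id}))
```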
```text
sem-image-classifier-api/
├── Containerfile # Multi-stage CPU-only BentoML image build
├── README.md # Public project guide
├── LICENSE # EUPL-1.2 license text
├── NOTICE # Project attribution notice
├── pyproject.toml # Python project metadata and dependencies
├── db/schema.sql # PostgreSQL api_usage schema
├── docs/MODEL_MIGRATION.md # How to adapt this repo to another model
├── gateway/ # KrakenD flexible configuration and plugin
├── k8s/dev.sh # Build, deploy, access, and admin helper
├── k8s/env/ # Tracked dev/prod templates and ignored local overrides
├── k8s/manifests/ # Plain Kubernetes manifests in dependency order
├── scripts/usage_report.py # Terminal, JSON, and HTML usage reports
├── scripts/stress_test_api.py # Authenticated traffic generator
├── src/service.py # SEM-specific model loading and inference
├── src/image_service.py # Reusable image input layer
├── src/model_service.py # Reusable async queue/service foundation
├── src/redis_queue.py # Payload-agnostic Redis job queue
└── tests/test_api.py # End-to-end API smoke test
```
- Python 3.12
- `uv` for Python dependency management
- `podman` for image builds
- `kubectl` for Kubernetes operations
- A Linux development host with SSH access to the Stencil virtual datacenter and its K3s cluster nodes
- A container registry reachable by both the build host and Kubernetes nodes
⚠️ Warning (dev.sh is environment-specific): `k8s/dev.sh` is designed around the author’s development environment and assumptions (SSH access to the K3s hosts, kubeconfig tunnel patching, registry reachability, and the tracked dev/prod env templates). It will not work out of the box on other infrastructure.

If your environment satisfies the requirements above, you can adapt it by creating your own `cluster.local.env` and `.local.*` overrides and validating them with `./dev.sh config --env <env>` before running any deploy, reset, or bootstrap command. Some commands (notably `reset` and `bootstrap`) delete the configured namespace.
k8s/dev.sh is the namespace and deployment authority. It owns the service identity and operational flow: app name, namespace, image repository, image tag, BentoML service entrypoint, namespace reset guardrails, build, deploy, access, and reporting helpers.
The project uses one public identity everywhere:
```bash
APP_NAME=sem-image-classifier
SERVICE_NAME=sem-image-classifier
NAMESPACE=sem-image-classifier
IMAGE_REPOSITORY=sem-image-classifier
BENTOML_SERVICE=service:SEMInferenceRedisService
```

Configuration is split into three domains:
| File | Owns | Should not contain |
|---|---|---|
| `k8s/dev.sh` | Service identity, BentoML service entrypoint, build/deploy/reset/access behavior | Private IPs, credentials, model IDs |
| `k8s/env/<env>/bentoml-config*.yaml` | Redis connection, job TTL, model source, model ID, revision, cache directory | Cluster hostnames, registry, service identity |
| `k8s/env/<env>/cluster.env` and ignored `cluster.local.env` | K3s host/user/nodes, registry endpoint, remote kubeconfig path | App name, namespace, image repository, BentoML service, `MODEL_*` values |
Tracked cluster.env files provide example infrastructure defaults. For a real environment, create an ignored local infrastructure override:
```bash
# k8s/env/dev/cluster.local.env
K3S_API_HOST=your-k3s-host.example.org
K3S_SSH_USER=root
K3S_REMOTE_KUBECONFIG=/etc/rancher/k3s/k3s.yaml
K3S_NODES="your-k3s-host.example.org"
REGISTRY=registry.example.org:5000
REGISTRY_SCHEME=https
```

Use `./dev.sh config --env dev` to see the exact resolved files and non-secret values before running a build or namespace reset.
Tracked config files under k8s/env/<env>/ remain templates. Machine-specific overrides use .local.yaml, .local.json, or cluster.local.env; these are ignored by git. cluster.local.env is intentionally restricted to infrastructure keys so it cannot create a second service identity.
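Here is a minimal Python sketch of that restriction. The real check lives in the shell script `k8s/dev.sh`. This version parses simple `KEY=VALUE` lines and rejects any key outside an allowlist. The allowlist contains only the six keys shown in the `cluster.local.env` example in this README. The real check may accept a different set or handle quoting differently.

```python
import shlex

# ASSUMPTION: allowlist taken from the cluster.local.env example in this README;
# the authoritative list lives in k8s/dev.sh.
ALLOWED_KEYS = {
    "K3S_API_HOST", "K3S_SSH_USER", "K3S_REMOTE_KUBECONFIG",
    "K3S_NODES", "REGISTRY", "REGISTRY_SCHEME",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {raw!r}")
        # shlex removes shell quoting, e.g. K3S_NODES="host-a host-b" -> host-a host-b
        values[key.strip()] = " ".join(shlex.split(value))
    return values


def check_infrastructure_only(values: dict[str, str]) -> None:
    """Refuse identity or model keys such as NAMESPACE or MODEL_ID."""
    forbidden = sorted(set(values) - ALLOWED_KEYS)
    if forbidden:
        raise ValueError(f"cluster.local.env may not set: {', '.join(forbidden)}")
```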
The container build supports two model sources:
| `MODEL_SOURCE` | Required values | Meaning |
|---|---|---|
| `hugging_face` | `MODEL_ID`, `MODEL_REVISION` | Download a public Hugging Face model revision during image build and bake it into the image. |
| `private` | `MODEL_ID`, absolute `MODEL_CACHE_DIR` | Copy a local Hugging Face cache root during image build and validate the requested model offline. |
Private mode treats MODEL_CACHE_DIR as a Hugging Face cache root, not as a single snapshot directory. If MODEL_REVISION is omitted and exactly one snapshot exists for MODEL_ID, the build resolves that snapshot. If multiple snapshots exist, set MODEL_REVISION explicitly.
Runtime loads from the baked image cache. MODEL_CACHE_DIR is build-only.
⚠️ Warning (dev.sh is environment-specific): The commands below assume an environment compatible with `k8s/dev.sh`. You will need to adapt `k8s/env/<env>/cluster.local.env` (plus the `.local.*` overrides) for your cluster and registry, and confirm the resolved config with `./dev.sh config --env <env>`.
Set up local cluster values first:
```bash
cp k8s/env/dev/bentoml-config.yaml k8s/env/dev/bentoml-config.local.yaml
cp k8s/env/dev/gateway-settings.json k8s/env/dev/gateway-settings.local.json
cp k8s/env/dev/secrets.yaml k8s/env/dev/secrets.local.yaml
cat > k8s/env/dev/cluster.local.env <<'ENV'
K3S_API_HOST=your-k3s-host.example.org
K3S_SSH_USER=root
K3S_REMOTE_KUBECONFIG=/etc/rancher/k3s/k3s.yaml
K3S_NODES="your-k3s-host.example.org"
REGISTRY=registry.example.org:5000
REGISTRY_SCHEME=https
ENV
```

Then build and deploy:
```bash
cd k8s
./dev.sh config --env dev
./dev.sh access --env dev
./dev.sh build-image --env dev
./dev.sh bootstrap --env dev
./dev.sh access --env dev
```

`bootstrap` builds and pushes the image first, writes a release artifact, then resets the configured namespace and deploys from that release artifact. Protected namespaces such as `default`, `kube-system`, and `storage-system` are refused.
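The namespace guardrail can be sketched as a pre-flight check, written in Python for illustration rather than taken from `dev.sh`. The protected set contains only the three namespaces named above. The wording "such as" suggests `dev.sh` may protect more. Two further rules are assumptions added here: every `kube-`-prefixed name is rejected, because Kubernetes reserves that prefix for system namespaces, and the name must be a valid RFC 1123 label.

```python
import re

# Namespaces named in this README; dev.sh may protect additional ones.
PROTECTED_NAMESPACES = frozenset({"default", "kube-system", "storage-system"})
# RFC 1123 label: lowercase alphanumerics and '-', alphanumeric at both ends, max 63 chars.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")


def assert_resettable(namespace: str) -> str:
    """Return the namespace if it is safe to delete and recreate, else raise."""
    if not DNS_LABEL.fullmatch(namespace):
        raise ValueError(f"invalid namespace name: {namespace!r}")
    # ASSUMPTION: treat the reserved kube- prefix as protected too.
    if namespace in PROTECTED_NAMESPACES or namespace.startswith("kube-"):
        raise PermissionError(f"refusing to reset protected namespace {namespace!r}")
    return namespace
```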
Useful operations:
```bash
cd k8s
./dev.sh status
./dev.sh logs bentoml
./dev.sh logs krakend
./dev.sh restart bentoml --env dev
./dev.sh token testuser
```

With `./dev.sh access --env dev` running local forwards:
```bash
TOKEN=$(cd k8s && ./dev.sh token testuser)
curl -s http://localhost:8080/health | python3 -m json.tool
curl -s -X POST http://localhost:8080/api/v1/inference \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"image_url":"https://example.org/path/to/sem-image.jpg"}' \
  | python3 -m json.tool
curl -s -X POST http://localhost:8080/api/v1/jobs/status \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"job_id":"<job-id>"}' \
  | python3 -m json.tool
curl -s -X POST http://localhost:8080/api/v1/jobs/results \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"job_id":"<job-id>"}' \
  | python3 -m json.tool
```

The service accepts either `image_url` JSON input or a multipart file upload with the field name `image`.
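For the multipart variant, here is a standard-library sketch that builds the upload request by hand, with the file in the `image` form field. The helper name `build_upload_request` is hypothetical, and the part's content type is guessed from the file extension. A library such as `requests` would do the same encoding for you.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(path: str, token: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a multipart/form-data POST to /api/v1/inference with the file in field `image`."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to occur in the image bytes
    content_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="image"; filename="{file_path.name}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        file_path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),  # closing boundary ends the form
    ])
    return urllib.request.Request(
        f"{base_url}/api/v1/inference",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(request)` and parse the JSON body. It presumably carries the job ID that the status and result routes expect.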
scripts/usage_report.py reads the api_usage table from PostgreSQL through kubectl exec and produces admin-friendly usage views.
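Reading through `kubectl exec` can be sketched as building an argument list that runs `psql` inside the `postgresql-0` pod. The pod name comes from the `--namespace` flag description in this README. The database user and database name (`postgres`/`postgres`) are placeholders, not values taken from this project. Passing a list to `subprocess.run` avoids shell-quoting the SQL.

```python
import subprocess

POD = "postgresql-0"  # pod named in the --namespace flag description


def usage_query_argv(namespace: str, sql: str, db_user: str = "postgres", db_name: str = "postgres") -> list[str]:
    """argv that runs one SQL statement with psql inside the PostgreSQL pod (placeholder credentials)."""
    return [
        "kubectl", "exec", "-n", namespace, POD, "--",
        "psql", "-U", db_user, "-d", db_name,
        "-A", "-t", "-F", "\t",  # unaligned, tuples only, tab-separated fields
        "-c", sql,
    ]


def run_query(namespace: str, sql: str) -> list[list[str]]:
    """Run the query and split psql's output into rows of string fields."""
    completed = subprocess.run(
        usage_query_argv(namespace, sql), check=True, capture_output=True, text=True
    )
    return [line.split("\t") for line in completed.stdout.splitlines() if line]
```

For example, `run_query("sem-image-classifier", "SELECT count(*) FROM api_usage")` would return a single one-field row, assuming the placeholder credentials are replaced with the real ones.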
Modes:
| Mode | Purpose |
|---|---|
| `summary` | Fast terminal inspection. Defaults to the last 24 hours. |
| `report` | Static self-contained HTML dashboard. Defaults to all retained database history. |
| `json` | Machine-readable support output for automation. |
Common flags:
| Flag | Meaning |
|---|---|
| `--namespace` | Kubernetes namespace containing `postgresql-0`. Defaults to `sem-image-classifier`. |
| `--since` | Start of the time window: an ISO timestamp or a relative window such as `24h`, `7d`, `30d`. |
| `--until` | End of the time window: an ISO timestamp, a relative window, or `now`. |
| `--timezone` | Display timezone for buckets and recent rows. Defaults to UTC. |
| `--bucket` | Time bucket size: `auto`, `minute`, `hour`, or `day`. Defaults to `auto`. |
| `--recent-limit` | Number of recent rows to include. |
| `--output` | Output file for the `report` and `json` modes. |
Examples:
```bash
# Quick operational check for the last day
python scripts/usage_report.py summary --namespace sem-image-classifier --since 24h
# Full retained-history HTML dashboard
python scripts/usage_report.py report \
  --namespace sem-image-classifier \
  --output /tmp/sem-usage-report.html
# Machine-readable output for another tool
python scripts/usage_report.py json \
  --namespace sem-image-classifier \
  --since 7d \
  --output /tmp/sem-usage.json
```

The HTML report is a single file with inline CSS/SVG and no JavaScript dependency. It includes coverage, KPI cards, traffic timeline, endpoint mix, status health, user leaderboard, hourly heatmap, recent requests, and data-quality notes.
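A single-file report without JavaScript can draw its charts as inline SVG. This sketch renders a horizontal bar chart, such as an endpoint mix, and wraps it in a minimal HTML page with inline CSS. The layout constants and function names are illustrative assumptions, not how `scripts/usage_report.py` draws its charts. Labels are HTML-escaped because endpoint paths and user names come from request data.

```python
from html import escape


def svg_bar_chart(counts: dict[str, int], width: int = 400, bar_height: int = 18) -> str:
    """Render label -> count pairs as a horizontal bar chart in inline SVG."""
    peak = max(counts.values(), default=0) or 1  # avoid division by zero
    label_w = 140  # space reserved for labels on the left
    rows = []
    for i, (label, count) in enumerate(counts.items()):
        y = i * (bar_height + 4)
        bar_w = round((width - label_w - 40) * count / peak)  # scale bars to the largest count
        rows.append(
            f'<text x="0" y="{y + bar_height - 4}">{escape(label)}</text>'
            f'<rect x="{label_w}" y="{y}" width="{bar_w}" height="{bar_height}" fill="#4a78a8"/>'
            f'<text x="{label_w + bar_w + 4}" y="{y + bar_height - 4}">{count}</text>'
        )
    height = max(len(counts), 1) * (bar_height + 4)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'font-family="sans-serif" font-size="12">{"".join(rows)}</svg>'
    )


def html_report(title: str, counts: dict[str, int]) -> str:
    """Wrap the chart in one self-contained HTML page (inline CSS, no scripts)."""
    return (
        '<!doctype html><html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title>"
        "<style>body{font-family:sans-serif;margin:2rem}</style></head>"
        f"<body><h1>{escape(title)}</h1>{svg_bar_chart(counts)}</body></html>"
    )
```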
Use the stress script to seed usage data and test the gateway path:
```bash
python scripts/stress_test_api.py \
  --base-url http://localhost:8080 \
  --mock-token-url http://localhost:18080/default/token \
  --users alice,bob,charlie \
  --requests 30 \
  --concurrency 5 \
  --mode mixed \
  --poll
```

Start with `docs/MODEL_MIGRATION.md`. The short version:
- Keep `src/model_service.py`, `src/redis_queue.py`, the gateway config, and the Kubernetes manifests unless the serving pattern changes.
- Replace or edit `src/service.py` with the new model loading, preprocessing, inference, and result schema.
- Reuse `src/image_service.py` for image models, or create a sibling input layer for another payload type.
- Edit `BENTOML_SERVICE` in `k8s/dev.sh` only if the BentoML entrypoint changes.
- Update the model env values in `k8s/env/<env>/bentoml-config*.yaml`.
- Run the build, smoke test, stress test, and usage reports.
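The split in the list above can be summarized as a contract between a payload-agnostic worker and a model-specific adapter. The `ModelAdapter` protocol, the `run_job` function, and the toy non-image `ThresholdModel` below are hypothetical. The actual interface between `src/model_service.py` and `src/service.py` may differ, and `docs/MODEL_MIGRATION.md` remains the authority.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class ModelAdapter(Protocol):
    """Hypothetical contract for the model-specific layer (the role of src/service.py)."""

    def preprocess(self, payload: Any) -> Any: ...
    def infer(self, inputs: Any) -> Any: ...
    def postprocess(self, outputs: Any) -> dict[str, Any]: ...


def run_job(adapter: ModelAdapter, payload: Any) -> dict[str, Any]:
    """What a payload-agnostic worker would call; it never inspects the payload itself."""
    return adapter.postprocess(adapter.infer(adapter.preprocess(payload)))


@dataclass
class ThresholdModel:
    """Toy non-image model: labels a list of numbers by its mean."""

    threshold: float = 0.5

    def preprocess(self, payload: Any) -> list[float]:
        return [float(x) for x in payload["values"]]

    def infer(self, inputs: list[float]) -> float:
        return sum(inputs) / len(inputs)

    def postprocess(self, outputs: float) -> dict[str, Any]:
        return {"label": "high" if outputs >= self.threshold else "low", "score": outputs}
```

Swapping the SEM classifier for another model then means writing a new adapter, while the queue, gateway, and reporting layers stay untouched.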
The default image uses CPU-only PyTorch wheels to keep the deployment smaller and simpler. This is the right default for single-image SEM inference. GPU support should be treated as a separate deployment design because it changes node prerequisites, image base, dependency resolution, scheduling, and resource limits.
This project was developed as part of my master's thesis "Building Service Layers in the NFFA-DI Digital Ecosystem: Governed Bucket Management and Reusable Analysis Services", submitted in fulfilment of the requirements for the Master in Data Management and Curation (MDMC) at SISSA / AREA Science Park (Trieste, Italy).
I thank my supervisor Dr. Tommaso Rodani (tommaso.rodani@areasciencepark.it) for his guidance and support throughout this work.
Copyright is held by AREA Science Park. The author is Luis Fernando Palacios Flores.
Licensed under the European Union Public Licence, version 1.2 or later (EUPL-1.2-or-later). See LICENSE and NOTICE.