sisl/aquila: Manager for LLM deployment on local clusters using vLLM

Admin dashboard + satellite clients for multi-model vLLM deployments. Deploy vLLM serve endpoints across a cluster with a few clicks — ideal for research labs or small teams that need repeatable, multi-endpoint serving without a full MLOps stack.

Key features

Deploy and manage models across GPU nodes via Docker or rootless Podman.
OpenAI-compatible gateway (/v1) with stable URLs across node moves, API key auth, and per-deployment scoping.
Usage metrics, reproducibility manifests, Slack/webhook notifications, and log streaming.
Warm cache (pause/resume models between GPU and RAM), per-GPU maintenance mode, and live cluster settings.
Upload local checkpoints and LoRA adapters from the browser, or pull them from a URL.

See the full documentation for detailed guides.

Supported hardware

GPUs: NVIDIA H100, A100, L40, DGX Spark (GB10), RTX 4090
OS: Ubuntu 22.04 and 24.04

Prerequisites

Host: Docker + Compose, Node.js ≥ 23 + npm, Python 3.10–3.14, uv.

Client: NVIDIA GPU with driver, Docker or Podman ≥ 5.4, NVIDIA Container Toolkit, Python 3.10–3.14, uv.
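The prerequisites above can be checked before installing with a small preflight script. This is a sketch, not part of aquila: the function names are hypothetical, and the executable names (docker, podman, node, npm, uv, nvidia-smi) are assumed to be the usual ones on PATH. It does not verify the minimum versions of Node.js or Podman.

```python
import shutil
import sys

# Executable names are assumptions; adjust them to match your install.
HOST_TOOLS = ["docker", "node", "npm", "uv"]
CLIENT_TOOLS = ["nvidia-smi", "uv"]
CLIENT_RUNTIMES = ["docker", "podman"]  # the client needs at least one of these


def python_supported(version=None):
    """Return True if the interpreter falls in the documented 3.10 to 3.14 range."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (3, 10) <= (major, minor) <= (3, 14)


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def has_any(tools, which=shutil.which):
    """Return True if at least one of `tools` is on PATH."""
    return any(which(tool) is not None for tool in tools)
```

On a host, python_supported() and missing_tools(HOST_TOOLS) cover the list above; on a client, combine missing_tools(CLIENT_TOOLS) with has_any(CLIENT_RUNTIMES).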

Quick start

Install:

uv venv && source .venv/bin/activate
uv pip install aquila

Start the host:

aquila host up --host-ip 0.0.0.0 --host-frontend-port 5173 --host-discover-port 11400

Add a client node:

aquila client up --host-ip <host-ip> --host-discover-port 11400

Open http://<host-ip>:5173 — the client node appears within seconds. Add --service for persistent systemd services.
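When scripting the quick start, it can help to wait until the dashboard port accepts connections before opening the browser or starting clients. The README does not describe a health endpoint, so this sketch only checks for a successful TCP connection on the frontend port; wait_for_port is a hypothetical helper, not an aquila command.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 5173) returns True once the frontend started with --host-frontend-port 5173 is listening. An open port does not guarantee the dashboard has finished loading.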

Gateway usage

Every deployment is reachable through a single gateway URL:

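from openai import OpenAI

# Point the standard OpenAI client at the gateway; the API key is issued by aquila.
client = OpenAI(base_url="http://my-host:5173/v1", api_key="vcm-...")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # name of a deployed model
    messages=[{"role": "user", "content": "Hello"}],
)
# Print the assistant's reply text.
print(resp.choices[0].message.content)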
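The same call can be made without the openai package. The sketch below uses only the standard library and assumes the gateway follows the usual OpenAI conventions, which "OpenAI-compatible" suggests but the README does not spell out: a POST to /v1/chat/completions with the API key sent as a Bearer token. The helper names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build, but do not send, a POST to the OpenAI-style chat completions route."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",  # assumed route
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Given a running gateway, send(build_chat_request("http://my-host:5173/v1", key, "meta-llama/Llama-3.1-8B-Instruct", "Hello")) should return the usual chat completion JSON, with the reply text under choices[0]["message"]["content"] if the response matches the OpenAI schema.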

