Simple shell scripts for managing VastAI GPU instances with vLLM server deployment.
- VastAI CLI installed (`vastai`)
- `.env` file with required tokens (see below)
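If you don't have the CLI yet, it is distributed as a Python package on PyPI:

```bash
pip install vastai
```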
Copy the template and fill in your API keys:

```bash
cp .env.template .env
```

Then edit `.env` with your actual keys:
- `VAST_API_KEY`: Get from https://cloud.vast.ai/api/
- `HUGGING_FACE_HUB_TOKEN`: Get from https://huggingface.co/settings/tokens
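For reference, the finished `.env` is plain `KEY=value` shell syntax; the values below are placeholders, not real tokens:

```bash
# .env -- placeholder values only
VAST_API_KEY=your-vast-api-key-here
HUGGING_FACE_HUB_TOKEN=hf_your-token-here
```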
- `./check_balance.sh`: Shows account information, credit balance, and recent billing history.
- `./query_gpus.sh`: Lists available single-GPU RTX 5090 offers sorted by price, lowest first (a sketch of the underlying query follows this list).
- `./start_llm_instance.sh <offer_id>`: Creates a new instance with a vLLM server running the Gemma-3-27b model on port 8080.
- `./start_minimal_instance.sh <offer_id>`: Launches a lightweight Ubuntu 22.04 environment with only the host NVIDIA drivers available (no CUDA toolkit or vLLM setup). Perfect for custom runtimes or manual installs.
- `./list_instances.sh`: Shows all your running instances with status and connection info.
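As a rough illustration, a query script like `./query_gpus.sh` can be built on the CLI's offer search. The filter and ordering syntax below is an assumption about how the real script works, not a copy of it:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of query_gpus.sh -- the real script may differ.
set -euo pipefail
source .env
vastai set api-key "$VAST_API_KEY"

# Single RTX 5090 offers, ordered by total dollars-per-hour (cheapest first).
vastai search offers 'gpu_name=RTX_5090 num_gpus=1' --order 'dph_total'
```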
```bash
# 1. Check your account balance
./check_balance.sh
# 2. Find available GPU offers
./query_gpus.sh
# 3. Create instance from an offer (use ID from step 2)
# LLM-ready environment
./start_llm_instance.sh 26128186
# Minimal barebones environment
./start_minimal_instance.sh 26128186
# 4. Monitor your instances
./list_instances.sh
# 5. Connect to vLLM server
# Once running, the server will be available at:
# http://<instance_ip>:8080
```

The scripts automatically deploy a vLLM OpenAI-compatible API server with:
- Model: ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g
- Port: 8080
- API: OpenAI-compatible endpoints
- Max Context: 32,768 tokens
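Since vLLM exposes the standard OpenAI-compatible routes, a quick smoke test from your machine might look like this (replace `<instance_ip>` with the address reported by `./list_instances.sh`):

```bash
curl http://<instance_ip>:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```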
Use VastAI CLI commands for additional management:
```bash
# SSH into instance
vastai ssh-url <instance_id>
# Check logs
vastai logs <instance_id>
# Stop instance
vastai stop instance <instance_id>
# Delete instance
vastai destroy instance <instance_id>
```
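If you want to tear down every instance at once, the CLI's JSON output can be scripted. This loop is a sketch that assumes `jq` is installed and that `--raw` returns a JSON array of instances with an `id` field:

```bash
# Destroy ALL instances on the account -- use with care.
for id in $(vastai show instances --raw | jq -r '.[].id'); do
  vastai destroy instance "$id"
done
```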