# GPUStack
GPUStack is an open-source GPU cluster manager for running and scaling large language models (LLMs).
## Quick Start

```bash
docker compose up -d
```

Access the web UI at http://localhost:80 with the default credentials `admin` / `admin`.
## Services

- `gpustack`: GPUStack server with GPU support enabled by default
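For orientation, here is a minimal sketch of what the `gpustack` service definition might look like. It is assembled from the variables, volume, and GPU settings documented below; the container data path `/var/lib/gpustack` is inferred from the worker command later in this README, so the actual `docker-compose.yaml` in this repo may differ in detail:

```yaml
services:
  gpustack:
    image: ${GLOBAL_REGISTRY:-}gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}
    container_name: gpustack
    restart: unless-stopped
    ports:
      # Publish the web UI/API on the host
      - "${GPUSTACK_PORT_OVERRIDE:-80}:80"
    environment:
      - TZ=${TZ:-UTC}
      - GPUSTACK_BOOTSTRAP_PASSWORD=${GPUSTACK_BOOTSTRAP_PASSWORD:-admin}
      - HF_TOKEN=${HF_TOKEN:-}
    volumes:
      # Data path inferred from the worker example below; verify against the repo
      - gpustack_data:/var/lib/gpustack
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]

volumes:
  gpustack_data:
```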
## Ports
| Service | Port |
|---|---|
| gpustack | 80 |
## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `GPUSTACK_VERSION` | GPUStack image version | v0.7.1 |
| `TZ` | Timezone setting | UTC |
| `GPUSTACK_HOST` | Host to bind the server to | 0.0.0.0 |
| `GPUSTACK_PORT` | Port to bind the server to | 80 |
| `GPUSTACK_DEBUG` | Enable debug mode | false |
| `GPUSTACK_BOOTSTRAP_PASSWORD` | Password for the bootstrap admin user | admin |
| `GPUSTACK_TOKEN` | Token for worker registration | (auto) |
| `HF_TOKEN` | Hugging Face token for model downloads | (empty) |
| `GPUSTACK_PORT_OVERRIDE` | Host port published for the web UI and API | 80 |
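Docker Compose reads these from a `.env` file placed next to `docker-compose.yaml`. An illustrative example (the `hf_xxx` token is a placeholder):

```bash
# .env — values shown are illustrative
GPUSTACK_VERSION=v0.7.1
TZ=UTC
GPUSTACK_BOOTSTRAP_PASSWORD=change-me-to-a-strong-password
# Only needed for gated or private Hugging Face models
HF_TOKEN=hf_xxx
GPUSTACK_PORT_OVERRIDE=80
```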
## Volumes

- `gpustack_data`: Persistent data directory for GPUStack
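Since downloaded models and server state live in this volume, it is worth archiving before upgrades. A sketch using a throwaway Alpine container (note that Compose may prefix the volume name with the project name, e.g. `<project>_gpustack_data`):

```bash
# Archive the contents of the gpustack_data volume into the current directory
docker run --rm \
  -v gpustack_data:/data:ro \
  -v "$PWD":/backup \
  alpine tar czf /backup/gpustack_data.tgz -C /data .
```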
## GPU Support

This service is configured with NVIDIA GPU support enabled by default. The configuration uses:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0']
          capabilities: [gpu]
```
### Requirements

- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit installed on the host (a quick check follows below)
- Docker 19.03+ with GPU support
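A quick way to confirm the toolkit is wired up before starting the stack is to run `nvidia-smi` inside a disposable CUDA container (the image tag below is illustrative; any recent `nvidia/cuda` base tag works):

```bash
# Should print the same GPU table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```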
### AMD GPU (ROCm)

To use AMD GPUs with ROCm support:

1. Use the ROCm-specific image in `docker-compose.yaml`:

   ```yaml
   image: ${GLOBAL_REGISTRY:-}gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}-rocm
   ```

2. Change the device driver to `amdgpu`:

   ```yaml
   deploy:
     resources:
       reservations:
         devices:
           - driver: amdgpu
             device_ids: ['0']
             capabilities: [gpu]
   ```
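As a sanity check before switching images, confirm the host ROCm stack can see the GPU (this assumes the ROCm drivers and the `rocm-smi` utility are installed on the host):

```bash
# Lists detected AMD GPUs along with temperature and utilization
rocm-smi
```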
## Usage

### Deploy a Model

1. Log in to the web UI at http://localhost:80
2. Navigate to **Models** → **Deploy Model**
3. Select a model from the catalog or add a custom model
4. Configure the model parameters
5. Click **Deploy**
### Add Worker Nodes

To scale your cluster by adding more GPU nodes:

1. Get the registration token from the server:

   ```bash
   docker exec gpustack gpustack show-token
   ```

2. Start a worker on another node:

   ```bash
   docker run -d --name gpustack-worker \
     --gpus all \
     --network host \
     --ipc host \
     -v gpustack-worker-data:/var/lib/gpustack \
     gpustack/gpustack:v0.7.1 \
     gpustack start --server-url http://your-server-ip:80 --token YOUR_TOKEN
   ```
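To confirm registration succeeded, tail the worker's logs; once it connects, the node should also appear in the server's web UI:

```bash
# Watch the worker's startup and registration output
docker logs -f gpustack-worker
```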
## API Usage

GPUStack provides an OpenAI-compatible API:

```bash
curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
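Since the API is OpenAI-compatible, the standard model-listing endpoint should also work for discovering the `model` names to use in requests:

```bash
# List models served by the cluster (OpenAI-style response)
curl http://localhost:80/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```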
## Features

- **Model Management**: Deploy and manage LLM models from Hugging Face, ModelScope, or custom sources
- **GPU Scheduling**: Automatic GPU allocation and load balancing
- **Multi-Backend**: Supports llama-box, vLLM, and other inference backends
- **OpenAI-Compatible API**: Drop-in replacement for the OpenAI API
- **Web UI**: User-friendly web interface for cluster management
- **Monitoring**: Real-time resource usage and model performance metrics
- **Multi-Node**: Scale across multiple GPU servers
## Notes

- **Production Security**: Change the default `GPUSTACK_BOOTSTRAP_PASSWORD` before deploying
- **GPU Requirements**: An NVIDIA GPU with CUDA support is required; ensure the NVIDIA Container Toolkit is installed
- **Disk Space**: Model downloads can be several gigabytes; ensure sufficient storage
- **First Deployment**: Initial model deployment may take time as it downloads model files
- **Network**: By default, the service binds to all interfaces (`0.0.0.0`); restrict access in production
## Security

- **Change Default Password**: Update `GPUSTACK_BOOTSTRAP_PASSWORD` after first login
- **API Keys**: Use strong, unique API keys for accessing the API
- **TLS/HTTPS**: Consider using a reverse proxy with TLS for production
- **Network Access**: Restrict access to trusted networks using firewalls (see the sketch below)
- **Updates**: Keep GPUStack updated to the latest stable version
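One concrete way to combine the reverse-proxy and network-access advice is to publish the port on loopback only, so the service is reachable solely through a TLS-terminating proxy running on the same host. A sketch of the `ports` change in `docker-compose.yaml`:

```yaml
services:
  gpustack:
    ports:
      # Reachable only from the host itself; front with a TLS reverse proxy
      - "127.0.0.1:${GPUSTACK_PORT_OVERRIDE:-80}:80"
```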
## License

GPUStack is licensed under the Apache License 2.0. See the [GPUStack GitHub repository](https://github.com/gpustack/gpustack) for more information.