GPUStack

English | 中文

This service deploys GPUStack, an open-source GPU cluster manager for running large language models (LLMs).

Services

  • gpustack: GPUStack server with built-in worker

Environment Variables

Variable Name               | Description                             | Default Value
GPUSTACK_VERSION            | GPUStack image version                  | v0.5.3
GPUSTACK_HOST               | Host to bind the server to              | 0.0.0.0
GPUSTACK_PORT               | Port to bind the server to              | 80
GPUSTACK_DEBUG              | Enable debug mode                       | false
GPUSTACK_BOOTSTRAP_PASSWORD | Password for the bootstrap admin user   | admin
GPUSTACK_TOKEN              | Token for worker registration           | (auto-generated)
HF_TOKEN                    | Hugging Face token for model downloads  | ""
GPUSTACK_PORT_OVERRIDE      | Host port mapped to the container       | 80

Please modify the .env file as needed for your use case.
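
For example, a minimal .env might look like the following (the values are illustrative, not defaults shipped with this repo; the variable names are those in the table above):

    # .env -- illustrative values only
    GPUSTACK_VERSION=v0.5.3
    GPUSTACK_BOOTSTRAP_PASSWORD=change-me   # replace the default admin password
    HF_TOKEN=hf_xxxxxxxx                    # only needed for gated Hugging Face models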

Volumes

  • gpustack_data: Data directory for GPUStack

GPU Support

NVIDIA GPU

Uncomment the GPU-related configuration in docker-compose.yaml:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1              # number of GPUs to reserve; use "all" for every GPU
              capabilities: [gpu]
    runtime: nvidia                 # requires the NVIDIA Container Toolkit on the host
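
Before starting the stack, it is worth confirming that Docker can actually reach the GPU. A quick check (the CUDA image tag here is only an example) is:

    docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi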

AMD GPU (ROCm)

Use the ROCm-specific image:

image: gpustack/gpustack:v0.5.3-rocm
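
Containerized ROCm workloads generally also need the kernel GPU devices passed through. A sketch of the extra service configuration, assuming a standard ROCm host setup (this is not taken from this compose file), is:

    devices:
      - /dev/kfd            # ROCm compute interface
      - /dev/dri            # GPU render nodes
    group_add:
      - video               # lets the container access the GPU device files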

Usage

Start GPUStack

docker compose up -d
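
To check that the service came up cleanly:

    docker compose ps
    docker compose logs -f gpustack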

Access

  • Web UI: http://localhost:80
  • Default credentials: admin / admin (configured via GPUSTACK_BOOTSTRAP_PASSWORD)
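
If port 80 clashes with something else on the host, change the mapping through GPUSTACK_PORT_OVERRIDE, assuming the compose file wires it into the port mapping as the variable table suggests:

    GPUSTACK_PORT_OVERRIDE=8080 docker compose up -d
    # Web UI is then reachable at http://localhost:8080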

Deploy a Model

  1. Log in to the web UI
  2. Navigate to Models
  3. Click "Deploy Model"
  4. Select a model from the catalog or add a custom model
  5. Configure the model parameters
  6. Click "Deploy"
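
Once the model shows as running, you can confirm it is being served through the OpenAI-compatible API; /v1/models is the standard OpenAI listing endpoint, and the API key is created in the web UI:

    curl http://localhost:80/v1/models \
      -H "Authorization: Bearer YOUR_API_KEY"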

Add Worker Nodes

To add more GPU nodes to the cluster:

  1. Get the registration token from the server:

    docker exec gpustack cat /var/lib/gpustack/token
    
  2. Start a worker on another node:

    docker run -d --name gpustack-worker \
      --gpus all \
      --network host \
      --ipc host \
      -v gpustack-data:/var/lib/gpustack \
      gpustack/gpustack:v0.5.3 \
      --server-url http://your-server-ip:80 \
      --token YOUR_TOKEN
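
The new worker should appear on the Workers page of the web UI shortly after it starts; if it does not, the worker's logs are the first place to look:

    docker logs -f gpustack-worker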
    

Features

  • Model Management: Deploy and manage LLM models from Hugging Face, ModelScope, or custom sources
  • GPU Scheduling: Automatic GPU allocation and scheduling
  • Multi-Backend: Supports llama-box, vLLM, and other backends
  • API Compatible: OpenAI-compatible API endpoint
  • Web UI: User-friendly web interface for management
  • Monitoring: Resource usage and model metrics

API Usage

GPUStack provides an OpenAI-compatible API:

curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
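
Because the endpoint follows the OpenAI API, streaming works the same way: add "stream": true to the request body to receive the response as server-sent events, token by token:

curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'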

Notes

  • For production use, change the default password
  • GPU support requires NVIDIA Docker runtime or AMD ROCm support
  • Model downloads can be large (several GB); ensure you have sufficient disk space
  • First model deployment may take time as it downloads the model files

Security

  • Change default admin password after first login
  • Use strong passwords for API keys
  • Consider using TLS for production deployments
  • Restrict network access to trusted sources
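
A simple way to enforce the last point is to bind the published port to the loopback interface in docker-compose.yaml and terminate TLS in a reverse proxy in front of it; the exact ports key depends on your compose file:

    ports:
      - "127.0.0.1:80:80"   # reachable only from the host itself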

License

GPUStack is licensed under the Apache License 2.0. See the GPUStack GitHub repository for more information.