# GPUStack
This service deploys GPUStack, an open-source GPU cluster manager for running large language models (LLMs).

## Services

- `gpustack`: GPUStack server with a built-in worker

## Environment Variables

| Variable Name | Description | Default Value |
|---|---|---|
| GPUSTACK_VERSION | GPUStack image version | v0.5.3 |
| GPUSTACK_HOST | Host to bind the server to | 0.0.0.0 |
| GPUSTACK_PORT | Port to bind the server to | 80 |
| GPUSTACK_DEBUG | Enable debug mode | false |
| GPUSTACK_BOOTSTRAP_PASSWORD | Password for the bootstrap admin user | admin |
| GPUSTACK_TOKEN | Token for worker registration | (auto) |
| HF_TOKEN | Hugging Face token for model downloads | "" |
| GPUSTACK_PORT_OVERRIDE | Host port mapping | 80 |
Please modify the `.env` file as needed for your use case.
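
For example, a minimal `.env` might look like this (the values below are illustrative placeholders, not working credentials):

```env
GPUSTACK_VERSION=v0.5.3
GPUSTACK_PORT_OVERRIDE=8080
GPUSTACK_BOOTSTRAP_PASSWORD=change-me
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
```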

## Volumes

- `gpustack_data`: Data directory for GPUStack

## GPU Support

### NVIDIA GPU

Uncomment the GPU-related configuration in `docker-compose.yaml`:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
runtime: nvidia
```
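
After recreating the container, you can verify that it sees the GPU. This assumes the NVIDIA Container Toolkit is installed on the host and that the image ships `nvidia-smi`:

```sh
docker compose up -d --force-recreate
docker exec gpustack nvidia-smi
```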

### AMD GPU (ROCm)

Use the ROCm-specific image:

```yaml
image: gpustack/gpustack:v0.5.3-rocm
```
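
Depending on the host setup, the ROCm container may also need direct access to the AMD kernel driver devices. A minimal sketch of the extra compose keys, assuming the standard ROCm device paths:

```yaml
# Expose the AMD GPU devices to the container (standard ROCm paths)
devices:
  - /dev/kfd
  - /dev/dri
```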

## Usage

### Start GPUStack

```sh
docker compose up -d
```
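
To confirm the server started cleanly, follow its logs:

```sh
docker compose logs -f gpustack
```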

### Access

- Web UI: http://localhost:80
- Default credentials: `admin` / `admin` (the password is configured via `GPUSTACK_BOOTSTRAP_PASSWORD`)
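
If `GPUSTACK_BOOTSTRAP_PASSWORD` is left unset, GPUStack generates an initial admin password instead; per the upstream documentation it can be read from the data directory (the path below is taken from GPUStack's docs and may vary by version):

```sh
docker exec gpustack cat /var/lib/gpustack/initial_admin_password
```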

### Deploy a Model

1. Log in to the web UI
2. Navigate to Models
3. Click "Deploy Model"
4. Select a model from the catalog or add a custom model
5. Configure the model parameters
6. Click "Deploy"

### Add Worker Nodes

To add more GPU nodes to the cluster:

1. Get the registration token from the server:

   ```sh
   docker exec gpustack cat /var/lib/gpustack/token
   ```

2. Start a worker on another node:

   ```sh
   docker run -d --name gpustack-worker \
     --gpus all \
     --network host \
     --ipc host \
     -v gpustack-data:/var/lib/gpustack \
     gpustack/gpustack:v0.5.3 \
     --server-url http://your-server-ip:80 \
     --token YOUR_TOKEN
   ```

## Features

- Model Management: Deploy and manage LLM models from Hugging Face, ModelScope, or custom sources
- GPU Scheduling: Automatic GPU allocation and scheduling
- Multi-Backend: Supports llama-box, vLLM, and other backends
- API Compatible: OpenAI-compatible API endpoint
- Web UI: User-friendly web interface for management
- Monitoring: Resource usage and model metrics

## API Usage

GPUStack provides an OpenAI-compatible API:
```sh
curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
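
The response follows the OpenAI chat-completions schema, so the reply text can be extracted with `jq`, for example:

```sh
curl -s http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "llama-3.2-3b-instruct", "messages": [{"role": "user", "content": "Hello!"}]}' \
  | jq -r '.choices[0].message.content'
```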

## Notes

- For production use, change the default password
- GPU support requires NVIDIA Docker runtime or AMD ROCm support
- Model downloads can be large (several GB); ensure you have sufficient disk space
- The first deployment of a model may take some time while the model files download

## Security

- Change default admin password after first login
- Use strong, unique API keys and keep them secret
- Consider using TLS for production deployments
- Restrict network access to trusted sources

## License

GPUStack is licensed under the Apache License 2.0. See the [GPUStack GitHub repository](https://github.com/gpustack/gpustack) for more information.