feat: add mcp-servers/**
**`.env`**

```bash
# GPUStack version
GPUSTACK_VERSION=v0.7.1

# Timezone setting
TZ=UTC

# Server configuration
GPUSTACK_HOST=0.0.0.0
GPUSTACK_PORT=80
GPUSTACK_DEBUG=false

# Admin bootstrap password
GPUSTACK_BOOTSTRAP_PASSWORD=admin

# Token for worker registration (auto-generated if not set)
GPUSTACK_TOKEN=

# Hugging Face token for model downloads
HF_TOKEN=

# Port to bind to on the host machine
GPUSTACK_PORT_OVERRIDE=80
```
**`README.md`**

# GPUStack

[English](./README.md) | [中文](./README.zh.md)

GPUStack is an open-source GPU cluster manager for running and scaling large language models (LLMs).

## Quick Start

```bash
docker compose up -d
```

Access the web UI at <http://localhost:80> with the default credentials `admin` / `admin`.
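Once the stack is up, it can be worth confirming the server is healthy before logging in. A minimal check, assuming the same `/health` endpoint that the compose healthcheck polls:

```bash
# Show container status (the compose file defines a healthcheck against /health)
docker compose ps

# Query the health endpoint directly; HTTP 200 means the server is ready
curl -fsS http://localhost:80/health
```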
## Services

- `gpustack`: GPUStack server with GPU support enabled by default

## Ports

| Service  | Port |
| -------- | ---- |
| gpustack | 80   |

## Environment Variables

| Variable                    | Description                            | Default   |
| --------------------------- | -------------------------------------- | --------- |
| GPUSTACK_VERSION            | GPUStack image version                 | `v0.7.1`  |
| TZ                          | Timezone setting                       | `UTC`     |
| GPUSTACK_HOST               | Host to bind the server to             | `0.0.0.0` |
| GPUSTACK_PORT               | Port to bind the server to             | `80`      |
| GPUSTACK_DEBUG              | Enable debug mode                      | `false`   |
| GPUSTACK_BOOTSTRAP_PASSWORD | Password for the bootstrap admin user  | `admin`   |
| GPUSTACK_TOKEN              | Token for worker registration          | (auto)    |
| HF_TOKEN                    | Hugging Face token for model downloads | (empty)   |
| GPUSTACK_PORT_OVERRIDE      | Host port mapping                      | `80`      |

Please modify the `.env` file as needed for your use case.

## Volumes

- `gpustack_data`: GPUStack data directory
## GPU Support

### NVIDIA GPU

This service is configured with NVIDIA GPU support enabled by default. The configuration uses:

```yaml
runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: [ '0' ]
          capabilities: [ gpu ]
```

### Requirements

- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit installed on the host
- Docker 19.03+ with GPU support
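To confirm the container actually sees the GPU, a quick check can help; this assumes `nvidia-smi` is available inside the CUDA-enabled GPUStack image:

```bash
# List the GPUs visible to the running container (assumes nvidia-smi ships in the image)
docker exec gpustack nvidia-smi
```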
### AMD GPU (ROCm)

To use AMD GPUs with ROCm support:

1. Use the ROCm-specific image in `docker-compose.yaml`:

   ```yaml
   image: gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}-rocm
   ```

2. Change the device driver to `amdgpu`:

   ```yaml
   deploy:
     resources:
       reservations:
         devices:
           - driver: amdgpu
             device_ids: [ '0' ]
             capabilities: [ gpu ]
   ```
## Usage

### Start GPUStack

```bash
docker compose up -d
```

### Access

- Web UI: <http://localhost:80>
- Default credentials: `admin` / `admin` (configured via `GPUSTACK_BOOTSTRAP_PASSWORD`)

### Deploy a Model

1. Log in to the web UI at <http://localhost:80>
2. Navigate to **Models** → **Deploy Model**
3. Select a model from the catalog or add a custom model
4. Configure the model parameters
5. Click **Deploy**
### Add Worker Nodes

To scale your cluster by adding more GPU nodes:

1. Get the registration token from the server:

   ```bash
   docker exec gpustack gpustack show-token
   ```

2. Start a worker on another node:

   ```bash
   docker run -d --name gpustack-worker \
     --gpus all \
     --network host \
     --ipc host \
     -v gpustack-worker-data:/var/lib/gpustack \
     gpustack/gpustack:v0.7.1 \
     gpustack start --server-url http://your-server-ip:80 --token YOUR_TOKEN
   ```
### API Usage

GPUStack provides an OpenAI-compatible API:

```bash
curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
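Since the API follows OpenAI conventions, the standard model-listing endpoint should also be available; a sketch, assuming `/v1/models` is exposed alongside the chat completions route above:

```bash
# List the models currently being served (standard OpenAI-compatible endpoint)
curl http://localhost:80/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```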
## Features

- **Model Management**: Deploy and manage LLM models from Hugging Face, ModelScope, or custom sources
- **GPU Scheduling**: Automatic GPU allocation and load balancing
- **Multi-Backend**: Supports llama-box, vLLM, and other inference backends
- **OpenAI-Compatible API**: Drop-in replacement for the OpenAI API
- **Web UI**: User-friendly web interface for cluster management
- **Monitoring**: Real-time resource usage and model performance metrics
- **Multi-Node**: Scale across multiple GPU servers

## Notes

- **Production Security**: Change the default `GPUSTACK_BOOTSTRAP_PASSWORD` before deploying
- **GPU Requirements**: An NVIDIA GPU with CUDA support is required; ensure the NVIDIA Container Toolkit is installed
- **Disk Space**: Model downloads can be several gigabytes; ensure sufficient storage
- **First Deployment**: Initial model deployment may take time as it downloads the model files
- **Network**: By default, the service binds to all interfaces (`0.0.0.0`); restrict access in production

## Security

- **Change Default Password**: Update `GPUSTACK_BOOTSTRAP_PASSWORD` after first login
- **API Keys**: Use strong, unique API keys for accessing the API
- **TLS/HTTPS**: Consider using a reverse proxy with TLS for production
- **Network Access**: Restrict access to trusted networks using firewalls (see the sketch after this list)
- **Updates**: Keep GPUStack updated to the latest stable version
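One low-effort way to tighten network access is to publish the container port on the loopback interface only, so the UI and API are reachable solely from the host (for example, behind a TLS-terminating reverse proxy). A minimal sketch of the change to the `ports:` mapping in `docker-compose.yaml`, using Docker's optional host-IP prefix:

```yaml
# Publish the port on localhost only; other machines can no longer reach it directly
ports:
  - "127.0.0.1:${GPUSTACK_PORT_OVERRIDE:-80}:80"
```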
## License

GPUStack is licensed under the Apache License 2.0. See [GPUStack GitHub](https://github.com/gpustack/gpustack) for more information.

**`src/gpustack/README.zh.md`** (new file, 159 lines)
# GPUStack

[English](./README.md) | [中文](./README.zh.md)

GPUStack 是一个开源的 GPU 集群管理器,用于运行和扩展大型语言模型(LLM)。

## 快速开始

```bash
docker compose up -d
```

在 <http://localhost:80> 访问 Web UI,默认凭据为 `admin` / `admin`。
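启动后,可以先确认服务器健康再登录。以下是一个最小检查示例,假定使用 compose 健康检查所轮询的同一个 `/health` 端点:

```bash
# 查看容器状态(compose 文件中定义了针对 /health 的健康检查)
docker compose ps

# 直接查询健康端点;返回 HTTP 200 表示服务器已就绪
curl -fsS http://localhost:80/health
```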
## 服务

- `gpustack`:默认启用 GPU 支持的 GPUStack 服务器

## 端口

| 服务 | 端口 |
| -------- | ---- |
| gpustack | 80 |

## 环境变量

| 变量名 | 描述 | 默认值 |
| --------------------------- | ------------------------- | --------- |
| GPUSTACK_VERSION | GPUStack 镜像版本 | `v0.7.1` |
| TZ | 时区设置 | `UTC` |
| GPUSTACK_HOST | 服务器绑定的主机地址 | `0.0.0.0` |
| GPUSTACK_PORT | 服务器绑定的端口 | `80` |
| GPUSTACK_DEBUG | 启用调试模式 | `false` |
| GPUSTACK_BOOTSTRAP_PASSWORD | 引导管理员用户的密码 | `admin` |
| GPUSTACK_TOKEN | Worker 注册令牌 | (自动) |
| HF_TOKEN | Hugging Face 模型下载令牌 | (空) |
| GPUSTACK_PORT_OVERRIDE | 主机端口映射 | `80` |

## 卷

- `gpustack_data`:GPUStack 数据目录

## GPU 支持

本服务默认配置了 NVIDIA GPU 支持。配置使用:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: [ '0' ]
          capabilities: [ gpu ]
```

### 要求

- 支持 CUDA 的 NVIDIA GPU
- 主机上安装了 NVIDIA Container Toolkit
- Docker 19.03+ 支持 GPU
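要确认容器内确实能看到 GPU,可以做一个快速检查;此处假定启用 CUDA 的 GPUStack 镜像内自带 `nvidia-smi`:

```bash
# 列出运行中的容器可见的 GPU(假定镜像内包含 nvidia-smi)
docker exec gpustack nvidia-smi
```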
### AMD GPU(ROCm)

要使用支持 ROCm 的 AMD GPU:

1. 在 `docker-compose.yaml` 中使用 ROCm 特定镜像:

   ```yaml
   image: gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}-rocm
   ```

2. 将设备驱动更改为 `amdgpu`:

   ```yaml
   deploy:
     resources:
       reservations:
         devices:
           - driver: amdgpu
             device_ids: [ '0' ]
             capabilities: [ gpu ]
   ```

## 使用方法

### 部署模型

1. 在 <http://localhost:80> 登录 Web UI
2. 导航到 **Models** → **Deploy Model**
3. 从目录中选择模型或添加自定义模型
4. 配置模型参数
5. 点击 **Deploy**

### 添加 Worker 节点

通过添加更多 GPU 节点来扩展集群:

1. 从服务器获取注册令牌:

   ```bash
   docker exec gpustack gpustack show-token
   ```

2. 在另一个节点上启动 Worker:

   ```bash
   docker run -d --name gpustack-worker \
     --gpus all \
     --network host \
     --ipc host \
     -v gpustack-worker-data:/var/lib/gpustack \
     gpustack/gpustack:v0.7.1 \
     gpustack start --server-url http://your-server-ip:80 --token YOUR_TOKEN
   ```
### API 使用

GPUStack 提供与 OpenAI 兼容的 API:

```bash
curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
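由于该 API 遵循 OpenAI 规范,标准的模型列表端点应当同样可用;以下为示意,假定 `/v1/models` 与上面的对话补全路由一同对外暴露:

```bash
# 列出当前提供服务的模型(标准的 OpenAI 兼容端点)
curl http://localhost:80/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```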
## 功能特性

- **模型管理**:从 Hugging Face、ModelScope 或自定义源部署和管理 LLM 模型
- **GPU 调度**:自动 GPU 分配和负载均衡
- **多后端支持**:支持 llama-box、vLLM 和其他推理后端
- **OpenAI 兼容 API**:可直接替代 OpenAI API
- **Web UI**:用户友好的 Web 界面,用于集群管理
- **监控**:实时资源使用和模型性能指标
- **多节点**:可跨多个 GPU 服务器扩展

## 注意事项

- **生产环境安全**:部署前请更改默认的 `GPUSTACK_BOOTSTRAP_PASSWORD`
- **GPU 要求**:需要支持 CUDA 的 NVIDIA GPU;确保已安装 NVIDIA Container Toolkit
- **磁盘空间**:模型下载可能有数 GB;确保有足够的存储空间
- **首次部署**:初次部署模型可能需要时间来下载模型文件
- **网络**:默认情况下,服务绑定到所有接口(`0.0.0.0`);在生产环境中请限制访问

## 安全

- **更改默认密码**:首次登录后更新 `GPUSTACK_BOOTSTRAP_PASSWORD`
- **API 密钥**:使用强且唯一的 API 密钥访问 API
- **TLS/HTTPS**:在生产环境中考虑使用带 TLS 的反向代理
- **网络访问**:使用防火墙将访问限制在受信任的网络
- **更新**:保持 GPUStack 更新到最新稳定版本

## 许可证

GPUStack 采用 Apache License 2.0 许可。更多信息请参见 [GPUStack GitHub](https://github.com/gpustack/gpustack)。
**`docker-compose.yaml`**

The service image is bumped to the new default version (unchanged lines are elided as in the diff, marked with `# ...`):

```yaml
# x-default: &default is defined earlier in the file
services:
  gpustack:
    <<: *default
    image: gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}
    ports:
      - "${GPUSTACK_PORT_OVERRIDE:-80}:80"
    volumes:
      # ...
```

The CPU and memory reservations are lowered and the NVIDIA device reservation is now enabled by default (the `environment:` key is restored here for context; its preceding entries are elided in the diff):

```yaml
    environment:
      # ...
      - GPUSTACK_TOKEN=${GPUSTACK_TOKEN:-}
      - GPUSTACK_BOOTSTRAP_PASSWORD=${GPUSTACK_BOOTSTRAP_PASSWORD:-admin}
      - HF_TOKEN=${HF_TOKEN:-}
    ipc: host
    deploy:
      resources:
        limits:
          cpus: '8.0'
          memory: 8G
        reservations:
          cpus: '1.0'
          memory: 2G
          # For GPU support, uncomment the following section
          # runtime: nvidia
          devices:
            - driver: nvidia
              device_ids: [ '0' ]
              capabilities: [ gpu ]
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:80/health"]
      interval: 30s
```
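After editing `.env` or `docker-compose.yaml`, Docker Compose can validate and print the fully resolved configuration before anything is recreated; a quick sanity check:

```bash
# Render the merged, variable-substituted configuration (fails on syntax errors)
docker compose config

# Apply the new settings
docker compose up -d
```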