feat: add mcp-servers/**
**`.env`**

```bash
# GPUStack version
GPUSTACK_VERSION=v0.7.1

# Timezone setting
TZ=UTC

# Server configuration
GPUSTACK_HOST=0.0.0.0
GPUSTACK_PORT=80
GPUSTACK_DEBUG=false

# Admin bootstrap password
GPUSTACK_BOOTSTRAP_PASSWORD=admin

# Token for worker registration (auto-generated if not set)
GPUSTACK_TOKEN=

# Hugging Face token for model downloads
HF_TOKEN=

# Port to bind to on the host machine
GPUSTACK_PORT_OVERRIDE=80
```
**`README.md`**

# GPUStack

[English](./README.md) | [中文](./README.zh.md)

GPUStack is an open-source GPU cluster manager for running and scaling large language models (LLMs).

## Quick Start

```bash
docker compose up -d
```

Access the web UI at <http://localhost:80> with the default credentials `admin` / `admin`.
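Once the stack is up, it can be worth confirming the server is healthy before logging in. A minimal check, assuming the same `/health` endpoint that the compose healthcheck polls:

```bash
# Show container status (the compose file defines a healthcheck against /health)
docker compose ps

# Query the health endpoint directly; HTTP 200 means the server is ready
curl -fsS http://localhost:80/health
```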
## Services

- `gpustack`: GPUStack server with GPU support enabled by default

## Ports

| Service  | Port |
| -------- | ---- |
| gpustack | 80   |

## Environment Variables

| Variable                    | Description                            | Default   |
| --------------------------- | -------------------------------------- | --------- |
| GPUSTACK_VERSION            | GPUStack image version                 | `v0.7.1`  |
| TZ                          | Timezone setting                       | `UTC`     |
| GPUSTACK_HOST               | Host to bind the server to             | `0.0.0.0` |
| GPUSTACK_PORT               | Port to bind the server to             | `80`      |
| GPUSTACK_DEBUG              | Enable debug mode                      | `false`   |
| GPUSTACK_BOOTSTRAP_PASSWORD | Password for the bootstrap admin user  | `admin`   |
| GPUSTACK_TOKEN              | Token for worker registration          | (auto)    |
| HF_TOKEN                    | Hugging Face token for model downloads | (empty)   |
| GPUSTACK_PORT_OVERRIDE      | Host port mapping                      | `80`      |

Please modify the `.env` file as needed for your use case.

## Volumes

- `gpustack_data`: GPUStack data directory
## GPU Support

### NVIDIA GPU

This service is configured with NVIDIA GPU support enabled by default. The configuration uses:

```yaml
runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: [ '0' ]
          capabilities: [ gpu ]
```

### Requirements

- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit installed on the host
- Docker 19.03+ with GPU support
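To confirm the container actually sees the GPU, a quick check can help; this assumes `nvidia-smi` is available inside the CUDA-enabled GPUStack image:

```bash
# List the GPUs visible to the running container (assumes nvidia-smi ships in the image)
docker exec gpustack nvidia-smi
```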
### AMD GPU (ROCm)

To use AMD GPUs with ROCm support:

1. Use the ROCm-specific image in `docker-compose.yaml`:

   ```yaml
   image: gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}-rocm
   ```

2. Change the device driver to `amdgpu`:

   ```yaml
   deploy:
     resources:
       reservations:
         devices:
           - driver: amdgpu
             device_ids: [ '0' ]
             capabilities: [ gpu ]
   ```
## Usage

### Start GPUStack

```bash
docker compose up -d
```

### Access

- Web UI: <http://localhost:80>
- Default credentials: `admin` / `admin` (configured via `GPUSTACK_BOOTSTRAP_PASSWORD`)

### Deploy a Model

1. Log in to the web UI at <http://localhost:80>
2. Navigate to **Models** → **Deploy Model**
3. Select a model from the catalog or add a custom model
4. Configure the model parameters
5. Click **Deploy**
### Add Worker Nodes

To scale your cluster by adding more GPU nodes:

1. Get the registration token from the server:

   ```bash
   docker exec gpustack gpustack show-token
   ```

2. Start a worker on another node:

   ```bash
   docker run -d --name gpustack-worker \
     --gpus all \
     --network host \
     --ipc host \
     -v gpustack-worker-data:/var/lib/gpustack \
     gpustack/gpustack:v0.7.1 \
     gpustack start --server-url http://your-server-ip:80 --token YOUR_TOKEN
   ```
### API Usage

GPUStack provides an OpenAI-compatible API:

```bash
curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
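Since the API follows OpenAI conventions, the standard model-listing endpoint should also be available; a sketch, assuming `/v1/models` is exposed alongside the chat completions route above:

```bash
# List the models currently being served (standard OpenAI-compatible endpoint)
curl http://localhost:80/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```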
## Features

- **Model Management**: Deploy and manage LLM models from Hugging Face, ModelScope, or custom sources
- **GPU Scheduling**: Automatic GPU allocation and load balancing
- **Multi-Backend**: Supports llama-box, vLLM, and other inference backends
- **OpenAI-Compatible API**: Drop-in replacement for the OpenAI API
- **Web UI**: User-friendly web interface for cluster management
- **Monitoring**: Real-time resource usage and model performance metrics
- **Multi-Node**: Scale across multiple GPU servers

## Notes

- **Production Security**: Change the default `GPUSTACK_BOOTSTRAP_PASSWORD` before deploying
- **GPU Requirements**: An NVIDIA GPU with CUDA support is required; ensure the NVIDIA Container Toolkit is installed
- **Disk Space**: Model downloads can be several gigabytes; ensure sufficient storage
- **First Deployment**: Initial model deployment may take time as it downloads the model files
- **Network**: By default, the service binds to all interfaces (`0.0.0.0`); restrict access in production

## Security

- **Change Default Password**: Update `GPUSTACK_BOOTSTRAP_PASSWORD` after first login
- **API Keys**: Use strong, unique API keys for accessing the API
- **TLS/HTTPS**: Consider using a reverse proxy with TLS for production
- **Network Access**: Restrict access to trusted networks using firewalls (see the sketch after this list)
- **Updates**: Keep GPUStack updated to the latest stable version
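One low-effort way to tighten network access is to publish the container port on the loopback interface only, so the UI and API are reachable solely from the host (for example, behind a TLS-terminating reverse proxy). A minimal sketch of the change to the `ports:` mapping in `docker-compose.yaml`, using Docker's optional host-IP prefix:

```yaml
# Publish the port on localhost only; other machines can no longer reach it directly
ports:
  - "127.0.0.1:${GPUSTACK_PORT_OVERRIDE:-80}:80"
```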
## License

GPUStack is licensed under the Apache License 2.0. See [GPUStack GitHub](https://github.com/gpustack/gpustack) for more information.

**`src/gpustack/README.zh.md`** (new file, 159 lines)
# GPUStack

[English](./README.md) | [中文](./README.zh.md)

GPUStack 是一个开源的 GPU 集群管理器,用于运行和扩展大型语言模型(LLM)。

## 快速开始

```bash
docker compose up -d
```

在 <http://localhost:80> 访问 Web UI,默认凭据为 `admin` / `admin`。
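启动后,可以先确认服务器健康再登录。以下是一个最小检查示例,假定使用 compose 健康检查所轮询的同一个 `/health` 端点:

```bash
# 查看容器状态(compose 文件中定义了针对 /health 的健康检查)
docker compose ps

# 直接查询健康端点;返回 HTTP 200 表示服务器已就绪
curl -fsS http://localhost:80/health
```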
## 服务

- `gpustack`:默认启用 GPU 支持的 GPUStack 服务器

## 端口

| 服务 | 端口 |
| -------- | ---- |
| gpustack | 80 |

## 环境变量

| 变量名 | 描述 | 默认值 |
| --------------------------- | ------------------------- | --------- |
| GPUSTACK_VERSION | GPUStack 镜像版本 | `v0.7.1` |
| TZ | 时区设置 | `UTC` |
| GPUSTACK_HOST | 服务器绑定的主机地址 | `0.0.0.0` |
| GPUSTACK_PORT | 服务器绑定的端口 | `80` |
| GPUSTACK_DEBUG | 启用调试模式 | `false` |
| GPUSTACK_BOOTSTRAP_PASSWORD | 引导管理员用户的密码 | `admin` |
| GPUSTACK_TOKEN | Worker 注册令牌 | (自动) |
| HF_TOKEN | Hugging Face 模型下载令牌 | (空) |
| GPUSTACK_PORT_OVERRIDE | 主机端口映射 | `80` |

## 卷

- `gpustack_data`:GPUStack 数据目录

## GPU 支持

本服务默认配置了 NVIDIA GPU 支持。配置使用:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: [ '0' ]
          capabilities: [ gpu ]
```

### 要求

- 支持 CUDA 的 NVIDIA GPU
- 主机上安装了 NVIDIA Container Toolkit
- Docker 19.03+ 支持 GPU
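要确认容器内确实能看到 GPU,可以做一个快速检查;此处假定启用 CUDA 的 GPUStack 镜像内自带 `nvidia-smi`:

```bash
# 列出运行中的容器可见的 GPU(假定镜像内包含 nvidia-smi)
docker exec gpustack nvidia-smi
```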
### AMD GPU(ROCm)

要使用支持 ROCm 的 AMD GPU:

1. 在 `docker-compose.yaml` 中使用 ROCm 特定镜像:

   ```yaml
   image: gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}-rocm
   ```

2. 将设备驱动更改为 `amdgpu`:

   ```yaml
   deploy:
     resources:
       reservations:
         devices:
           - driver: amdgpu
             device_ids: [ '0' ]
             capabilities: [ gpu ]
   ```

## 使用方法

### 部署模型

1. 在 <http://localhost:80> 登录 Web UI
2. 导航到 **Models** → **Deploy Model**
3. 从目录中选择模型或添加自定义模型
4. 配置模型参数
5. 点击 **Deploy**

### 添加 Worker 节点

通过添加更多 GPU 节点来扩展集群:

1. 从服务器获取注册令牌:

   ```bash
   docker exec gpustack gpustack show-token
   ```

2. 在另一个节点上启动 Worker:

   ```bash
   docker run -d --name gpustack-worker \
     --gpus all \
     --network host \
     --ipc host \
     -v gpustack-worker-data:/var/lib/gpustack \
     gpustack/gpustack:v0.7.1 \
     gpustack start --server-url http://your-server-ip:80 --token YOUR_TOKEN
   ```
### API 使用

GPUStack 提供与 OpenAI 兼容的 API:

```bash
curl http://localhost:80/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama-3.2-3b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
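由于该 API 遵循 OpenAI 规范,标准的模型列表端点应当同样可用;以下为示意,假定 `/v1/models` 与上面的对话补全路由一同对外暴露:

```bash
# 列出当前提供服务的模型(标准的 OpenAI 兼容端点)
curl http://localhost:80/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
```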
## 功能特性

- **模型管理**:从 Hugging Face、ModelScope 或自定义源部署和管理 LLM 模型
- **GPU 调度**:自动 GPU 分配和负载均衡
- **多后端支持**:支持 llama-box、vLLM 和其他推理后端
- **OpenAI 兼容 API**:可直接替代 OpenAI API
- **Web UI**:用户友好的 Web 界面,用于集群管理
- **监控**:实时资源使用和模型性能指标
- **多节点**:可跨多个 GPU 服务器扩展

## 注意事项

- **生产环境安全**:部署前请更改默认的 `GPUSTACK_BOOTSTRAP_PASSWORD`
- **GPU 要求**:需要支持 CUDA 的 NVIDIA GPU;确保已安装 NVIDIA Container Toolkit
- **磁盘空间**:模型下载可能有数 GB;确保有足够的存储空间
- **首次部署**:初次部署模型可能需要时间来下载模型文件
- **网络**:默认情况下,服务绑定到所有接口(`0.0.0.0`);在生产环境中请限制访问

## 安全

- **更改默认密码**:首次登录后更新 `GPUSTACK_BOOTSTRAP_PASSWORD`
- **API 密钥**:使用强且唯一的 API 密钥访问 API
- **TLS/HTTPS**:在生产环境中考虑使用带 TLS 的反向代理
- **网络访问**:使用防火墙将访问限制在受信任的网络
- **更新**:保持 GPUStack 更新到最新稳定版本

## 许可证

GPUStack 采用 Apache License 2.0 许可。更多信息请参见 [GPUStack GitHub](https://github.com/gpustack/gpustack)。
**`docker-compose.yaml`**

The service image is bumped to the new default version (unchanged lines are elided as in the diff, marked with `# ...`):

```yaml
# x-default: &default is defined earlier in the file
services:
  gpustack:
    <<: *default
    image: gpustack/gpustack:${GPUSTACK_VERSION:-v0.7.1}
    ports:
      - "${GPUSTACK_PORT_OVERRIDE:-80}:80"
    volumes:
      # ...
```

The CPU and memory reservations are lowered and the NVIDIA device reservation is now enabled by default (the `environment:` key is restored here for context; its preceding entries are elided in the diff):

```yaml
    environment:
      # ...
      - GPUSTACK_TOKEN=${GPUSTACK_TOKEN:-}
      - GPUSTACK_BOOTSTRAP_PASSWORD=${GPUSTACK_BOOTSTRAP_PASSWORD:-admin}
      - HF_TOKEN=${HF_TOKEN:-}
    ipc: host
    deploy:
      resources:
        limits:
          cpus: '8.0'
          memory: 8G
        reservations:
          cpus: '1.0'
          memory: 2G
          # For GPU support, uncomment the following section
          # runtime: nvidia
          devices:
            - driver: nvidia
              device_ids: [ '0' ]
              capabilities: [ gpu ]
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:80/health"]
      interval: 30s
```
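After editing `.env` or `docker-compose.yaml`, Docker Compose can validate and print the fully resolved configuration before anything is recreated; a quick sanity check:

```bash
# Render the merged, variable-substituted configuration (fails on syntax errors)
docker compose config

# Apply the new settings
docker compose up -d
```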