feat: add more

2025-10-06 21:48:39 +08:00
parent f330e00fa0
commit 3c609b5989
120 changed files with 7698 additions and 59 deletions
--- a/src/ray/.env.example
+++ b/src/ray/.env.example
@@ -0,0 +1,15 @@
+# Ray version
+RAY_VERSION="2.42.1-py312"
+
+# Ray head node configuration
+RAY_HEAD_NUM_CPUS=4
+RAY_HEAD_MEMORY=8589934592  # 8GB in bytes
+
+# Ray worker node configuration
+RAY_WORKER_NUM_CPUS=2
+RAY_WORKER_MEMORY=4294967296  # 4GB in bytes
+
+# Port overrides
+RAY_DASHBOARD_PORT_OVERRIDE=8265
+RAY_CLIENT_PORT_OVERRIDE=10001
+RAY_GCS_PORT_OVERRIDE=6379
--- a/src/ray/README.md
+++ b/src/ray/README.md
@@ -0,0 +1,142 @@
+# Ray
+
+[English](./README.md) | [中文](./README.zh.md)
+
+This service deploys a Ray cluster with 1 head node and 2 worker nodes for distributed computing.
+
+## Services
+
+- `ray-head`: Ray head node with dashboard.
+- `ray-worker-1`: First Ray worker node.
+- `ray-worker-2`: Second Ray worker node.
+
+## Environment Variables
+
+| Variable Name               | Description                | Default Value      |
+| --------------------------- | -------------------------- | ------------------ |
+| RAY_VERSION                 | Ray image version          | `2.42.1-py312`     |
+| RAY_HEAD_NUM_CPUS           | Head node CPU count        | `4`                |
+| RAY_HEAD_MEMORY             | Head node memory (bytes)   | `8589934592` (8GB) |
+| RAY_WORKER_NUM_CPUS         | Worker node CPU count      | `2`                |
+| RAY_WORKER_MEMORY           | Worker node memory (bytes) | `4294967296` (4GB) |
+| RAY_DASHBOARD_PORT_OVERRIDE | Ray Dashboard port         | `8265`             |
+| RAY_CLIENT_PORT_OVERRIDE    | Ray Client Server port     | `10001`            |
+| RAY_GCS_PORT_OVERRIDE       | Ray GCS Server port        | `6379`             |
+
+Please modify the `.env` file as needed for your use case.
+
+## Volumes
+
+- `ray_storage`: Persistent storage for the head node's Ray temporary files and logs (mounted at `/tmp/ray` on `ray-head` only).
+
+## Usage
+
+### Start the Cluster
+
+```bash
+docker-compose up -d
+```
+
+### Access Ray Dashboard
+
+Open your browser and navigate to:
+
+```text
+http://localhost:8265
+```
+
+The dashboard shows cluster status, running jobs, and resource usage.
+
+### Connect from Python Client
+
+```python
+import ray
+
+# Connect to the Ray cluster
+ray.init(address="ray://localhost:10001")
+
+# Run a simple task
+@ray.remote
+def hello_world():
+    return "Hello from Ray!"
+
+# Execute the task
+result = ray.get(hello_world.remote())
+print(result)
+
+# Check cluster resources
+print(ray.cluster_resources())
+```
+
+### Distributed Computing Example
+
+```python
+import ray
+import time
+
+ray.init(address="ray://localhost:10001")
+
+@ray.remote
+def compute_task(x):
+    time.sleep(1)
+    return x * x
+
+# Submit 100 tasks in parallel
+results = ray.get([compute_task.remote(i) for i in range(100)])
+print(f"Sum of squares: {sum(results)}")
+```
+
+### Using Ray Data
+
+```python
+import ray
+
+ray.init(address="ray://localhost:10001")
+
+# Create a dataset
+ds = ray.data.range(1000)
+
+# Process data in parallel; each row is a dict such as {"id": 0}
+result = ds.map(lambda row: {"id": row["id"] * 2}).take(10)
+print(result)
+```
+
+## Features
+
+- **Distributed Computing**: Scale Python applications across multiple nodes
+- **Autoscaling**: Dynamic resource allocation (a Ray feature; this Compose setup runs a fixed number of workers)
+- **Ray Dashboard**: Web UI for monitoring and debugging
+- **Ray Data**: Distributed data processing
+- **Ray Train**: Distributed training for ML models
+- **Ray Serve**: Model serving and deployment
+- **Ray Tune**: Hyperparameter tuning
+
+## Notes
+
+- Workers automatically connect to the head node
+- The cluster has 1 head node (4 CPU, 8GB RAM) and 2 workers (2 CPU, 4GB RAM each)
+- Total cluster resources: 8 CPUs, 16GB RAM
+- Add more workers by duplicating the worker service definition
+- For GPU support, use a GPU-enabled `rayproject/ray` image tag (for example, one ending in `-gpu`) and configure the NVIDIA runtime
+- Port 6379 is the GCS (Global Control Service) port that workers use to join the cluster; Ray 2.x does not use Redis by default
+
+## Scaling
+
+To add more worker nodes, add a new service definition under `services` in `docker-compose.yaml`:
+
+```yaml
+ray-worker-3:
+  <<: *default
+  image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
+  container_name: ray-worker-3
+  command: ray start --address=ray-head:6379 --num-cpus=${RAY_WORKER_NUM_CPUS:-2} --memory=${RAY_WORKER_MEMORY:-4294967296} --block
+  depends_on:
+    - ray-head
+  environment:
+    RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
+    RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
+```
+
+## License
+
+Ray is licensed under the Apache License 2.0.
--- a/src/ray/README.zh.md
+++ b/src/ray/README.zh.md
@@ -0,0 +1,142 @@
+# Ray
+
+[English](./README.md) | [中文](./README.zh.md)
+
+此服务用于部署一个包含 1 个头节点和 2 个工作节点的 Ray 集群，用于分布式计算。
+
+## 服务
+
+- `ray-head`: Ray 头节点，带有仪表板。
+- `ray-worker-1`: 第一个 Ray 工作节点。
+- `ray-worker-2`: 第二个 Ray 工作节点。
+
+## 环境变量
+
+| 变量名                      | 说明                 | 默认值             |
+| --------------------------- | -------------------- | ------------------ |
+| RAY_VERSION                 | Ray 镜像版本         | `2.42.1-py312`     |
+| RAY_HEAD_NUM_CPUS           | 头节点 CPU 数量      | `4`                |
+| RAY_HEAD_MEMORY             | 头节点内存（字节）   | `8589934592` (8GB) |
+| RAY_WORKER_NUM_CPUS         | 工作节点 CPU 数量    | `2`                |
+| RAY_WORKER_MEMORY           | 工作节点内存（字节） | `4294967296` (4GB) |
+| RAY_DASHBOARD_PORT_OVERRIDE | Ray 仪表板端口       | `8265`             |
+| RAY_CLIENT_PORT_OVERRIDE    | Ray 客户端服务器端口 | `10001`            |
+| RAY_GCS_PORT_OVERRIDE       | Ray GCS 服务器端口   | `6379`             |
+
+请根据实际需求修改 `.env` 文件。
+
+## 卷
+
+- `ray_storage`: 头节点 Ray 临时文件和日志的持久化存储（仅挂载到 `ray-head` 的 `/tmp/ray`）。
+
+## 使用方法
+
+### 启动集群
+
+```bash
+docker-compose up -d
+```
+
+### 访问 Ray 仪表板
+
+在浏览器中打开:
+
+```text
+http://localhost:8265
+```
+
+仪表板显示集群状态、正在运行的作业和资源使用情况。
+
+### 从 Python 客户端连接
+
+```python
+import ray
+
+# 连接到 Ray 集群
+ray.init(address="ray://localhost:10001")
+
+# 运行简单任务
+@ray.remote
+def hello_world():
+    return "Hello from Ray!"
+
+# 执行任务
+result = ray.get(hello_world.remote())
+print(result)
+
+# 检查集群资源
+print(ray.cluster_resources())
+```
+
+### 分布式计算示例
+
+```python
+import ray
+import time
+
+ray.init(address="ray://localhost:10001")
+
+@ray.remote
+def compute_task(x):
+    time.sleep(1)
+    return x * x
+
+# 并行提交 100 个任务
+results = ray.get([compute_task.remote(i) for i in range(100)])
+print(f"Sum of squares: {sum(results)}")
+```
+
+### 使用 Ray Data
+
+```python
+import ray
+
+ray.init(address="ray://localhost:10001")
+
+# 创建数据集
+ds = ray.data.range(1000)
+
+# 并行处理数据
+result = ds.map(lambda row: {"id": row["id"] * 2}).take(10)
+print(result)
+```
+
+## 功能
+
+- **分布式计算**: 跨多个节点扩展 Python 应用程序
+- **自动扩展**: 动态资源分配（Ray 的功能；此 Compose 配置运行固定数量的工作节点）
+- **Ray 仪表板**: 用于监控和调试的 Web UI
+- **Ray Data**: 分布式数据处理
+- **Ray Train**: ML 模型的分布式训练
+- **Ray Serve**: 模型服务和部署
+- **Ray Tune**: 超参数调优
+
+## 注意事项
+
+- 工作节点自动连接到头节点
+- 集群有 1 个头节点（4 CPU，8GB RAM）和 2 个工作节点（每个 2 CPU，4GB RAM）
+- 集群总资源: 8 个 CPU，16GB RAM
+- 通过复制工作节点服务定义添加更多工作节点
+- 对于 GPU 支持，使用 `rayproject/ray-ml` 镜像并配置 NVIDIA 运行时
+- 端口 6379 是 GCS（Global Control Service）端口，工作节点通过它加入集群；Ray 2.x 默认不使用 Redis
+
+## 扩展
+
+要添加更多工作节点，添加新的服务定义:
+
+```yaml
+ray-worker-3:
+  <<: *default
+  image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
+  container_name: ray-worker-3
+  command: ray start --address=ray-head:6379 --num-cpus=${RAY_WORKER_NUM_CPUS:-2} --memory=${RAY_WORKER_MEMORY:-4294967296} --block
+  depends_on:
+    - ray-head
+  environment:
+    RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
+    RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
+```
+
+## 许可证
+
+Ray 使用 Apache License 2.0 授权。
--- a/src/ray/docker-compose.yaml
+++ b/src/ray/docker-compose.yaml
@@ -0,0 +1,82 @@
+x-default: &default
+  restart: unless-stopped
+  volumes:
+    - &localtime /etc/localtime:/etc/localtime:ro
+    - &timezone /etc/timezone:/etc/timezone:ro
+  logging:
+    driver: json-file
+    options:
+      max-size: 100m
+
+services:
+  ray-head:
+    <<: *default
+    image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
+    container_name: ray-head
+    command: ray start --head --dashboard-host=0.0.0.0 --port=6379 --num-cpus=${RAY_HEAD_NUM_CPUS:-4} --memory=${RAY_HEAD_MEMORY:-8589934592} --block
+    ports:
+      - "${RAY_DASHBOARD_PORT_OVERRIDE:-8265}:8265"
+      - "${RAY_CLIENT_PORT_OVERRIDE:-10001}:10001"
+      - "${RAY_GCS_PORT_OVERRIDE:-6379}:6379"
+    environment:
+      RAY_NUM_CPUS: ${RAY_HEAD_NUM_CPUS:-4}
+      RAY_MEMORY: ${RAY_HEAD_MEMORY:-8589934592}
+    volumes:
+      - *localtime
+      - *timezone
+      - ray_storage:/tmp/ray
+    deploy:
+      resources:
+        limits:
+          cpus: '4.0'
+          memory: 8G
+        reservations:
+          cpus: '2.0'
+          memory: 4G
+
+  ray-worker-1:
+    <<: *default
+    image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
+    container_name: ray-worker-1
+    command: ray start --address=ray-head:6379 --num-cpus=${RAY_WORKER_NUM_CPUS:-2} --memory=${RAY_WORKER_MEMORY:-4294967296} --block
+    depends_on:
+      - ray-head
+    environment:
+      RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
+      RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
+    volumes:
+      - *localtime
+      - *timezone
+    deploy:
+      resources:
+        limits:
+          cpus: '2.0'
+          memory: 4G
+        reservations:
+          cpus: '1.0'
+          memory: 2G
+
+  ray-worker-2:
+    <<: *default
+    image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
+    container_name: ray-worker-2
+    command: ray start --address=ray-head:6379 --num-cpus=${RAY_WORKER_NUM_CPUS:-2} --memory=${RAY_WORKER_MEMORY:-4294967296} --block
+    depends_on:
+      - ray-head
+    environment:
+      RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
+      RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
+    volumes:
+      - *localtime
+      - *timezone
+    deploy:
+      resources:
+        limits:
+          cpus: '2.0'
+          memory: 4G
+        reservations:
+          cpus: '1.0'
+          memory: 2G
+
+volumes:
+  ray_storage: