# Ray

[English](./README.md) | [中文](./README.zh.md)

This service deploys a Ray cluster with 1 head node and 2 worker nodes for distributed computing.

## Services

- `ray-head`: Ray head node with dashboard.
- `ray-worker-1`: First Ray worker node.
- `ray-worker-2`: Second Ray worker node.

## Environment Variables

| Variable Name               | Description                | Default Value      |
| --------------------------- | -------------------------- | ------------------ |
| RAY_VERSION                 | Ray image version          | `2.42.1-py312`     |
| RAY_HEAD_NUM_CPUS           | Head node CPU count        | `4`                |
| RAY_HEAD_MEMORY             | Head node memory (bytes)   | `8589934592` (8GB) |
| RAY_WORKER_NUM_CPUS         | Worker node CPU count      | `2`                |
| RAY_WORKER_MEMORY           | Worker node memory (bytes) | `4294967296` (4GB) |
| RAY_DASHBOARD_PORT_OVERRIDE | Ray Dashboard port         | `8265`             |
| RAY_CLIENT_PORT_OVERRIDE    | Ray Client Server port     | `10001`            |
| RAY_GCS_PORT_OVERRIDE       | Ray GCS Server port        | `6379`             |

Please modify the `.env` file as needed for your use case (a sample `.env` is sketched at the end of this README).

## Volumes

- `ray_storage`: Shared storage for Ray temporary files.

## Usage

### Start the Cluster

```bash
docker-compose up -d
```

### Access Ray Dashboard

Open your browser and navigate to:

```text
http://localhost:8265
```

The dashboard shows cluster status, running jobs, and resource usage.

### Connect from Python Client

```python
import ray

# Connect to the Ray cluster
ray.init(address="ray://localhost:10001")

# Define a simple task
@ray.remote
def hello_world():
    return "Hello from Ray!"

# Execute the task
result = ray.get(hello_world.remote())
print(result)

# Check cluster resources
print(ray.cluster_resources())
```

### Distributed Computing Example

```python
import ray
import time

ray.init(address="ray://localhost:10001")

@ray.remote
def compute_task(x):
    time.sleep(1)
    return x * x

# Submit 100 tasks in parallel
results = ray.get([compute_task.remote(i) for i in range(100)])
print(f"Sum of squares: {sum(results)}")
```

### Using Ray Data

```python
import ray

ray.init(address="ray://localhost:10001")

# Create a dataset (each row is a dict like {"id": 0})
ds = ray.data.range(1000)

# Process rows in parallel
result = ds.map(lambda row: {"id": row["id"] * 2}).take(10)
print(result)
```

## Features

- **Distributed Computing**: Scale Python applications across multiple nodes
- **Auto-scaling**: Dynamic resource allocation
- **Ray Dashboard**: Web UI for monitoring and debugging
- **Ray Data**: Distributed data processing
- **Ray Train**: Distributed training for ML models
- **Ray Serve**: Model serving and deployment
- **Ray Tune**: Hyperparameter tuning (see the Tune sketch at the end of this README)

## Notes

- Workers automatically connect to the head node
- The cluster has 1 head node (4 CPU, 8GB RAM) and 2 workers (2 CPU, 4GB RAM each)
- Total cluster resources: 8 CPUs, 16GB RAM
- Add more workers by duplicating the worker service definition (see Scaling below)
- For GPU support, use the `rayproject/ray-ml` image and configure the NVIDIA runtime (a sketch is included at the end of this README)
- Nodes communicate with the head node's GCS server on port 6379 (Ray's default, inherited from the old Redis-based GCS)

## Scaling

To add more worker nodes, add new service definitions:

```yaml
ray-worker-3:
  <<: *defaults
  image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
  container_name: ray-worker-3
  command: ray start --address=ray-head:6379 --block
  depends_on:
    - ray-head
  environment:
    RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
    RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
```

## License

Ray is licensed under the Apache License 2.0.
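
## Sample `.env`

For reference, a minimal `.env` might look like the following. The values shown are simply the defaults from the table above; adjust CPU and memory figures to match your hardware.

```bash
RAY_VERSION=2.42.1-py312
RAY_HEAD_NUM_CPUS=4
RAY_HEAD_MEMORY=8589934592
RAY_WORKER_NUM_CPUS=2
RAY_WORKER_MEMORY=4294967296
RAY_DASHBOARD_PORT_OVERRIDE=8265
RAY_CLIENT_PORT_OVERRIDE=10001
RAY_GCS_PORT_OVERRIDE=6379
```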
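
## GPU Worker (Sketch)

The Notes section mentions GPU support via the `rayproject/ray-ml` image and the NVIDIA runtime. The snippet below is a minimal, untested sketch of what such a worker service could look like, assuming the host has NVIDIA drivers and the NVIDIA Container Toolkit installed; the service name, image tag, and GPU count are illustrative, so check Docker Hub for available tags before use.

```yaml
ray-worker-gpu:
  <<: *defaults
  # Tag is illustrative; check available rayproject/ray-ml tags on Docker Hub
  image: rayproject/ray-ml:${RAY_VERSION:-2.42.1-py312}
  container_name: ray-worker-gpu
  # --num-gpus tells Ray how many GPUs this node contributes to the cluster
  command: ray start --address=ray-head:6379 --num-gpus=1 --block
  depends_on:
    - ray-head
  deploy:
    resources:
      reservations:
        devices:
          # Requires the NVIDIA Container Toolkit on the host
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```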
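
## Ray Tune (Sketch)

The Features list mentions Ray Tune for hyperparameter tuning. The example below is a minimal sketch of running a toy search on this cluster through the Ray Client port; the objective function and search space are invented for illustration, and for long-running workloads the Ray Jobs API is generally preferred over Ray Client.

```python
import ray
from ray import tune

# Connect to the running cluster (Ray Client port from .env)
ray.init(address="ray://localhost:10001")

def objective(config):
    # Toy objective: minimize (x - 3)^2; returning a dict reports final metrics
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(-10.0, 10.0)},
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
)
results = tuner.fit()

# Best config found, using the metric/mode declared in TuneConfig
print(results.get_best_result().config)
```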