# Ray

English | 中文

This compose stack deploys a Ray cluster with 1 head node and 2 worker nodes for distributed computing.

## Services

- **ray-head**: Ray head node with dashboard.
- **ray-worker-1**: First Ray worker node.
- **ray-worker-2**: Second Ray worker node.

## Environment Variables

| Variable Name | Description | Default Value |
|---------------|-------------|---------------|
| `RAY_VERSION` | Ray image version | `2.42.1-py312` |
| `RAY_HEAD_NUM_CPUS` | Head node CPU count | `4` |
| `RAY_HEAD_MEMORY` | Head node memory (bytes) | `8589934592` (8 GB) |
| `RAY_WORKER_NUM_CPUS` | Worker node CPU count | `2` |
| `RAY_WORKER_MEMORY` | Worker node memory (bytes) | `4294967296` (4 GB) |
| `RAY_DASHBOARD_PORT_OVERRIDE` | Ray Dashboard port | `8265` |
| `RAY_CLIENT_PORT_OVERRIDE` | Ray Client Server port | `10001` |
| `RAY_GCS_PORT_OVERRIDE` | Ray GCS Server port | `6379` |

Please modify the `.env` file as needed for your use case.
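
For example, a minimal `.env` that simply restates the defaults from the table above; override any value to resize the cluster:

```env
RAY_VERSION=2.42.1-py312
RAY_HEAD_NUM_CPUS=4
RAY_HEAD_MEMORY=8589934592
RAY_WORKER_NUM_CPUS=2
RAY_WORKER_MEMORY=4294967296
RAY_DASHBOARD_PORT_OVERRIDE=8265
RAY_CLIENT_PORT_OVERRIDE=10001
RAY_GCS_PORT_OVERRIDE=6379
```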

## Volumes

- **ray_storage**: Shared storage for Ray temporary files.

## Usage

### Start the Cluster

```bash
docker-compose up -d
```

### Access Ray Dashboard

Open your browser and navigate to http://localhost:8265.

The dashboard shows cluster status, running jobs, and resource usage.
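
The same port also serves Ray's Job Submission API, so you can submit work to the cluster without opening a Ray Client connection. A minimal sketch (the one-liner entrypoint is purely illustrative, and the `ray` package must be installed on the client side):

```python
from ray.job_submission import JobSubmissionClient

# The dashboard port (8265) doubles as the Job Submission API endpoint
client = JobSubmissionClient("http://localhost:8265")

# Submit an illustrative one-liner as a job; any shell entrypoint works
job_id = client.submit_job(
    entrypoint='python -c "import ray; ray.init(); print(ray.cluster_resources())"'
)
print(f"Submitted job: {job_id}")
print(client.get_job_status(job_id))  # e.g. PENDING, RUNNING, SUCCEEDED
```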

### Connect from a Python Client

```python
import ray

# Connect to the Ray cluster
ray.init(address="ray://localhost:10001")

# Run a simple task
@ray.remote
def hello_world():
    return "Hello from Ray!"

# Execute the task
result = ray.get(hello_world.remote())
print(result)

# Check cluster resources
print(ray.cluster_resources())
```

### Distributed Computing Example

```python
import ray
import time

ray.init(address="ray://localhost:10001")

@ray.remote
def compute_task(x):
    time.sleep(1)
    return x * x

# Submit 100 tasks in parallel
results = ray.get([compute_task.remote(i) for i in range(100)])
print(f"Sum of squares: {sum(results)}")

### Using Ray Data

```python
import ray

ray.init(address="ray://localhost:10001")

# Create a dataset; each row is a dict like {"id": 0}
ds = ray.data.range(1000)

# Process data in parallel (map operates on row dicts, not bare ints)
result = ds.map(lambda row: {"id": row["id"] * 2}).take(10)
print(result)
```

## Features

- **Distributed Computing**: Scale Python applications across multiple nodes
- **Auto-scaling**: Dynamic resource allocation
- **Ray Dashboard**: Web UI for monitoring and debugging
- **Ray Data**: Distributed data processing
- **Ray Train**: Distributed training for ML models
- **Ray Serve**: Model serving and deployment
- **Ray Tune**: Hyperparameter tuning (see the sketch after this list)
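
As a taste of the libraries above, here is a minimal Ray Tune sweep, assuming Ray Tune is available in the image you are running; the `objective` function and its `x` parameter are illustrative, not part of this stack:

```python
import ray
from ray import tune

ray.init(address="ray://localhost:10001")

# Illustrative objective: Tune records the dict a function trainable returns
def objective(config):
    return {"score": config["x"] ** 2}

# Grid-search three values of x across the cluster
tuner = tune.Tuner(objective, param_space={"x": tune.grid_search([1, 2, 3])})
results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)
```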

## Notes

- Workers automatically connect to the head node (see the verification snippet after this list)
- The cluster has 1 head node (4 CPU, 8 GB RAM) and 2 workers (2 CPU, 4 GB RAM each)
- Total cluster resources: 8 CPUs, 16 GB RAM
- Add more workers by duplicating the worker service definition (see Scaling below)
- For GPU support, use the `rayproject/ray-ml` image and configure the NVIDIA runtime
- Cluster communication goes through the head node's GCS server on port 6379 (the port Redis traditionally uses); workers join with `--address=ray-head:6379`
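
A quick way to confirm that both workers joined, reusing the client connection from the examples above:

```python
import ray

ray.init(address="ray://localhost:10001")

# ray.nodes() returns one entry per node; expect 3 alive (1 head + 2 workers)
alive = [n for n in ray.nodes() if n["Alive"]]
print(f"Alive nodes: {len(alive)}")
for node in alive:
    print(node["NodeManagerAddress"], node["Resources"].get("CPU"))
```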

## Scaling

To add more worker nodes, add new service definitions:

```yaml
ray-worker-3:
  <<: *defaults  # merge the shared settings from the compose file's "defaults" anchor
  image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
  container_name: ray-worker-3
  command: ray start --address=ray-head:6379 --block
  depends_on:
    - ray-head
  environment:
    RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
    RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
```
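
After bringing the new worker up with `docker-compose up -d ray-worker-3`, the extra capacity should appear in the cluster totals; a quick check, assuming the default worker size:

```python
import ray

ray.init(address="ray://localhost:10001")

# A third 2-CPU worker should raise the total from 8 to 10 CPUs
print(ray.cluster_resources().get("CPU"))
```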

## License

Ray is licensed under the Apache License 2.0.