# Ray

English | 中文

This compose stack deploys a Ray cluster with 1 head node and 2 worker nodes for distributed computing.

## Services

- **ray-head**: Ray head node with dashboard.
- **ray-worker-1**: First Ray worker node.
- **ray-worker-2**: Second Ray worker node.

## Environment Variables

| Variable Name | Description | Default Value |
|---------------|-------------|---------------|
| `RAY_VERSION` | Ray image version | `2.42.1-py312` |
| `RAY_HEAD_NUM_CPUS` | Head node CPU count | `4` |
| `RAY_HEAD_MEMORY` | Head node memory (bytes) | `8589934592` (8 GB) |
| `RAY_WORKER_NUM_CPUS` | Worker node CPU count | `2` |
| `RAY_WORKER_MEMORY` | Worker node memory (bytes) | `4294967296` (4 GB) |
| `RAY_DASHBOARD_PORT_OVERRIDE` | Ray Dashboard port | `8265` |
| `RAY_CLIENT_PORT_OVERRIDE` | Ray Client Server port | `10001` |
| `RAY_GCS_PORT_OVERRIDE` | Ray GCS Server port | `6379` |

Please modify the `.env` file as needed for your use case.
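
For example, a minimal `.env` that simply restates the defaults from the table above; override any value to resize the cluster:

```env
RAY_VERSION=2.42.1-py312
RAY_HEAD_NUM_CPUS=4
RAY_HEAD_MEMORY=8589934592
RAY_WORKER_NUM_CPUS=2
RAY_WORKER_MEMORY=4294967296
RAY_DASHBOARD_PORT_OVERRIDE=8265
RAY_CLIENT_PORT_OVERRIDE=10001
RAY_GCS_PORT_OVERRIDE=6379
```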

## Volumes

- **ray_storage**: Shared storage for Ray temporary files.

## Usage

### Start the Cluster

```bash
docker-compose up -d
```

### Access Ray Dashboard

Open your browser and navigate to http://localhost:8265.

The dashboard shows cluster status, running jobs, and resource usage.
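
The same port also serves Ray's Job Submission API, so you can submit work to the cluster without opening a Ray Client connection. A minimal sketch (the one-liner entrypoint is purely illustrative, and the `ray` package must be installed on the client side):

```python
from ray.job_submission import JobSubmissionClient

# The dashboard port (8265) doubles as the Job Submission API endpoint
client = JobSubmissionClient("http://localhost:8265")

# Submit an illustrative one-liner as a job; any shell entrypoint works
job_id = client.submit_job(
    entrypoint='python -c "import ray; ray.init(); print(ray.cluster_resources())"'
)
print(f"Submitted job: {job_id}")
print(client.get_job_status(job_id))  # e.g. PENDING, RUNNING, SUCCEEDED
```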

### Connect from a Python Client

```python
import ray

# Connect to the Ray cluster
ray.init(address="ray://localhost:10001")

# Run a simple task
@ray.remote
def hello_world():
    return "Hello from Ray!"

# Execute the task
result = ray.get(hello_world.remote())
print(result)

# Check cluster resources
print(ray.cluster_resources())
```

### Distributed Computing Example

```python
import ray
import time

ray.init(address="ray://localhost:10001")

@ray.remote
def compute_task(x):
    time.sleep(1)
    return x * x

# Submit 100 tasks in parallel
results = ray.get([compute_task.remote(i) for i in range(100)])
print(f"Sum of squares: {sum(results)}")

### Using Ray Data

```python
import ray

ray.init(address="ray://localhost:10001")

# Create a dataset; each row is a dict like {"id": 0}
ds = ray.data.range(1000)

# Process data in parallel (map operates on row dicts, not bare ints)
result = ds.map(lambda row: {"id": row["id"] * 2}).take(10)
print(result)
```

## Features

- **Distributed Computing**: Scale Python applications across multiple nodes
- **Auto-scaling**: Dynamic resource allocation
- **Ray Dashboard**: Web UI for monitoring and debugging
- **Ray Data**: Distributed data processing
- **Ray Train**: Distributed training for ML models
- **Ray Serve**: Model serving and deployment
- **Ray Tune**: Hyperparameter tuning (see the sketch after this list)
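
As a taste of the libraries above, here is a minimal Ray Tune sweep, assuming Ray Tune is available in the image you are running; the `objective` function and its `x` parameter are illustrative, not part of this stack:

```python
import ray
from ray import tune

ray.init(address="ray://localhost:10001")

# Illustrative objective: Tune records the dict a function trainable returns
def objective(config):
    return {"score": config["x"] ** 2}

# Grid-search three values of x across the cluster
tuner = tune.Tuner(objective, param_space={"x": tune.grid_search([1, 2, 3])})
results = tuner.fit()
print(results.get_best_result(metric="score", mode="min").config)
```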

## Notes

- Workers automatically connect to the head node (see the verification snippet after this list)
- The cluster has 1 head node (4 CPU, 8 GB RAM) and 2 workers (2 CPU, 4 GB RAM each)
- Total cluster resources: 8 CPUs, 16 GB RAM
- Add more workers by duplicating the worker service definition (see Scaling below)
- For GPU support, use the `rayproject/ray-ml` image and configure the NVIDIA runtime
- Cluster communication goes through the head node's GCS server on port 6379 (the port Redis traditionally uses); workers join with `--address=ray-head:6379`
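
A quick way to confirm that both workers joined, reusing the client connection from the examples above:

```python
import ray

ray.init(address="ray://localhost:10001")

# ray.nodes() returns one entry per node; expect 3 alive (1 head + 2 workers)
alive = [n for n in ray.nodes() if n["Alive"]]
print(f"Alive nodes: {len(alive)}")
for node in alive:
    print(node["NodeManagerAddress"], node["Resources"].get("CPU"))
```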

## Scaling

To add more worker nodes, add new service definitions:

```yaml
ray-worker-3:
  <<: *defaults  # merge the shared settings from the compose file's "defaults" anchor
  image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
  container_name: ray-worker-3
  command: ray start --address=ray-head:6379 --block
  depends_on:
    - ray-head
  environment:
    RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
    RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
```
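
After bringing the new worker up with `docker-compose up -d ray-worker-3`, the extra capacity should appear in the cluster totals; a quick check, assuming the default worker size:

```python
import ray

ray.init(address="ray://localhost:10001")

# A third 2-CPU worker should raise the total from 8 to 10 CPUs
print(ray.cluster_resources().get("CPU"))
```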

## License

Ray is licensed under the Apache License 2.0.