# Ray
This service deploys a Ray cluster with 1 head node and 2 worker nodes for distributed computing.
## Services

- `ray-head`: Ray head node with the dashboard.
- `ray-worker-1`: First Ray worker node.
- `ray-worker-2`: Second Ray worker node.
## Environment Variables
| Variable Name | Description | Default Value |
|---|---|---|
| RAY_VERSION | Ray image version | 2.42.1-py312 |
| RAY_HEAD_NUM_CPUS | Head node CPU count | 4 |
| RAY_HEAD_MEMORY | Head node memory (bytes) | 8589934592 (8GB) |
| RAY_WORKER_NUM_CPUS | Worker node CPU count | 2 |
| RAY_WORKER_MEMORY | Worker node memory (bytes) | 4294967296 (4GB) |
| RAY_DASHBOARD_PORT_OVERRIDE | Ray Dashboard port | 8265 |
| RAY_CLIENT_PORT_OVERRIDE | Ray Client Server port | 10001 |
| RAY_GCS_PORT_OVERRIDE | Ray GCS Server port | 6379 |
Please modify the `.env` file as needed for your use case.
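For example, a `.env` that doubles the default worker resources might look like the following (the values shown are purely illustrative; the variable names come from the table above):

```bash
# Ray image tag used by the head and worker services
RAY_VERSION=2.42.1-py312

# Worker sizing (memory is in bytes; 8589934592 = 8 GiB)
RAY_WORKER_NUM_CPUS=4
RAY_WORKER_MEMORY=8589934592
```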
## Volumes

- `ray_storage`: Shared storage for Ray temporary files.
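As a rough sketch, the volume is declared at the top level of the compose file and mounted into each node; the mount path `/tmp/ray` below is an assumption based on Ray's default temporary directory, so check it against the actual `docker-compose.yml`:

```yaml
volumes:
  ray_storage:

# Inside each Ray service (head and workers):
#   volumes:
#     - ray_storage:/tmp/ray
```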
## Usage

### Start the Cluster

```bash
docker-compose up -d
```
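To confirm that all three nodes came up and joined the cluster, you can list the services and run Ray's status command inside the head container (assuming the head container is named `ray-head`, matching the service name):

```bash
# List the running services
docker-compose ps

# Show the nodes and resources Ray sees
docker exec ray-head ray status
```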
### Access Ray Dashboard

Open your browser and navigate to `http://localhost:8265`.
The dashboard shows cluster status, running jobs, and resource usage.
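If you have the Ray CLI installed locally (`pip install "ray[default]"`), you can also submit a job through the dashboard address and watch it appear under the Jobs tab; `my_script.py` here is a placeholder for your own script:

```bash
ray job submit --address http://localhost:8265 --working-dir . -- python my_script.py
```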
### Connect from Python Client

```python
import ray

# Connect to the Ray cluster
ray.init(address="ray://localhost:10001")

# Define a simple task
@ray.remote
def hello_world():
    return "Hello from Ray!"

# Execute the task
result = ray.get(hello_world.remote())
print(result)

# Check cluster resources
print(ray.cluster_resources())
```
### Distributed Computing Example

```python
import ray
import time

ray.init(address="ray://localhost:10001")

@ray.remote
def compute_task(x):
    time.sleep(1)
    return x * x

# Submit 100 tasks in parallel
results = ray.get([compute_task.remote(i) for i in range(100)])
print(f"Sum of squares: {sum(results)}")
```
### Using Ray Data

```python
import ray

ray.init(address="ray://localhost:10001")

# Create a dataset (each row is a dict with an "id" column)
ds = ray.data.range(1000)

# Process data in parallel
result = ds.map(lambda row: {"id": row["id"] * 2}).take(10)
print(result)
```
## Features
- Distributed Computing: Scale Python applications across multiple nodes
- Auto-scaling: Dynamic resource allocation
- Ray Dashboard: Web UI for monitoring and debugging
- Ray Data: Distributed data processing
- Ray Train: Distributed training for ML models
- Ray Serve: Model serving and deployment
- Ray Tune: Hyperparameter tuning (see the sketch after this list)
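As a minimal sketch of Ray Tune running against this cluster (the objective function, search space, and sample count below are illustrative, not part of the deployment):

```python
import ray
from ray import tune

ray.init(address="ray://localhost:10001")

# Toy objective: minimize (x - 3)^2 over a uniform search space
def objective(config):
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(0, 10)},
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
)
results = tuner.fit()

# Best hyperparameters found across the 20 trials
print(results.get_best_result().config)
```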
## Notes
- Workers automatically connect to the head node
- The cluster has 1 head node (4 CPU, 8GB RAM) and 2 workers (2 CPU, 4GB RAM each)
- Total cluster resources: 8 CPUs, 16GB RAM
- Add more workers by duplicating the worker service definition
- For GPU support, use the rayproject/ray-ml image and configure the NVIDIA container runtime (a sketch follows this list)
- Ray uses the Redis protocol on port 6379 for cluster communication
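A GPU worker could be sketched roughly as below; this mirrors the scaling example further down, but the `ray-ml` image tag and the device-reservation settings are assumptions to adapt to your host and to the tags actually published on Docker Hub:

```yaml
ray-worker-gpu:
  <<: *defaults
  # Image tag assumed here; pick a GPU-capable rayproject/ray-ml tag that exists
  image: rayproject/ray-ml:${RAY_VERSION:-2.42.1-py312}
  container_name: ray-worker-gpu
  command: ray start --address=ray-head:6379 --num-gpus=1 --block
  depends_on:
    - ray-head
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```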
## Scaling
To add more worker nodes, add new service definitions:
```yaml
ray-worker-3:
  <<: *defaults
  image: rayproject/ray:${RAY_VERSION:-2.42.1-py312}
  container_name: ray-worker-3
  command: ray start --address=ray-head:6379 --block
  depends_on:
    - ray-head
  environment:
    RAY_NUM_CPUS: ${RAY_WORKER_NUM_CPUS:-2}
    RAY_MEMORY: ${RAY_WORKER_MEMORY:-4294967296}
```
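After adding the service, bring up just the new worker and confirm it registered with the head node (again assuming the head container is named `ray-head`):

```bash
docker-compose up -d ray-worker-3
docker exec ray-head ray status
```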
## License
Ray is licensed under the Apache License 2.0.