# Restate Multi-Node Cluster

A highly available 3-node Restate cluster for production workloads. This configuration provides fault tolerance and automatic failover.

## Features

- **High Availability**: 3-node cluster can tolerate 1 node failure
- **Data Replication**: Configurable replication factor (default: 2 of 3 nodes)
- **Automatic Snapshots**: Periodic snapshots stored in MinIO (S3-compatible)
- **Load Distribution**: Multiple ingress endpoints for load balancing
- **Metadata Quorum**: Replicated metadata cluster for consistency

## Quick Start

1. Copy the environment file:

   ```bash
   cp .env.example .env
   ```

2. Start the cluster:

   ```bash
   docker compose up -d
   ```

3. Verify cluster health:

   ```bash
   # Check node 1
   curl http://localhost:9070/health
   # Check node 2
   curl http://localhost:29070/health
   # Check node 3
   curl http://localhost:39070/health
   ```

## Architecture

The cluster consists of:

- **3 Restate Nodes**: Distributed workflow engines with replicated state
- **MinIO**: S3-compatible storage for partition snapshots
- **Replicated Bifrost Provider**: Log replication across nodes
- **Metadata Cluster**: Distributed metadata server on all nodes

## Service Endpoints

### Node 1 (Primary)

- Ingress API: `http://localhost:8080`
- Admin API: `http://localhost:9070`
- Node Communication: Port 5122

### Node 2

- Ingress API: `http://localhost:28080`
- Admin API: `http://localhost:29070`
- Node Communication: Port 25122

### Node 3

- Ingress API: `http://localhost:38080`
- Admin API: `http://localhost:39070`
- Node Communication: Port 35122

### MinIO

- API: `http://localhost:9000`
- Console: `http://localhost:9001` (admin UI)
- Username: `minioadmin`
- Password: `minioadmin`

## Environment Variables

### Cluster Configuration

| Variable                      | Default           | Description                 |
| ----------------------------- | ----------------- | --------------------------- |
| `RESTATE_VERSION`             | `1.5.3`           | Restate server version      |
| `RESTATE_CLUSTER_NAME`        | `restate-cluster` | Cluster name                |
| `RESTATE_DEFAULT_REPLICATION` | `2`               | Minimum replicas for writes |

### Port Configuration

Each node has three ports:

- `NODEx_INGRESS_PORT_OVERRIDE`: Ingress API port
- `NODEx_ADMIN_PORT_OVERRIDE`: Admin API port
- `NODEx_NODE_PORT_OVERRIDE`: Node-to-node communication port

### Snapshot Configuration

| Variable                                                   | Default                  | Description          |
| ---------------------------------------------------------- | ------------------------ | -------------------- |
| `RESTATE_WORKER__SNAPSHOTS__DESTINATION`                   | `s3://restate/snapshots` | S3 bucket path       |
| `RESTATE_WORKER__SNAPSHOTS__SNAPSHOT_INTERVAL_NUM_RECORDS` | `1000`                   | Records per snapshot |
| `RESTATE_WORKER__SNAPSHOTS__AWS_ENDPOINT_URL`              | `http://minio:9000`      | S3 endpoint          |

### MinIO Configuration

| Variable                      | Default      | Description          |
| ----------------------------- | ------------ | -------------------- |
| `MINIO_VERSION`               | `latest`     | MinIO version        |
| `MINIO_ROOT_USER`             | `minioadmin` | MinIO admin username |
| `MINIO_ROOT_PASSWORD`         | `minioadmin` | MinIO admin password |
| `MINIO_API_PORT_OVERRIDE`     | `9000`       | MinIO API port       |
| `MINIO_CONSOLE_PORT_OVERRIDE` | `9001`       | MinIO console port   |

## Usage Examples

### Deploy a Service (to any node)

```bash
# Deploy to node 1
curl -X POST http://localhost:9070/deployments \
  -H 'Content-Type: application/json' \
  -d '{"uri": "http://my-service:9080"}'
```

### Invoke Service with Load Balancing

Use a load balancer or DNS round-robin across ingress endpoints:

```bash
# Node 1
curl -X POST http://localhost:8080/MyService/myMethod \
  -H 'Content-Type: application/json' \
  -d '{"key": "value"}'

# Node 2
curl -X POST http://localhost:28080/MyService/myMethod \
  -H 'Content-Type: application/json' \
  -d '{"key": "value"}'
```
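For quick local testing without a dedicated load balancer, you can approximate round-robin from the client side. A minimal sketch, using the same placeholder service (`MyService/myMethod`) as above:

```bash
# Rotate the same request across all three ingress endpoints
# (client-side round-robin; ports match the node list above)
for port in 8080 28080 38080; do
  curl -X POST "http://localhost:${port}/MyService/myMethod" \
    -H 'Content-Type: application/json' \
    -d '{"key": "value"}'
done
```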
### Check Cluster Status

```bash
# From any admin API
curl http://localhost:9070/cluster
```

## Fault Tolerance

The cluster is configured with:

- **Replication Factor**: 2 (data is written to 2 out of 3 nodes)
- **Quorum**: Requires a majority (2/3 nodes) for metadata operations
- **Tolerance**: Can survive 1 node failure without downtime

### Testing Failover

Stop one node:

```bash
docker compose stop restate-2
```

The cluster continues to operate. Services remain available on nodes 1 and 3.

## Snapshots and Backups

Partition snapshots are automatically saved to MinIO every 1000 records. This enables:

- Fast recovery after failures
- Backup and restore capabilities
- Reduced replay time on node restart

### Viewing Snapshots

Access the MinIO console at `http://localhost:9001`:

1. Log in with `minioadmin` / `minioadmin`
2. Navigate to the `restate` bucket
3. Browse the `snapshots/` directory

### Backup Strategy

To back up cluster data:

1. Stop the cluster:

   ```bash
   docker compose down
   ```

2. Back up the volumes:

   ```bash
   docker run --rm -v restate-cluster_restate_data:/data -v $(pwd)/backup:/backup alpine tar czf /backup/restate-data.tar.gz -C /data .
   docker run --rm -v restate-cluster_minio_data:/data -v $(pwd)/backup:/backup alpine tar czf /backup/minio-data.tar.gz -C /data .
   ```

## Resource Limits

### Per Restate Node

- CPU: 0.5-2.0 cores
- Memory: 512MB-2GB

### MinIO Instance

- CPU: 0.25-1.0 cores
- Memory: 128MB-512MB

Adjust based on workload in `docker-compose.yaml`.

## Scaling

To add more nodes:

1. Add a new service in `docker-compose.yaml`
2. Set a unique `RESTATE_NODE_NAME` and `RESTATE_FORCE_NODE_ID`
3. Add the node address to `RESTATE_METADATA_CLIENT__ADDRESSES`
4. Expose unique ports
5. Set `RESTATE_AUTO_PROVISION=false`

## Production Considerations

- **Storage**: Use durable storage for volumes (EBS, persistent disks)
- **Network**: Ensure low latency between nodes (<10ms)
- **Monitoring**: Set up Prometheus scraping on port 9070
- **Security**: Change the MinIO credentials in production
- **Replication**: Adjust `RESTATE_DEFAULT_REPLICATION` based on cluster size
- **Snapshots**: Consider external S3 for snapshot storage

## Monitoring

Each node exposes metrics:

```bash
curl http://localhost:9070/metrics   # Node 1
curl http://localhost:29070/metrics  # Node 2
curl http://localhost:39070/metrics  # Node 3
```

## Troubleshooting

### Node Won't Start

Check the logs:

```bash
docker compose logs restate-1
```

Ensure all nodes can reach each other on port 5122.

### Split Brain Prevention

The cluster uses Raft consensus with majority quorum. If the nodes become partitioned, only the partition holding a majority (2+ nodes) remains active.

### Data Recovery

If data is corrupted:

1. Stop the cluster
2. Restore from the volume backups (see the sketch below)
3. Restart the cluster
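A minimal restore sketch for step 2, assuming the archives and volume names from the Backup Strategy section above (run only while the cluster is stopped):

```bash
# Wipe each volume and unpack the corresponding backup archive into it
docker run --rm -v restate-cluster_restate_data:/data -v $(pwd)/backup:/backup \
  alpine sh -c 'rm -rf /data/* && tar xzf /backup/restate-data.tar.gz -C /data'
docker run --rm -v restate-cluster_minio_data:/data -v $(pwd)/backup:/backup \
  alpine sh -c 'rm -rf /data/* && tar xzf /backup/minio-data.tar.gz -C /data'
```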
## Documentation

- [Official Documentation](https://docs.restate.dev/)
- [Cluster Deployment Guide](https://docs.restate.dev/server/clusters)
- [Snapshots Documentation](https://docs.restate.dev/server/snapshots)
- [Configuration Reference](https://docs.restate.dev/references/server-config)

## License

This configuration is provided under the project's license. Restate is licensed under the [Business Source License 1.1](https://github.com/restatedev/restate/blob/main/LICENSE).

## Notes

- For single-node deployments, see [restate](../restate/)
- MinIO is used for demo purposes; use AWS S3 or other S3-compatible storage in production (see the sketch below)
- The cluster automatically provisions on first start
- Node IDs are pinned to ensure consistent identity across restarts
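As an example of the MinIO note above, pointing snapshots at AWS S3 instead of the bundled MinIO comes down to overriding the snapshot variables from the Environment Variables section in `.env`. A minimal sketch, assuming a pre-created bucket named `my-restate-snapshots` (hypothetical) and AWS credentials supplied to the containers via the usual AWS environment variables:

```bash
# .env overrides: store snapshots in AWS S3 instead of the bundled MinIO
RESTATE_WORKER__SNAPSHOTS__DESTINATION=s3://my-restate-snapshots/snapshots

# Leave the MinIO endpoint override unset so the default AWS endpoint is used
# RESTATE_WORKER__SNAPSHOTS__AWS_ENDPOINT_URL=http://minio:9000
```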