# Restate Multi-Node Cluster

A highly available 3-node Restate cluster for production workloads. This configuration provides fault tolerance and automatic failover.
## Features

- **High Availability**: 3-node cluster can tolerate 1 node failure
- **Data Replication**: Configurable replication factor (default: 2 of 3 nodes)
- **Automatic Snapshots**: Periodic snapshots stored in MinIO (S3-compatible)
- **Load Distribution**: Multiple ingress endpoints for load balancing
- **Metadata Quorum**: Replicated metadata cluster for consistency
## Quick Start

1. Copy the environment file:

```bash
cp .env.example .env
```

2. Start the cluster:

```bash
docker compose up -d
```

3. Verify cluster health:

```bash
# Check node 1
curl http://localhost:9070/health

# Check node 2
curl http://localhost:29070/health

# Check node 3
curl http://localhost:39070/health
```
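The three health checks can be rolled into one loop; a minimal sketch, assuming the default port overrides (the `check_cluster_health` helper is illustrative, not part of the configuration):

```bash
#!/bin/sh
# Probe each node's admin health endpoint; report any node that does not respond.
check_cluster_health() {
  for port in 9070 29070 39070; do
    echo "checking http://localhost:$port/health"
    curl -fsS "http://localhost:$port/health" || echo "node on port $port is not responding"
  done
}

check_cluster_health
```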
## Architecture

The cluster consists of:

- **3 Restate Nodes**: Distributed workflow engines with replicated state
- **MinIO**: S3-compatible storage for partition snapshots
- **Replicated Bifrost Provider**: Log replication across nodes
- **Metadata Cluster**: Distributed metadata server on all nodes
## Service Endpoints

### Node 1 (Primary)

- Ingress API: `http://localhost:8080`
- Admin API: `http://localhost:9070`
- Node Communication: Port 5122

### Node 2

- Ingress API: `http://localhost:28080`
- Admin API: `http://localhost:29070`
- Node Communication: Port 25122

### Node 3

- Ingress API: `http://localhost:38080`
- Admin API: `http://localhost:39070`
- Node Communication: Port 35122
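The host-port layout above follows a simple pattern: node 1 uses the container-default ports, and node N (N ≥ 2) adds an offset of `N * 10000`. A small sketch (the `node_ports` helper is illustrative only):

```bash
#!/bin/sh
# Print the host ports for a node, following the layout listed above:
# node 1 keeps the defaults; node N (N >= 2) adds an offset of N * 10000.
node_ports() {
  node=$1
  if [ "$node" -eq 1 ]; then offset=0; else offset=$(( node * 10000 )); fi
  echo "node $node: ingress=$(( offset + 8080 )) admin=$(( offset + 9070 )) peer=$(( offset + 5122 ))"
}

node_ports 1  # node 1: ingress=8080 admin=9070 peer=5122
node_ports 2  # node 2: ingress=28080 admin=29070 peer=25122
node_ports 3  # node 3: ingress=38080 admin=39070 peer=35122
```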
### MinIO

- API: `http://localhost:9000`
- Console: `http://localhost:9001` (admin UI)
- Username: `minioadmin`
- Password: `minioadmin`
## Environment Variables

### Cluster Configuration

| Variable                      | Default           | Description                 |
| ----------------------------- | ----------------- | --------------------------- |
| `RESTATE_VERSION`             | `1.5.3`           | Restate server version      |
| `RESTATE_CLUSTER_NAME`        | `restate-cluster` | Cluster name                |
| `RESTATE_DEFAULT_REPLICATION` | `2`               | Minimum replicas for writes |
### Port Configuration

Each node has three ports:

- `NODEx_INGRESS_PORT_OVERRIDE`: Ingress API port
- `NODEx_ADMIN_PORT_OVERRIDE`: Admin API port
- `NODEx_NODE_PORT_OVERRIDE`: Node-to-node communication port
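For example, node 2's entries in `.env` look like this (values taken from the endpoint list above):

```bash
# .env — host ports for node 2
NODE2_INGRESS_PORT_OVERRIDE=28080
NODE2_ADMIN_PORT_OVERRIDE=29070
NODE2_NODE_PORT_OVERRIDE=25122
```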
### Snapshot Configuration

| Variable                                                   | Default                  | Description          |
| ---------------------------------------------------------- | ------------------------ | -------------------- |
| `RESTATE_WORKER__SNAPSHOTS__DESTINATION`                   | `s3://restate/snapshots` | S3 bucket path       |
| `RESTATE_WORKER__SNAPSHOTS__SNAPSHOT_INTERVAL_NUM_RECORDS` | `1000`                   | Records per snapshot |
| `RESTATE_WORKER__SNAPSHOTS__AWS_ENDPOINT_URL`              | `http://minio:9000`      | S3 endpoint          |
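To point snapshots at external S3 instead of MinIO, override the destination and drop the custom endpoint. A hedged example; the bucket name is a placeholder:

```bash
# .env — snapshot storage on external S3 (bucket name is a placeholder)
RESTATE_WORKER__SNAPSHOTS__DESTINATION=s3://my-snapshot-bucket/snapshots
# Leave RESTATE_WORKER__SNAPSHOTS__AWS_ENDPOINT_URL unset to use the real AWS endpoint
```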
### MinIO Configuration

| Variable                      | Default      | Description          |
| ----------------------------- | ------------ | -------------------- |
| `MINIO_VERSION`               | `latest`     | MinIO version        |
| `MINIO_ROOT_USER`             | `minioadmin` | MinIO admin username |
| `MINIO_ROOT_PASSWORD`         | `minioadmin` | MinIO admin password |
| `MINIO_API_PORT_OVERRIDE`     | `9000`       | MinIO API port       |
| `MINIO_CONSOLE_PORT_OVERRIDE` | `9001`       | MinIO console port   |
## Usage Examples

### Deploy a Service (to any node)

```bash
# Deploy to node 1
curl -X POST http://localhost:9070/deployments \
  -H 'Content-Type: application/json' \
  -d '{"uri": "http://my-service:9080"}'
```
### Invoke a Service with Load Balancing

Use a load balancer or DNS round-robin across the ingress endpoints:

```bash
# Node 1
curl -X POST http://localhost:8080/MyService/myMethod \
  -H 'Content-Type: application/json' \
  -d '{"key": "value"}'

# Node 2
curl -X POST http://localhost:28080/MyService/myMethod \
  -H 'Content-Type: application/json' \
  -d '{"key": "value"}'
```
### Check Cluster Status

```bash
# From any node's admin API
curl http://localhost:9070/cluster
```
## Fault Tolerance

The cluster is configured with:

- **Replication Factor**: 2 (data written to 2 out of 3 nodes)
- **Quorum**: Requires majority (2/3 nodes) for metadata operations
- **Tolerance**: Can survive 1 node failure without downtime
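These numbers follow directly from majority-quorum arithmetic: quorum = floor(N/2) + 1, and tolerated failures = N - quorum. A quick sketch:

```bash
#!/bin/sh
# Majority quorum: quorum = N/2 + 1 (integer division); tolerated failures = N - quorum.
quorum_info() {
  n=$1
  quorum=$(( n / 2 + 1 ))
  echo "nodes=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
}

quorum_info 3  # nodes=3 quorum=2 tolerated_failures=1
quorum_info 5  # nodes=5 quorum=3 tolerated_failures=2
```

Note that growing from 3 to 5 nodes raises the tolerable failures from 1 to 2, which is why odd cluster sizes are preferred.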
### Testing Failover

Stop one node:

```bash
docker compose stop restate-2
```

The cluster continues to operate; services remain available on nodes 1 and 3.
## Snapshots and Backups

Partition snapshots are automatically saved to MinIO every 1000 records. This enables:

- Fast recovery after failures
- Backup and restore capabilities
- Reduced replay time on node restart
### Viewing Snapshots

Access the MinIO console at `http://localhost:9001`:

1. Log in with `minioadmin` / `minioadmin`
2. Navigate to the `restate` bucket
3. Browse the `snapshots/` directory
### Backup Strategy

To back up cluster data:

1. Stop the cluster:

```bash
docker compose down
```

2. Back up the volumes:

```bash
docker run --rm -v restate-cluster_restate_data:/data -v $(pwd)/backup:/backup alpine tar czf /backup/restate-data.tar.gz -C /data .
docker run --rm -v restate-cluster_minio_data:/data -v $(pwd)/backup:/backup alpine tar czf /backup/minio-data.tar.gz -C /data .
```
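Restoring is simply the inverse: `tar xzf` the archive back into the (empty) volume. The sketch below demonstrates the round trip with plain `tar` on a scratch directory; against the real cluster, wrap the extract step in the same `docker run ... alpine` pattern shown above:

```bash
#!/bin/sh
# Round-trip demo: archive a data directory with tar, then restore it elsewhere.
restore_roundtrip() {
  work=$(mktemp -d)
  mkdir -p "$work/data" "$work/restore"
  echo "cluster state" > "$work/data/state.txt"
  tar czf "$work/restate-data.tar.gz" -C "$work/data" .   # backup step
  tar xzf "$work/restate-data.tar.gz" -C "$work/restore"  # restore step
  cat "$work/restore/state.txt"                           # restored content
  rm -r "$work"
}

restore_roundtrip
```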
## Resource Limits

### Per Restate Node

- CPU: 0.5-2.0 cores
- Memory: 512MB-2GB

### MinIO Instance

- CPU: 0.25-1.0 cores
- Memory: 128MB-512MB

Adjust these limits in `docker-compose.yaml` based on your workload.
## Scaling

To add more nodes:

1. Add a new service in `docker-compose.yaml`
2. Set a unique `RESTATE_NODE_NAME` and `RESTATE_FORCE_NODE_ID`
3. Add the node's address to `RESTATE_METADATA_CLIENT__ADDRESSES`
4. Expose unique ports
5. Set `RESTATE_AUTO_PROVISION=false`
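A hypothetical `restate-4` service following these steps might look like the fragment below. This is a sketch, not a drop-in addition: the image, volumes, and the exact format of the address list must mirror the existing `restate-1`..`restate-3` services, and the host ports follow the `node * 10000` offset pattern used elsewhere in this README.

```yaml
restate-4:
  # image, volumes, and resource limits should mirror restate-1..3
  environment:
    RESTATE_NODE_NAME: restate-4
    RESTATE_FORCE_NODE_ID: "4"
    RESTATE_AUTO_PROVISION: "false"
    # address-list format must match the existing nodes' setting
    RESTATE_METADATA_CLIENT__ADDRESSES: '["http://restate-1:5122","http://restate-2:5122","http://restate-3:5122","http://restate-4:5122"]'
  ports:
    - "48080:8080" # ingress
    - "49070:9070" # admin
    - "45122:5122" # node-to-node
```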
## Production Considerations

- **Storage**: Use durable storage for volumes (EBS, persistent disks)
- **Network**: Ensure low latency between nodes (<10ms)
- **Monitoring**: Set up Prometheus scraping on port 9070
- **Security**: Change the MinIO credentials in production
- **Replication**: Adjust `RESTATE_DEFAULT_REPLICATION` based on cluster size
- **Snapshots**: Consider external S3 for snapshot storage
## Monitoring

Each node exposes metrics:

```bash
curl http://localhost:9070/metrics  # Node 1
curl http://localhost:29070/metrics # Node 2
curl http://localhost:39070/metrics # Node 3
```
## Troubleshooting

### Node Won't Start

Check the logs:

```bash
docker compose logs restate-1
```

Ensure all nodes can reach each other on port 5122.

### Split Brain Prevention

The cluster uses Raft consensus with a majority quorum. If the nodes become partitioned, only the partition holding a majority (2+ nodes) remains active.

### Data Recovery

If data is corrupted:

1. Stop the cluster
2. Restore from volume backups
3. Restart the cluster
## Documentation

- [Official Documentation](https://docs.restate.dev/)
- [Cluster Deployment Guide](https://docs.restate.dev/server/clusters)
- [Snapshots Documentation](https://docs.restate.dev/server/snapshots)
- [Configuration Reference](https://docs.restate.dev/references/server-config)
## License

This configuration is provided under the project's license. Restate itself is licensed under the [Business Source License 1.1](https://github.com/restatedev/restate/blob/main/LICENSE).
## Notes

- For single-node deployments, see [restate](../restate/)
- MinIO is used for demo purposes; use AWS S3 or another compatible object store in production
- The cluster provisions itself automatically on first start
- Node IDs are pinned to ensure consistent identity across restarts