feat: add more otel services

2026-01-11 23:42:34 +08:00
parent ea5eadfcec
commit 51fd7ea08b
28 changed files with 2358 additions and 70 deletions
--- a/src/tempo/README.md
+++ b/src/tempo/README.md
@@ -0,0 +1,211 @@
+# Grafana Tempo
+
+[中文文档 (Chinese documentation)](README.zh.md)
+
+Grafana Tempo is an open-source, easy-to-use, and high-scale distributed tracing backend. Tempo is cost-efficient, requiring only object storage to operate, and is deeply integrated with Grafana, Prometheus, and Loki.
+
+## Features
+
+- **Cost-effective**: Stores traces in object storage (S3, GCS, Azure Blob Storage) or on a local filesystem
+- **Easy to operate**: No dependencies other than object storage
+- **Multi-tenant**: Built-in multi-tenancy support
+- **Multiple protocols**: Supports OTLP, Jaeger, and Zipkin
+- **TraceQL**: Powerful query language for trace data
+- **Metrics generation**: Can generate RED metrics from traces
+
+## Quick Start
+
+1. Copy the example environment file:
+
+    ```bash
+    cp .env.example .env
+    ```
+
+2. Start the service:
+
+    ```bash
+    docker compose up -d
+    ```
+
+3. Verify the service is running:
+
+    ```bash
+    docker compose ps
+    curl http://localhost:3200/ready
+    ```
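
When you script startup (for example, in CI), poll `/ready` until it returns HTTP 200 so that later steps don't race the service. The following is a minimal stdlib Python sketch. It assumes the default `TEMPO_HTTP_PORT_OVERRIDE` of `3200`. The function name `wait_for_tempo_ready` and its timeout values are illustrative only.

```python
import time
import urllib.error
import urllib.request


def wait_for_tempo_ready(url="http://localhost:3200/ready", timeout=60.0, interval=2.0):
    """Poll Tempo's /ready endpoint until it returns HTTP 200 or `timeout` expires.

    Returns True if Tempo reported ready in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or a non-2xx status (HTTPError is a
            # URLError subclass): not ready yet, so retry after a short pause.
            pass
        time.sleep(interval)
    return False
```

For example, `wait_for_tempo_ready(timeout=120)` returns `True` once `/ready` answers with HTTP 200. The loop retries on errors rather than failing on the first one, because a freshly started container may briefly report not-ready.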
+
+## Configuration
+
+### Environment Variables
+
+| Variable                                 | Default | Description             |
+| ---------------------------------------- | ------- | ----------------------- |
+| `TEMPO_VERSION`                          | `2.7.2` | Tempo version           |
+| `TEMPO_HTTP_PORT_OVERRIDE`               | `3200`  | HTTP API port           |
+| `TEMPO_GRPC_PORT_OVERRIDE`               | `9095`  | gRPC port               |
+| `TEMPO_OTLP_HTTP_PORT_OVERRIDE`          | `4318`  | OTLP HTTP receiver port |
+| `TEMPO_OTLP_GRPC_PORT_OVERRIDE`          | `4317`  | OTLP gRPC receiver port |
+| `TEMPO_ZIPKIN_PORT_OVERRIDE`             | `9411`  | Zipkin receiver port    |
+| `TEMPO_JAEGER_THRIFT_HTTP_PORT_OVERRIDE` | `14268` | Jaeger Thrift HTTP port |
+| `TEMPO_JAEGER_GRPC_PORT_OVERRIDE`        | `14250` | Jaeger gRPC port        |
+| `TZ`                                     | `UTC`   | Timezone                |
+| `TEMPO_CPU_LIMIT`                        | `1.0`   | CPU limit               |
+| `TEMPO_MEMORY_LIMIT`                     | `1G`    | Memory limit            |
+| `TEMPO_CPU_RESERVATION`                  | `0.25`  | CPU reservation         |
+| `TEMPO_MEMORY_RESERVATION`               | `256M`  | Memory reservation      |
+
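
The `*_PORT_OVERRIDE` variables appear to control the host-side port mappings, so `docker compose up` can fail if another process already holds one of those ports. The following minimal Python sketch reports such conflicts before startup. It assumes any overrides are present in the process environment and otherwise falls back to the defaults in the table. The helper names are hypothetical.

```python
import os
import socket

# Defaults copied from the environment variable table.
DEFAULT_PORTS = {
    "TEMPO_HTTP_PORT_OVERRIDE": 3200,
    "TEMPO_GRPC_PORT_OVERRIDE": 9095,
    "TEMPO_OTLP_HTTP_PORT_OVERRIDE": 4318,
    "TEMPO_OTLP_GRPC_PORT_OVERRIDE": 4317,
    "TEMPO_ZIPKIN_PORT_OVERRIDE": 9411,
    "TEMPO_JAEGER_THRIFT_HTTP_PORT_OVERRIDE": 14268,
    "TEMPO_JAEGER_GRPC_PORT_OVERRIDE": 14250,
}


def effective_ports(env=None):
    """Return {variable: port}, preferring values set in `env` over the defaults."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in DEFAULT_PORTS.items()}


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


def busy_ports(env=None):
    """List (variable, port) pairs whose host port is already taken."""
    return [(name, port) for name, port in effective_ports(env).items() if port_in_use(port)]
```

For example, `for name, port in busy_ports(): print(f"{name}: port {port} is already in use")` lists the variables to change in `.env`.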
+### Supported Trace Protocols
+
+- **OTLP** (OpenTelemetry Protocol): Port 4317 (gRPC), 4318 (HTTP)
+- **Zipkin**: Port 9411
+- **Jaeger**: Port 14250 (gRPC), 14268 (Thrift HTTP)
+
+### Default Configuration
+
+The service includes a basic configuration file (`tempo-config.yaml`) that:
+
+- Enables all major trace receivers (OTLP, Jaeger, Zipkin)
+- Uses local filesystem storage
+- Configures trace retention and compaction
+- Enables metrics generation from traces (requires Prometheus)
+
+For production deployments, customize this file to match your requirements, in particular the storage backend and retention settings.
+
+## Integration with Grafana
+
+1. Add Tempo as a data source in Grafana:
+   - URL: `http://tempo:3200` (if Grafana runs in the same Docker network as Tempo)
+   - Or: `http://localhost:3200` (if Grafana runs directly on the host machine)
+
+2. Search traces with TraceQL, or look them up by trace ID
+
+3. Optionally, configure trace-to-logs (Loki) and trace-to-metrics (Prometheus) correlation in the Tempo data source settings
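
As an alternative to the UI, you can create the data source through Grafana's HTTP API (`POST /api/datasources`). The following minimal stdlib sketch builds that request without sending it. It assumes Grafana listens on `http://localhost:3000` and accepts basic auth with the hypothetical `admin`/`admin` credentials shown. In production, a service account token is the safer choice.

```python
import base64
import json
import urllib.request


def build_tempo_datasource_request(
    grafana_url="http://localhost:3000",
    tempo_url="http://tempo:3200",
    user="admin",      # hypothetical credentials; replace with your own
    password="admin",
):
    """Build (but do not send) a POST /api/datasources request for a Tempo data source."""
    payload = {
        "name": "Tempo",
        "type": "tempo",    # plugin ID of Grafana's built-in Tempo data source
        "access": "proxy",  # Grafana's backend, not the browser, connects to Tempo
        "url": tempo_url,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

To send the request, use `urllib.request.urlopen(build_tempo_datasource_request())`. Grafana is expected to reject a second request that uses the same data source name.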
+
+## Sending Traces to Tempo
+
+### OpenTelemetry SDK
+
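+Configure your application to send traces to Tempo. This requires the `opentelemetry-sdk` and `opentelemetry-exporter-otlp-proto-grpc` packages. `my-service` and `example-operation` are placeholder names:
+
+```python
+from opentelemetry import trace
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
+from opentelemetry.sdk.resources import Resource
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import BatchSpanProcessor
+
+# Configure the OTLP gRPC exporter to send spans to Tempo's OTLP gRPC receiver
+otlp_exporter = OTLPSpanExporter(
+    endpoint="http://localhost:4317",
+    insecure=True,  # plaintext gRPC; suitable for local testing only
+)
+
+# Set up the tracer provider; service.name identifies the service in Tempo and Grafana
+provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))
+provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
+trace.set_tracer_provider(provider)
+
+# Record a span
+tracer = trace.get_tracer(__name__)
+with tracer.start_as_current_span("example-operation"):
+    pass
+
+# Flush buffered spans before the process exits
+provider.shutdown()
+```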
+
+### Using cURL (Testing)
+
+Send a test trace via HTTP:
+
+```bash
+curl -X POST http://localhost:4318/v1/traces \
+  -H "Content-Type: application/json" \
+  -d '{
+    "resourceSpans": [{
+      "resource": {
+        "attributes": [{
+          "key": "service.name",
+          "value": {"stringValue": "test-service"}
+        }]
+      },
+      "scopeSpans": [{
+        "spans": [{
+          "traceId": "5B8EFFF798038103D269B633813FC60C",
+          "spanId": "EEE19B7EC3C1B174",
+          "name": "test-span",
+          "startTimeUnixNano": "1544712660000000000",
+          "endTimeUnixNano": "1544712661000000000",
+          "kind": 1
+        }]
+      }]
+    }]
+  }'
+```
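
The cURL payload above uses fixed 2018 timestamps, so the trace may not appear in Grafana searches limited to a recent time range. The following stdlib Python sketch builds an equivalent OTLP/JSON payload with random IDs and current timestamps, then posts it to the OTLP HTTP receiver. It assumes the default port `4318`. The function names are illustrative.

```python
import json
import os
import time
import urllib.request


def build_otlp_payload(service_name="test-service", span_name="test-span", duration_s=1.0):
    """Build an OTLP/JSON trace export request containing a single span that ends now."""
    end_ns = time.time_ns()
    start_ns = end_ns - int(duration_s * 1_000_000_000)
    return {
        "resourceSpans": [{
            "resource": {
                "attributes": [{"key": "service.name", "value": {"stringValue": service_name}}]
            },
            "scopeSpans": [{
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 16 random bytes -> 32 hex chars
                    "spanId": os.urandom(8).hex(),    # 8 random bytes -> 16 hex chars
                    "name": span_name,
                    # 64-bit integers are encoded as decimal strings in OTLP/JSON
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                    "kind": 1,  # SPAN_KIND_INTERNAL
                }]
            }],
        }]
    }


def send_otlp_payload(payload, endpoint="http://localhost:4318/v1/traces"):
    """POST the payload to an OTLP HTTP receiver and return the HTTP status code."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

To round-trip a trace, call `payload = build_otlp_payload()` and `send_otlp_payload(payload)`. Then request `http://localhost:3200/api/traces/<traceId>` using the `traceId` from the payload. Ingestion is asynchronous, so the trace may take a few seconds to become queryable.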
+
+### Jaeger Client Libraries
+
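+Jaeger clients that report spans directly to a collector over HTTP can use Tempo's Jaeger Thrift HTTP endpoint:
+
+```yaml
+JAEGER_ENDPOINT: http://localhost:14268/api/traces
+```
+
+Port 14250 accepts Jaeger gRPC, the protocol Jaeger agents use to forward spans to a collector. This setup does not publish the agent UDP ports (6831/6832) that `JAEGER_AGENT_HOST`/`JAEGER_AGENT_PORT` refer to. The Jaeger client libraries have been retired in favor of the OpenTelemetry SDKs, so prefer OTLP for new applications.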
+
+## Storage
+
+Traces are stored in a Docker volume named `tempo_data`.
+
+## Metrics Generation
+
+Tempo can generate RED (Rate, Errors, Duration) metrics from traces. The default configuration pushes these to Prometheus at `http://prometheus:9090` using remote write. If you don't run Prometheus, either:
+
+1. Remove the `remote_write` section from `tempo-config.yaml`, or
+2. Run Prometheus on the same Docker network under the hostname `prometheus`, with its remote-write receiver enabled (`--web.enable-remote-write-receiver`)
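
You can script the first option. The following minimal Python sketch assumes the section sits at the usual Tempo location `metrics_generator.storage.remote_write`; check this against your `tempo-config.yaml`. The file helper relies on the third-party PyYAML package, and rewriting the file this way drops YAML comments. Both function names are hypothetical.

```python
def strip_remote_write(config):
    """Remove metrics_generator.storage.remote_write from a parsed Tempo config dict.

    Returns True if the key was present and removed, False otherwise.
    """
    storage = (config.get("metrics_generator") or {}).get("storage") or {}
    if "remote_write" in storage:
        del storage["remote_write"]  # mutates the nested dict inside `config`
        return True
    return False


def strip_remote_write_file(path="tempo-config.yaml"):
    """Rewrite the YAML file in place without remote_write (requires PyYAML)."""
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f) or {}
    if not strip_remote_write(config):
        return False
    with open(path, "w", encoding="utf-8") as f:
        yaml.safe_dump(config, f, sort_keys=False)  # note: comments are not preserved
    return True
```

Removing the section only stops Tempo from pushing the generated metrics. The metrics generator itself may still run, so restart the container after editing the file.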
+
+## Health Check
+
+The service includes a health check that monitors the `/ready` endpoint every 30 seconds.
+
+## Resource Requirements
+
+- **Minimum**: 256MB RAM, 0.25 CPU
+- **Recommended**: 1GB RAM, 1 CPU (for moderate trace volumes)
+- **Production**: Scale based on trace ingestion rate and retention period
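
The production guidance can be made concrete with a back-of-the-envelope estimate. In the following Python sketch, `avg_span_bytes` and `compression_ratio` are placeholder assumptions, not measured Tempo figures. The estimate also ignores WAL, index, and bloom-filter overhead.

```python
def estimate_trace_storage_gib(spans_per_second, retention_days,
                               avg_span_bytes=500, compression_ratio=5.0):
    """Rough on-disk size of retained trace data, in GiB.

    avg_span_bytes and compression_ratio are illustrative placeholders.
    """
    retained_spans = spans_per_second * retention_days * 86_400  # seconds per day
    raw_bytes = retained_spans * avg_span_bytes
    return raw_bytes / compression_ratio / 2**30
```

Under these placeholder values, `estimate_trace_storage_gib(1000, 14)` (1,000 spans/s kept for 14 days) comes to roughly 112.7 GiB. Substitute span sizes and compression ratios observed in your own workload before sizing disks.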
+
+## Security Considerations
+
+The default configuration:
+
+- Runs as non-root user (UID:GID 10001:10001)
+- Exposes multiple ports for different protocols
+- Uses filesystem storage (not suitable for distributed deployments)
+
+For production:
+
+- Use object storage (S3, GCS, Azure Blob)
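+- Enable TLS, and put authentication in front of Tempo (for example, an authenticating reverse proxy), since Tempo does not authenticate requests itself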
+- Implement proper network security and access controls
+- Configure appropriate retention policies
+- Consider running in distributed mode for high availability
+
+## TraceQL Examples
+
+Query traces using TraceQL in Grafana:
+
+```traceql
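+# Find slow spans (duration applies to individual spans)
+{ duration > 1s }
+
+# Find slow traces (end-to-end trace duration)
+{ traceDuration > 1s }
+
+# Find traces containing a span with error status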
+{ status = error }
+
+# Find traces for a specific service
+{ resource.service.name = "frontend" }
+
+# Complex query
+{ resource.service.name = "frontend" && duration > 100ms && status = error }
+```
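
You can also run these queries outside Grafana through Tempo's HTTP search API (`GET /api/search`). It takes the TraceQL expression in the `q` parameter and the time window as Unix seconds in `start`/`end`. The following minimal stdlib sketch assumes the default HTTP port `3200`. The one-hour look-back and the limit of 20 are arbitrary choices, and the function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request


def build_search_url(query, base_url="http://localhost:3200", lookback_s=3600, limit=20, now=None):
    """Build a Tempo /api/search URL for a TraceQL query over the last `lookback_s` seconds."""
    end = int(time.time() if now is None else now)
    params = {"q": query, "start": end - lookback_s, "end": end, "limit": limit}
    return f"{base_url.rstrip('/')}/api/search?{urllib.parse.urlencode(params)}"


def search_traces(query, **kwargs):
    """Run the search and return the list of matching trace summaries."""
    with urllib.request.urlopen(build_search_url(query, **kwargs), timeout=10) as resp:
        return json.load(resp).get("traces", [])
```

For example, `search_traces('{ resource.service.name = "frontend" && status = error }')` returns the trace summaries Tempo reports for the last hour. Each summary is expected to carry a `traceID` that you can pass to `GET /api/traces/<traceID>`.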
+
+## Documentation
+
+- [Official Documentation](https://grafana.com/docs/tempo/latest/)
+- [TraceQL Query Language](https://grafana.com/docs/tempo/latest/traceql/)
+- [Configuration Reference](https://grafana.com/docs/tempo/latest/configuration/)
+- [GitHub Repository](https://github.com/grafana/tempo)
+
+## License
+
+Tempo is licensed under the [AGPLv3 License](https://github.com/grafana/tempo/blob/main/LICENSE).