# TurboOCR — Custom Builds
Chinese documentation: [中文文档](README.zh.md)
This directory builds [TurboOCR](https://github.com/aiptimizer/TurboOCR) from source for two targets that are not covered by the upstream pre-built images:

| Variant | Dockerfile | Profile | Base image |
| ------- | ---------- | ------- | ---------- |
| **CUDA 12.x** | `Dockerfile.cuda12` | `gpu` | `nvcr.io/nvidia/tensorrt:24.12-py3` (TRT 10.8 / CUDA 12.7) |
| **CPU-only** | `Dockerfile.cpu` | `cpu` | `ubuntu:24.04` (ONNX Runtime) |
The upstream pre-built image targets CUDA 13.x (Blackwell / CC 12.0). Use this directory if your GPU is on CUDA 12.x (Turing through Ada Lovelace, CC 7.5–8.9) or if you have no GPU at all.
## Quick Start
1. Copy the example environment file:

   ```bash
   cp .env.example .env
   ```

2. Build and start the variant you need:

   **CUDA 12.x (GPU — Turing through Ada Lovelace):**

   ```bash
   docker compose --profile gpu up -d --build
   ```

   **CPU-only (no GPU required):**

   ```bash
   docker compose --profile cpu up -d --build
   ```

3. Access the API at <http://localhost:8000>.
> **Note:** The first build compiles Drogon and TurboOCR from source, which takes 10–30 minutes depending on your CPU core count. Subsequent builds use the Docker layer cache and are fast.
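For reference, a minimal `.env` might look like the fragment below. The variable names come from the table under "Important Environment Variables"; the values shown are illustrative, not required:

```bash
# Host ports (defaults shown)
TURBOOCR_HTTP_PORT_OVERRIDE=8000
TURBOOCR_GRPC_PORT_OVERRIDE=50051

# Language bundle; empty means latin
TURBOOCR_LANG=chinese

# Container timezone
TZ=UTC
```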
## First-Start Behavior
### GPU variant
On the very first container start, TensorRT compiles 4 ONNX models into engine files. Measured times on an RTX 3070 Laptop:

| Engine | Time |
| ------ | ---- |
| det | ~5 min |
| rec | ~30 min |
| cls | ~4 min |
| layout | ~28 min |
| **Total** | **~67–90 min** |
High-end desktop GPUs finish in ~15 minutes. The container shows `unhealthy` during compilation — this is expected. Once all engines are ready the server starts and the status transitions to `healthy`. Subsequent restarts reuse the cached engines and start in seconds.
> **Tip:** Set `TURBOOCR_DISABLE_LAYOUT=1` to skip the layout detection engine (~28 min savings on laptop GPUs). Use this only if you do not need the `?layout=1` PDF endpoint.
### CPU variant
No TRT compilation occurs. ONNX Runtime loads the models directly at startup. The container is typically `healthy` within 60 seconds.
## Default Ports
| Port | Protocol | Description |
| ---- | -------- | ----------- |
| 8000 | HTTP | OCR REST API + health/metrics |
| 50051 | gRPC | OCR gRPC API |
## Important Environment Variables
| Variable | Description | Default |
| -------- | ----------- | ------- |
| `TURBOOCR_VERSION` | Git tag used for the source build | `v2.1.1` |
| `TURBOOCR_HTTP_PORT_OVERRIDE` | Host port for the HTTP API | `8000` |
| `TURBOOCR_GRPC_PORT_OVERRIDE` | Host port for the gRPC API | `50051` |
| `TURBOOCR_LANG` | Language bundle: `latin`, `chinese`, `greek`, `eslav`, `arabic`, `korean`, `thai` | `""` (latin) |
| `TURBOOCR_SERVER` | With `chinese`, set to `1` for the 84 MB server rec model | `""` |
| `TURBOOCR_PIPELINE_POOL_SIZE` | Concurrent GPU pipelines (~1.4 GB VRAM each); empty = auto | `""` |
| `TURBOOCR_DISABLE_LAYOUT` | Disable layout detection model (saves ~300–500 MB VRAM) | `0` |
| `TURBOOCR_PDF_MODE` | PDF parsing mode: `ocr` / `geometric` / `auto` / `auto_verified` | `ocr` |
| `TURBOOCR_CPU_LIMIT` | CPU core limit (both variants) | `8.0` |
| `TURBOOCR_MEMORY_LIMIT` | Memory limit — `12G` for GPU, `4G` for CPU | variant default |
| `TURBOOCR_GPU_COUNT` | NVIDIA GPUs to reserve (GPU variant only) | `1` |
| `TURBOOCR_SHM_SIZE` | Shared memory for fastpdf2png — `2g` for GPU, `512m` for CPU | variant default |
| `TZ` | Container timezone | `UTC` |
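As a rough way to pick `TURBOOCR_PIPELINE_POOL_SIZE` by hand instead of relying on auto-detection, you can divide available VRAM by the ~1.4 GB per-pipeline figure from the table. The head-room value below is an assumption for engine files and the CUDA context, not something the server enforces:

```bash
# Back-of-the-envelope pool sizing (numbers illustrative)
vram_mb=8192          # e.g. an 8 GB RTX 3070 Laptop GPU
headroom_mb=1024      # assumed reserve for TRT engines + CUDA context
per_pipeline_mb=1433  # ~1.4 GB per pipeline, per the table above
echo $(( (vram_mb - headroom_mb) / per_pipeline_mb ))   # prints 5
```

If the result is 0 or negative, leave the variable empty and let the server pick.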
## Storage
- `turboocr_build_cache` — named volume at `/home/ocr/.cache/turbo-ocr`. Stores TRT engine files (GPU) or the model cache directory (CPU). Must be a named volume — a bind-mount of an empty host directory would shadow the baked-in language bundles and the server would fail to load models.
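In `docker-compose.yml` terms, the constraint above means declaring the cache as a named volume. A sketch (the service name is illustrative; the volume name and mount path follow the text above):

```yaml
services:
  turboocr:
    volumes:
      # Named volume: preserves the baked-in language bundles
      - turboocr_build_cache:/home/ocr/.cache/turbo-ocr
      # Do NOT bind-mount an empty host dir here -- it would
      # shadow the bundles and the server would fail to load models:
      # - ./cache:/home/ocr/.cache/turbo-ocr

volumes:
  turboocr_build_cache:
```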
## Supported GPU Architectures (CUDA 12.x variant)
| Compute Capability | Architecture | GPUs |
| ------------------ | ------------ | ---- |
| 7.5 | Turing | GTX 16xx, RTX 20xx |
| 8.0 | Ampere | A100, RTX 30xx (server) |
| 8.6 | Ampere | RTX 30xx (desktop / laptop) |
| 8.9 | Ada Lovelace | RTX 40xx |
Blackwell (CC 12.0, RTX 50xx) requires CUDA 13.x — use the upstream pre-built image from `src/turboocr` instead.
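To check which row of the table your GPU falls in, recent NVIDIA drivers can report the compute capability directly. If your `nvidia-smi` is too old to know the `compute_cap` field, look the model up in NVIDIA's CUDA GPUs table instead:

```bash
# Prints e.g. "NVIDIA GeForce RTX 3070, 8.6"; anything in 7.5-8.9 works with this variant
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```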
## Notes
- Both Dockerfiles build TurboOCR from source via `git clone` inside the image. A working internet connection is required at build time.
- The CUDA 12.x Dockerfile overrides `CMAKE_CUDA_ARCHITECTURES` to `75;80;86;89`, removing CC 12.0 which is not supported by CUDA 12.x.
- TensorRT 10.8 is located at `/usr/local/tensorrt` in the `24.12-py3` base image, which matches the CMake default. No `-DTENSORRT_DIR` override is needed.
- The CPU variant uses ONNX Runtime 1.22.0 and produces a `paddle_cpu_server` binary with both HTTP and gRPC interfaces.
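The architecture override mentioned above amounts to a single CMake flag in `Dockerfile.cuda12`; roughly like this sketch (the surrounding flags in the real Dockerfile may differ):

```dockerfile
# Restrict kernel builds to the CUDA 12.x-supported architectures
RUN cmake -B build -S . \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89" \
 && cmake --build build -j"$(nproc)"
```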
## Endpoints
- HTTP API: <http://localhost:8000>
- gRPC API: `localhost:50051`
- Health: <http://localhost:8000/health>
- Readiness: <http://localhost:8000/health/ready>
- Metrics (Prometheus): <http://localhost:8000/metrics>
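A quick smoke test of the endpoints above once the container is up (adjust the port if you changed `TURBOOCR_HTTP_PORT_OVERRIDE`; the readiness check presumably fails while TRT engines are still compiling):

```bash
curl -fsS http://localhost:8000/health          # liveness
curl -fsS http://localhost:8000/health/ready    # readiness
curl -fsS http://localhost:8000/metrics | head  # Prometheus metrics
```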
## Security Notes
- The API has no authentication by default. Put a reverse proxy (nginx, Caddy) in front for production.
- The default PDF mode is `ocr`, which only trusts pixel data and is safe for untrusted PDF uploads.
- Do **not** set `TURBOOCR_PDF_MODE` to `geometric` or `auto` globally if you accept PDFs from untrusted sources.
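A minimal nginx sketch for the reverse-proxy advice above, adding TLS and basic auth in front of the HTTP API. The hostname, certificate paths, and htpasswd path are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name ocr.example.com;

    # Certificate paths are placeholders
    ssl_certificate     /etc/nginx/certs/ocr.crt;
    ssl_certificate_key /etc/nginx/certs/ocr.key;

    location / {
        auth_basic           "TurboOCR";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:8000;
        proxy_set_header     Host $host;
    }
}
```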
## References
- [TurboOCR Repository](https://github.com/aiptimizer/TurboOCR)
- [NVIDIA TensorRT Container Releases](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/)
- [NVIDIA CUDA GPU Compute Capability Table](https://developer.nvidia.com/cuda-gpus)