# TurboOCR — Custom Builds
Chinese documentation: [中文文档](README.zh.md)
This directory builds [TurboOCR](https://github.com/aiptimizer/TurboOCR) from source for two targets that are not covered by the upstream pre-built images:

| Variant | Dockerfile | Profile | Base image |
| ------- | ---------- | ------- | ---------- |
| **CUDA 12.x** | `Dockerfile.cuda12` | `gpu` | `nvcr.io/nvidia/tensorrt:24.12-py3` (TRT 10.8 / CUDA 12.7) |
| **CPU-only** | `Dockerfile.cpu` | `cpu` | `ubuntu:24.04` (ONNX Runtime) |
The upstream pre-built image targets CUDA 13.x (Blackwell / CC 12.0). Use this directory if your GPU is on CUDA 12.x (Turing through Ada Lovelace, CC 7.5–8.9) or if you have no GPU at all.
## Quick Start
1. Copy the example environment file:

   ```bash
   cp .env.example .env
   ```

2. Build and start the variant you need:

   **CUDA 12.x (GPU — Turing through Ada Lovelace):**

   ```bash
   docker compose --profile gpu up -d --build
   ```

   **CPU-only (no GPU required):**

   ```bash
   docker compose --profile cpu up -d --build
   ```

3. Access the API at <http://localhost:8000>.
> **Note:** The first build compiles Drogon and TurboOCR from source, which takes 10–30 minutes depending on your CPU core count. Subsequent builds use the Docker layer cache and are fast.
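For reference, a minimal `.env` might look like the fragment below. The variable names come from the table under "Important Environment Variables"; the values shown are illustrative, not required:

```bash
# Host ports (defaults shown)
TURBOOCR_HTTP_PORT_OVERRIDE=8000
TURBOOCR_GRPC_PORT_OVERRIDE=50051

# Language bundle; empty means latin
TURBOOCR_LANG=chinese

# Container timezone
TZ=UTC
```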
## First-Start Behavior
### GPU variant
On the very first container start, TensorRT compiles 4 ONNX models into engine files. Measured times on an RTX 3070 Laptop:

| Engine | Time |
| ------ | ---- |
| det | ~5 min |
| rec | ~30 min |
| cls | ~4 min |
| layout | ~28 min |
| **Total** | **~67–90 min** |
High-end desktop GPUs finish in ~15 minutes. The container shows `unhealthy` during compilation — this is expected. Once all engines are ready the server starts and the status transitions to `healthy`. Subsequent restarts reuse the cached engines and start in seconds.
> **Tip:** Set `TURBOOCR_DISABLE_LAYOUT=1` to skip the layout detection engine (~28 min savings on laptop GPUs). Use this only if you do not need the `?layout=1` PDF endpoint.
### CPU variant
No TRT compilation occurs. ONNX Runtime loads the models directly at startup. The container is typically `healthy` within 60 seconds.
## Default Ports
| Port | Protocol | Description |
| ---- | -------- | ----------- |
| 8000 | HTTP | OCR REST API + health/metrics |
| 50051 | gRPC | OCR gRPC API |
## Important Environment Variables
| Variable | Description | Default |
| -------- | ----------- | ------- |
| `TURBOOCR_VERSION` | Git tag used for the source build | `v2.1.1` |
| `TURBOOCR_HTTP_PORT_OVERRIDE` | Host port for the HTTP API | `8000` |
| `TURBOOCR_GRPC_PORT_OVERRIDE` | Host port for the gRPC API | `50051` |
| `TURBOOCR_LANG` | Language bundle: `latin`, `chinese`, `greek`, `eslav`, `arabic`, `korean`, `thai` | `""` (latin) |
| `TURBOOCR_SERVER` | With `chinese`, set to `1` for the 84 MB server rec model | `""` |
| `TURBOOCR_PIPELINE_POOL_SIZE` | Concurrent GPU pipelines (~1.4 GB VRAM each); empty = auto | `""` |
| `TURBOOCR_DISABLE_LAYOUT` | Disable layout detection model (saves ~300–500 MB VRAM) | `0` |
| `TURBOOCR_PDF_MODE` | PDF parsing mode: `ocr` / `geometric` / `auto` / `auto_verified` | `ocr` |
| `TURBOOCR_CPU_LIMIT` | CPU core limit (both variants) | `8.0` |
| `TURBOOCR_MEMORY_LIMIT` | Memory limit — `12G` for GPU, `4G` for CPU | variant default |
| `TURBOOCR_GPU_COUNT` | NVIDIA GPUs to reserve (GPU variant only) | `1` |
| `TURBOOCR_SHM_SIZE` | Shared memory for fastpdf2png — `2g` for GPU, `512m` for CPU | variant default |
| `TZ` | Container timezone | `UTC` |
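As a rough way to pick `TURBOOCR_PIPELINE_POOL_SIZE` by hand instead of relying on auto-detection, you can divide available VRAM by the ~1.4 GB per-pipeline figure from the table. The head-room value below is an assumption for engine files and the CUDA context, not something the server enforces:

```bash
# Back-of-the-envelope pool sizing (numbers illustrative)
vram_mb=8192          # e.g. an 8 GB RTX 3070 Laptop GPU
headroom_mb=1024      # assumed reserve for TRT engines + CUDA context
per_pipeline_mb=1433  # ~1.4 GB per pipeline, per the table above
echo $(( (vram_mb - headroom_mb) / per_pipeline_mb ))   # prints 5
```

If the result is 0 or negative, leave the variable empty and let the server pick.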
## Storage
- `turboocr_build_cache` — named volume at `/home/ocr/.cache/turbo-ocr`. Stores TRT engine files (GPU) or the model cache directory (CPU). Must be a named volume — a bind-mount of an empty host directory would shadow the baked-in language bundles and the server would fail to load models.
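In `docker-compose.yml` terms, the constraint above means declaring the cache as a named volume. A sketch (the service name is illustrative; the volume name and mount path follow the text above):

```yaml
services:
  turboocr:
    volumes:
      # Named volume: preserves the baked-in language bundles
      - turboocr_build_cache:/home/ocr/.cache/turbo-ocr
      # Do NOT bind-mount an empty host dir here -- it would
      # shadow the bundles and the server would fail to load models:
      # - ./cache:/home/ocr/.cache/turbo-ocr

volumes:
  turboocr_build_cache:
```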
## Supported GPU Architectures (CUDA 12.x variant)
| Compute Capability | Architecture | GPUs |
| ------------------ | ------------ | ---- |
| 7.5 | Turing | GTX 16xx, RTX 20xx |
| 8.0 | Ampere | A100, RTX 30xx (server) |
| 8.6 | Ampere | RTX 30xx (desktop / laptop) |
| 8.9 | Ada Lovelace | RTX 40xx |
Blackwell (CC 12.0, RTX 50xx) requires CUDA 13.x — use the upstream pre-built image from `src/turboocr` instead.
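To check which row of the table your GPU falls in, recent NVIDIA drivers can report the compute capability directly. If your `nvidia-smi` is too old to know the `compute_cap` field, look the model up in NVIDIA's CUDA GPUs table instead:

```bash
# Prints e.g. "NVIDIA GeForce RTX 3070, 8.6"; anything in 7.5-8.9 works with this variant
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
```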
## Notes
- Both Dockerfiles build TurboOCR from source via `git clone` inside the image. A working internet connection is required at build time.
- The CUDA 12.x Dockerfile overrides `CMAKE_CUDA_ARCHITECTURES` to `75;80;86;89`, removing CC 12.0 which is not supported by CUDA 12.x.
- TensorRT 10.8 is located at `/usr/local/tensorrt` in the `24.12-py3` base image, which matches the CMake default. No `-DTENSORRT_DIR` override is needed.
- The CPU variant uses ONNX Runtime 1.22.0 and produces a `paddle_cpu_server` binary with both HTTP and gRPC interfaces.
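The architecture override mentioned above amounts to a single CMake flag in `Dockerfile.cuda12`; roughly like this sketch (the surrounding flags in the real Dockerfile may differ):

```dockerfile
# Restrict kernel builds to the CUDA 12.x-supported architectures
RUN cmake -B build -S . \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_CUDA_ARCHITECTURES="75;80;86;89" \
 && cmake --build build -j"$(nproc)"
```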
## Endpoints
- HTTP API: <http://localhost:8000>
- gRPC API: `localhost:50051`
- Health: <http://localhost:8000/health>
- Readiness: <http://localhost:8000/health/ready>
- Metrics (Prometheus): <http://localhost:8000/metrics>
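A quick smoke test of the endpoints above once the container is up (adjust the port if you changed `TURBOOCR_HTTP_PORT_OVERRIDE`; the readiness check presumably fails while TRT engines are still compiling):

```bash
curl -fsS http://localhost:8000/health          # liveness
curl -fsS http://localhost:8000/health/ready    # readiness
curl -fsS http://localhost:8000/metrics | head  # Prometheus metrics
```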
## Security Notes
- The API has no authentication by default. Put a reverse proxy (nginx, Caddy) in front for production.
- The default PDF mode is `ocr`, which only trusts pixel data and is safe for untrusted PDF uploads.
- Do **not** set `TURBOOCR_PDF_MODE` to `geometric` or `auto` globally if you accept PDFs from untrusted sources.
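A minimal nginx sketch for the reverse-proxy advice above, adding TLS and basic auth in front of the HTTP API. The hostname, certificate paths, and htpasswd path are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name ocr.example.com;

    # Certificate paths are placeholders
    ssl_certificate     /etc/nginx/certs/ocr.crt;
    ssl_certificate_key /etc/nginx/certs/ocr.key;

    location / {
        auth_basic           "TurboOCR";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:8000;
        proxy_set_header     Host $host;
    }
}
```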
## References
- [TurboOCR Repository](https://github.com/aiptimizer/TurboOCR)
- [NVIDIA TensorRT Container Releases](https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/)
- [NVIDIA CUDA GPU Compute Capability Table](https://developer.nvidia.com/cuda-gpus)