TurboOCR — Custom Builds
This directory builds TurboOCR from source for two targets that are not covered by the upstream pre-built images:
| Variant | Dockerfile | Profile | Base image |
|---|---|---|---|
| CUDA 12.x | Dockerfile.cuda12 | gpu | nvcr.io/nvidia/tensorrt:24.12-py3 (TRT 10.8 / CUDA 12.7) |
| CPU-only | Dockerfile.cpu | cpu | ubuntu:24.04 (ONNX Runtime) |
The upstream pre-built image targets CUDA 13.x (Blackwell / CC 12.0). Use this directory if your GPU is on CUDA 12.x (Turing through Ada Lovelace, CC 7.5–8.9) or if you have no GPU at all.
Quick Start
- Copy the example environment file:
  cp .env.example .env
- Build and start the variant you need:
  CUDA 12.x (GPU, Turing through Ada Lovelace):
  docker compose --profile gpu up -d --build
  CPU-only (no GPU required):
  docker compose --profile cpu up -d --build
- Access the API at http://localhost:8000.
Note: The first build compiles Drogon and TurboOCR from source, which takes 10–30 minutes depending on your CPU core count. Subsequent builds use the Docker layer cache and are fast.
First-Start Behavior
GPU variant
On the very first container start, TensorRT compiles 4 ONNX models into engine files. Measured times on an RTX 3070 Laptop:
| Engine | Time |
|---|---|
| det | ~5 min |
| rec | ~30 min |
| cls | ~4 min |
| layout | ~28 min |
| Total | ~67–90 min |
High-end desktop GPUs finish in ~15 minutes. The container shows unhealthy during compilation — this is expected. Once all engines are ready the server starts and the status transitions to healthy. Subsequent restarts reuse the cached engines and start in seconds.
Tip: Set TURBOOCR_DISABLE_LAYOUT=1 to skip the layout detection engine (~28 min savings on laptop GPUs). Use this only if you do not need the ?layout=1 PDF endpoint.
CPU variant
No TRT compilation occurs. ONNX Runtime loads the models directly at startup. The container is typically healthy within 60 seconds.
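Because the first GPU start can stay unhealthy for over an hour, a script that depends on the server may want to poll the readiness endpoint instead of sleeping for a fixed time. The following is a minimal sketch, not part of TurboOCR: the function names are hypothetical, and it assumes that /health/ready answers with HTTP 200 once the server is ready and with an error (or no connection) before that.

```python
import time
import urllib.error
import urllib.request


def http_ready(url, timeout=5.0):
    """Return True if a GET on `url` answers with HTTP 200.

    Assumption: readiness is signalled by status 200; any error,
    refused connection, or timeout counts as "not ready yet".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, max_wait_s, interval_s=10.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `probe()` until it returns True or `max_wait_s` has elapsed.

    `sleep` and `clock` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A call such as `wait_until_ready(lambda: http_ready("http://localhost:8000/health/ready"), max_wait_s=90 * 60)` would cover the slowest first-start total measured above; for the CPU variant a limit of a few minutes should be enough.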
Default Ports
| Port | Protocol | Description |
|---|---|---|
| 8000 | HTTP | OCR REST API + health/metrics |
| 50051 | gRPC | OCR gRPC API |
Important Environment Variables
| Variable | Description | Default |
|---|---|---|
| TURBOOCR_VERSION | Git tag used for the source build | v2.1.1 |
| TURBOOCR_HTTP_PORT_OVERRIDE | Host port for the HTTP API | 8000 |
| TURBOOCR_GRPC_PORT_OVERRIDE | Host port for the gRPC API | 50051 |
| TURBOOCR_LANG | Language bundle: latin, chinese, greek, eslav, arabic, korean, thai | "" (latin) |
| TURBOOCR_SERVER | With chinese, set to 1 for the 84 MB server rec model | "" |
| TURBOOCR_PIPELINE_POOL_SIZE | Concurrent GPU pipelines (~1.4 GB VRAM each); empty = auto | "" |
| TURBOOCR_DISABLE_LAYOUT | Disable layout detection model (saves ~300–500 MB VRAM) | 0 |
| TURBOOCR_PDF_MODE | PDF parsing mode: ocr / geometric / auto / auto_verified | ocr |
| TURBOOCR_CPU_LIMIT | CPU core limit (both variants) | 8.0 |
| TURBOOCR_MEMORY_LIMIT | Memory limit: 12G for GPU, 4G for CPU | variant default |
| TURBOOCR_GPU_COUNT | NVIDIA GPUs to reserve (GPU variant only) | 1 |
| TURBOOCR_SHM_SIZE | Shared memory for fastpdf2png: 2g for GPU, 512m for CPU | variant default |
| TZ | Container timezone | UTC |
Storage
turboocr_build_cache: named volume at /home/ocr/.cache/turbo-ocr. Stores TRT engine files (GPU) or the model cache directory (CPU). It must be a named volume: a bind mount of an empty host directory would shadow the baked-in language bundles, and the server would fail to load models.
Supported GPU Architectures (CUDA 12.x variant)
| Compute Capability | Architecture | GPUs |
|---|---|---|
| 7.5 | Turing | GTX 16xx, RTX 20xx |
| 8.0 | Ampere | A100, A30 (data center) |
| 8.6 | Ampere | RTX 30xx (desktop / laptop) |
| 8.9 | Ada Lovelace | RTX 40xx |
Blackwell (CC 12.0, RTX 50xx) requires CUDA 13.x — use the upstream pre-built image from src/turboocr instead.
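The choice between the two profiles and the upstream image follows directly from the compute capability, so it can be written down as a small lookup. This is a sketch with a hypothetical function name. It treats any capability outside the table as unsupported and falls back to the CPU profile, which is an assumption rather than documented behavior. On recent NVIDIA drivers the capability can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.

```python
# Compute capabilities compiled into the CUDA 12.x variant (from the table).
CUDA12_CAPABILITIES = {(7, 5), (8, 0), (8, 6), (8, 9)}


def pick_build(compute_capability):
    """Map a compute capability string such as "8.6" to a build choice.

    Returns "gpu" or "cpu" (compose profiles in this directory), or
    "upstream" for the upstream pre-built CUDA 13.x image.
    Pass None when the machine has no NVIDIA GPU.
    """
    if compute_capability is None:
        return "cpu"
    major, minor = (int(part) for part in compute_capability.strip().split("."))
    cc = (major, minor)
    if cc in CUDA12_CAPABILITIES:
        return "gpu"
    if cc >= (12, 0):
        return "upstream"
    # Assumption: anything else is not covered here, so use the CPU build.
    return "cpu"
```

For example, an RTX 3070 Laptop reports 8.6 and maps to the gpu profile, while an RTX 50xx card reporting 12.0 maps to the upstream image.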
Notes
- Both Dockerfiles build TurboOCR from source via git clone inside the image. A working internet connection is required at build time.
- The CUDA 12.x Dockerfile overrides CMAKE_CUDA_ARCHITECTURES to 75;80;86;89, removing CC 12.0, which is not supported by CUDA 12.x.
- TensorRT 10.8 is located at /usr/local/tensorrt in the 24.12-py3 base image, which matches the CMake default. No -DTENSORRT_DIR override is needed.
- The CPU variant uses ONNX Runtime 1.22.0 and produces a paddle_cpu_server binary with both HTTP and gRPC interfaces.
Endpoints
- HTTP API: http://localhost:8000
- gRPC API: localhost:50051
- Health: http://localhost:8000/health
- Readiness: http://localhost:8000/health/ready
- Metrics (Prometheus): http://localhost:8000/metrics
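The addresses above assume the default host ports. If TURBOOCR_HTTP_PORT_OVERRIDE or TURBOOCR_GRPC_PORT_OVERRIDE is set, client scripts need the matching ports. The helper below is a hypothetical convenience, not something shipped with TurboOCR. It assumes the caller passes in the same values that are in .env, since Docker Compose reads that file itself and does not export it to your shell.

```python
import os


def endpoint_urls(env=None, host="localhost"):
    """Build the endpoint addresses, honoring the host-port overrides.

    `env` is any mapping of variable names to strings; it defaults to
    the current process environment. Empty values fall back to defaults.
    """
    env = os.environ if env is None else env
    http_port = env.get("TURBOOCR_HTTP_PORT_OVERRIDE") or "8000"
    grpc_port = env.get("TURBOOCR_GRPC_PORT_OVERRIDE") or "50051"
    base = f"http://{host}:{http_port}"
    return {
        "http": base,
        "grpc": f"{host}:{grpc_port}",
        "health": f"{base}/health",
        "ready": f"{base}/health/ready",
        "metrics": f"{base}/metrics",
    }
```

Calling `endpoint_urls({"TURBOOCR_HTTP_PORT_OVERRIDE": "9000"})` would point the HTTP, health, readiness, and metrics addresses at port 9000 while leaving gRPC on 50051.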
Security Notes
- The API has no authentication by default. Put a reverse proxy (nginx, Caddy) in front for production.
- The default PDF mode is ocr, which only trusts pixel data and is safe for untrusted PDF uploads.
- Do not set TURBOOCR_PDF_MODE to geometric or auto globally if you accept PDFs from untrusted sources.