2026-04-29 11:54:59 +08:00


TurboOCR — Custom Builds

Chinese documentation

This directory builds TurboOCR from source for two targets that are not covered by the upstream pre-built images:

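| Variant   | Dockerfile        | Profile | Base image                                                  |
|-----------|-------------------|---------|-------------------------------------------------------------|
| CUDA 12.x | Dockerfile.cuda12 | gpu     | nvcr.io/nvidia/tensorrt:24.12-py3 (TRT 10.8 / CUDA 12.7)    |
| CPU-only  | Dockerfile.cpu    | cpu     | ubuntu:24.04 (ONNX Runtime)                                 |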

The upstream pre-built image targets CUDA 13.x (Blackwell / CC 12.0). Use this directory if your GPU is on CUDA 12.x (Turing through Ada Lovelace, CC 7.5–8.9) or if you have no GPU at all.

Quick Start

  1. Copy the example environment file:

    cp .env.example .env
    
  2. Build and start the variant you need:

    CUDA 12.x (GPU — Turing through Ada Lovelace):

    docker compose --profile gpu up -d --build
    

    CPU-only (no GPU required):

    docker compose --profile cpu up -d --build
    
  3. Access the API at http://localhost:8000.

Note: The first build compiles Drogon and TurboOCR from source, which takes 10–30 minutes depending on your CPU core count. Subsequent builds use the Docker layer cache and are fast.

First-Start Behavior

GPU variant

On the very first container start, TensorRT compiles 4 ONNX models into engine files. Measured times on an RTX 3070 Laptop:

| Engine | Time       |
|--------|------------|
| det    | ~5 min     |
| rec    | ~30 min    |
| cls    | ~4 min     |
| layout | ~28 min    |
| Total  | ~67–90 min |

High-end desktop GPUs finish in ~15 minutes. The container shows unhealthy during compilation — this is expected. Once all engines are ready the server starts and the status transitions to healthy. Subsequent restarts reuse the cached engines and start in seconds.
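Because the first start can take a long time, it can help to block until Docker reports the container as healthy instead of checking by hand. The following is a minimal sketch, not part of this repository: `health_status` and `wait_healthy` are hypothetical helper names, the default timeout and polling interval are arbitrary, and the container name or ID is something you supply yourself.

```shell
# health_status NAME: print the health status Docker reports for NAME
# (one of "starting", "healthy", "unhealthy").
health_status() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# wait_healthy NAME [TIMEOUT_S] [INTERVAL_S]
# Poll until NAME is healthy. Defaults are illustrative assumptions:
# a generous 2 hour timeout and a 30 second polling interval.
wait_healthy() {
    name=$1
    timeout=${2:-7200}
    interval=${3:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # If the container does not exist yet, report "unknown" and retry.
        status=$(health_status "$name" 2>/dev/null) || status=unknown
        if [ "$status" = "healthy" ]; then
            echo "$name is healthy after ~${elapsed}s"
            return 0
        fi
        echo "$name: $status (${elapsed}s elapsed)"
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "timed out after ${timeout}s waiting for $name" >&2
    return 1
}
```

One way to call it, assuming the gpu profile starts a single container, is `wait_healthy "$(docker compose --profile gpu ps -q | head -n 1)"`.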

Tip: Set TURBOOCR_DISABLE_LAYOUT=1 to skip the layout detection engine (~28 min savings on laptop GPUs). Use this only if you do not need the ?layout=1 PDF endpoint.

CPU variant

No TRT compilation occurs. ONNX Runtime loads the models directly at startup. The container is typically healthy within 60 seconds.

Default Ports

| Port  | Protocol | Description                       |
|-------|----------|-----------------------------------|
| 8000  | HTTP     | OCR REST API + health/metrics     |
| 50051 | gRPC     | OCR gRPC API                      |

Important Environment Variables

| Variable                    | Description                                                          | Default                              |
|-----------------------------|----------------------------------------------------------------------|--------------------------------------|
| TURBOOCR_VERSION            | Git tag used for the source build                                    | v2.1.1                               |
| TURBOOCR_HTTP_PORT_OVERRIDE | Host port for the HTTP API                                           | 8000                                 |
| TURBOOCR_GRPC_PORT_OVERRIDE | Host port for the gRPC API                                           | 50051                                |
| TURBOOCR_LANG               | Language bundle: latin, chinese, greek, eslav, arabic, korean, thai  | "" (latin)                           |
| TURBOOCR_SERVER             | With chinese, set to 1 for the 84 MB server rec model                | ""                                   |
| TURBOOCR_PIPELINE_POOL_SIZE | Concurrent GPU pipelines (~1.4 GB VRAM each); empty = auto           | ""                                   |
| TURBOOCR_DISABLE_LAYOUT     | Disable layout detection model (saves ~300–500 MB VRAM)              | 0                                    |
| TURBOOCR_PDF_MODE           | PDF parsing mode: ocr / geometric / auto / auto_verified             | ocr                                  |
| TURBOOCR_CPU_LIMIT          | CPU core limit (both variants)                                       | 8.0                                  |
| TURBOOCR_MEMORY_LIMIT       | Memory limit                                                         | 12G (GPU variant), 4G (CPU variant)  |
| TURBOOCR_GPU_COUNT          | NVIDIA GPUs to reserve (GPU variant only)                            | 1                                    |
| TURBOOCR_SHM_SIZE           | Shared memory for fastpdf2png                                        | 2g (GPU variant), 512m (CPU variant) |
| TZ                          | Container timezone                                                   | UTC                                  |
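To illustrate how these variables combine, a `.env` for a Chinese-language deployment that skips layout detection might look like the fragment below. The variable names come from the table; the values are examples chosen for illustration, not recommendations.

```shell
# .env overrides (illustrative values only)
TURBOOCR_LANG=chinese            # Chinese language bundle
TURBOOCR_SERVER=1                # use the 84 MB server rec model
TURBOOCR_DISABLE_LAYOUT=1        # skip the layout detection model
TURBOOCR_HTTP_PORT_OVERRIDE=8080 # publish the HTTP API on host port 8080
TURBOOCR_PIPELINE_POOL_SIZE=2    # two pipelines, roughly 2.8 GB VRAM
TURBOOCR_PDF_MODE=ocr            # keep the default PDF mode
```

With these values the HTTP API would be published on host port 8080 instead of 8000, and the `?layout=1` PDF endpoint would not be usable.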

Storage

  • turboocr_build_cache — named volume at /home/ocr/.cache/turbo-ocr. Stores TRT engine files (GPU) or the model cache directory (CPU). Must be a named volume — a bind-mount of an empty host directory would shadow the baked-in language bundles and the server would fail to load models.
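Compose normally prefixes named volumes with the project name, so on the host the volume may appear as something like `<project>_turboocr_build_cache`. The sketch below shows one way to locate it, assuming the compose file does not set an explicit `name:` for the volume (this README does not say either way). `find_cache_volumes` and `cache_mountpoint` are hypothetical helper names.

```shell
# find_cache_volumes: list Docker volumes whose name contains
# "turboocr_build_cache" (Compose usually adds a "<project>_" prefix).
find_cache_volumes() {
    docker volume ls --quiet --filter name=turboocr_build_cache
}

# cache_mountpoint VOLUME: print where Docker stores VOLUME on the host.
cache_mountpoint() {
    docker volume inspect --format '{{.Mountpoint}}' "$1"
}
```

The named-volume requirement follows from standard Docker behavior: when a new, empty named volume is mounted over a path that already has content in the image, Docker copies that content into the volume, whereas a bind mount simply hides it.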

Supported GPU Architectures (CUDA 12.x variant)

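| Compute Capability | Architecture | GPUs                           |
|--------------------|--------------|--------------------------------|
| 7.5                | Turing       | GTX 16xx, RTX 20xx             |
| 8.0                | Ampere       | A100, RTX 30xx (server)        |
| 8.6                | Ampere       | RTX 30xx (desktop / laptop)    |
| 8.9                | Ada Lovelace | RTX 40xx                       |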

Blackwell (CC 12.0, RTX 50xx) requires CUDA 13.x — use the upstream pre-built image from src/turboocr instead.
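The choice between the gpu profile, the cpu profile, and the upstream image can be sketched as a small lookup on the compute capability. This is an illustration only: `pick_variant` and `detect_cc` are hypothetical helper names, the labels they print are made up for this example, and `detect_cc` assumes a driver recent enough for `nvidia-smi` to support the `compute_cap` query field.

```shell
# pick_variant CC: map a compute capability string to a build choice.
pick_variant() {
    case "$1" in
        "")              echo cpu ;;       # no NVIDIA GPU detected
        7.5|8.0|8.6|8.9) echo gpu ;;       # covered by the CUDA 12.x build
        12.0)            echo upstream ;;  # Blackwell: upstream CUDA 13.x image
        *)               echo unlisted ;;  # not in the table above
    esac
}

# detect_cc: print the compute capability of the first GPU,
# or nothing if nvidia-smi is not installed.
detect_cc() {
    command -v nvidia-smi >/dev/null 2>&1 || return 0
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null \
        | head -n 1 | tr -d ' \r'
}
```

For example, `docker compose --profile "$(pick_variant "$(detect_cc)")" up -d --build` would work when the result is `gpu` or `cpu`; the `upstream` and `unlisted` results need manual handling.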

Notes

  • Both Dockerfiles build TurboOCR from source via git clone inside the image. A working internet connection is required at build time.
  • The CUDA 12.x Dockerfile overrides CMAKE_CUDA_ARCHITECTURES to 75;80;86;89, removing CC 12.0 which is not supported by CUDA 12.x.
  • TensorRT 10.8 is located at /usr/local/tensorrt in the 24.12-py3 base image, which matches the CMake default. No -DTENSORRT_DIR override is needed.
  • The CPU variant uses ONNX Runtime 1.22.0 and produces a paddle_cpu_server binary with both HTTP and gRPC interfaces.

Endpoints

Security Notes

  • The API has no authentication by default. Put a reverse proxy (nginx, Caddy) in front for production.
  • The default PDF mode is ocr, which only trusts pixel data and is safe for untrusted PDF uploads.
  • Do not set TURBOOCR_PDF_MODE to geometric or auto globally if you accept PDFs from untrusted sources.
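A deployment script could guard against the last point by checking `.env` before starting the stack. The following is a minimal sketch: `check_pdf_mode` is a hypothetical helper, and it flags only the two modes named above. It makes no claim about `auto_verified`, which these notes do not classify.

```shell
# check_pdf_mode [ENV_FILE]: fail if ENV_FILE sets TURBOOCR_PDF_MODE to a
# mode the security notes warn against. A missing file or key is treated
# as the documented default, "ocr".
check_pdf_mode() {
    env_file=${1:-.env}
    # Take the last assignment in the file and strip optional quotes.
    mode=$(sed -n 's/^TURBOOCR_PDF_MODE=//p' "$env_file" 2>/dev/null \
        | tail -n 1 | tr -d "\"'")
    mode=${mode:-ocr}
    case "$mode" in
        geometric|auto)
            echo "TURBOOCR_PDF_MODE=$mode is not advised for untrusted PDFs" >&2
            return 1 ;;
        *)
            return 0 ;;
    esac
}
```

A possible use is `check_pdf_mode .env && docker compose --profile gpu up -d`, so the stack only starts when the check passes.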
