Files
compose-anything/src/turboocr
2026-04-29 11:54:59 +08:00
..
2026-04-28 10:05:39 +08:00
2026-04-29 11:54:59 +08:00
2026-04-29 11:54:59 +08:00
2026-04-29 11:54:59 +08:00

TurboOCR

English | 中文

This service deploys TurboOCR, a GPU-accelerated OCR server built on C++ / CUDA / TensorRT / PP-OCRv5. It exposes both an HTTP API and a gRPC API from a single binary that share the same GPU pipeline pool, with Prometheus metrics built in.

Services

  • turboocr: TurboOCR HTTP (port 8000) + gRPC (port 50051) inference server

Requirements

  • Linux host with NVIDIA driver 595 or newer
  • Turing or newer GPU (RTX 20-series / GTX 16-series and up)
  • NVIDIA Container Toolkit installed and configured for Docker

Environment Variables

Variable Name Description Default Value
TURBOOCR_VERSION TurboOCR image version v2.1.1
TURBOOCR_LANG Language bundle: latin, chinese, greek, eslav, arabic, korean, thai "" (latin)
TURBOOCR_SERVER With chinese, set to 1 for the 84 MB server rec ""
TURBOOCR_PIPELINE_POOL_SIZE Concurrent GPU pipelines (~1.4 GB VRAM each); empty = auto ""
TURBOOCR_DISABLE_LAYOUT Disable layout detection model (saves ~300-500 MB VRAM) 0
TURBOOCR_PDF_MODE Default PDF mode: ocr / geometric / auto / auto_verified ocr
TURBOOCR_DISABLE_ANGLE_CLS Skip angle classifier (~0.4 ms savings) 0
TURBOOCR_DET_MAX_SIDE Max detection input size in pixels 960
TURBOOCR_PDF_DAEMONS PDF render daemons 16
TURBOOCR_PDF_WORKERS PDF worker threads 4
TURBOOCR_MAX_PDF_PAGES Maximum pages per PDF request 2000
TURBOOCR_LOG_LEVEL Log level: debug / info / warn / error info
TURBOOCR_LOG_FORMAT Log format: json / text json
TURBOOCR_HTTP_PORT_OVERRIDE Host port for HTTP API 8000
TURBOOCR_GRPC_PORT_OVERRIDE Host port for gRPC API 50051
TURBOOCR_CPU_LIMIT CPU limit 8.0
TURBOOCR_MEMORY_LIMIT Memory limit 12G
TURBOOCR_GPU_COUNT Number of NVIDIA GPUs to reserve 1
TURBOOCR_SHM_SIZE Shared memory size 2g

Copy .env.example to .env and override only the variables you need to change.

Volumes

  • turboocr_trt_cache: Caches TensorRT engines built from ONNX on first start. Must be a named volume — a bind-mount of an empty host directory would shadow the baked-in language bundles and the server would fail to load models.

Usage

Start TurboOCR

docker compose up -d

The first start builds 4 TensorRT engines from ONNX. Measured build times on an RTX 3070 Laptop: det (~5 min) + rec (~30 min) + cls (~4 min) + layout (~28 min) = ~6790 minutes total. High-end desktop GPUs finish in ~15 minutes. The container reports unhealthy while compilation is in progress — this is expected. Once all engines are built the server starts and the container transitions to healthy. Subsequent restarts reuse the cached engines and start in seconds.

Tip — faster first boot: Set TURBOOCR_DISABLE_LAYOUT=1 to skip the layout detection engine (~28 min on laptop GPUs). Only do this if you don't need the ?layout=1 PDF endpoint.

Endpoints

Test the API

# Image — raw bytes (fastest path)
curl -X POST http://localhost:8000/ocr/raw \
  --data-binary @document.png \
  -H "Content-Type: image/png"

# Image — base64 JSON
curl -X POST http://localhost:8000/ocr \
  -H "Content-Type: application/json" \
  -d '{"image":"'$(base64 -w0 document.png)'"}'

# PDF — raw bytes
curl -X POST http://localhost:8000/ocr/pdf \
  --data-binary @document.pdf

# PDF with layout detection enabled
curl -X POST "http://localhost:8000/ocr/pdf?layout=1&mode=auto" \
  --data-binary @document.pdf

Important: Use HTTP keep-alive. Sending many short-lived connections (e.g. one curl per request in a loop) can overwhelm the server. Standard HTTP client libraries (requests.Session, aiohttp, Go http.Client, etc.) reuse connections by default.

Switching Languages

Edit .env and restart:

TURBOOCR_LANG=chinese
TURBOOCR_SERVER=1   # optional: use the 84 MB Chinese server rec
docker compose up -d

All language bundles are baked into the image at build time (SHA256-verified from the pinned PP-OCRv5 release). No runtime downloads.

Performance Tuning

  • GPU pipelines — set TURBOOCR_PIPELINE_POOL_SIZE based on available VRAM (~1.4 GB each)
  • Layout overhead?layout=1 reduces throughput by ~20%; set TURBOOCR_DISABLE_LAYOUT=1 to skip loading the model entirely
  • Shared memory — increase TURBOOCR_SHM_SIZE if you process very large PDFs

Security Notes

  • The API has no authentication by default. Put a reverse proxy (nginx, Caddy) in front for production.
  • The default PDF mode is ocr, which only trusts pixel data and is safe for untrusted PDF uploads.
  • Do not set TURBOOCR_PDF_MODE to geometric or auto globally if you accept PDFs from untrusted sources — a malicious PDF can embed invisible text or remap glyphs to inject arbitrary strings into the text layer.
  • Use auto_verified for higher accuracy on trusted documents; it cross-checks the native text layer against OCR results.

License

TurboOCR is licensed under the MIT License. See the TurboOCR GitHub repository for details.