TurboOCR
This service deploys TurboOCR, a GPU-accelerated OCR server built on C++ / CUDA / TensorRT / PP-OCRv5. It exposes both an HTTP API and a gRPC API from a single binary that share the same GPU pipeline pool, with Prometheus metrics built in.
Services
turboocr: TurboOCR HTTP (port 8000) + gRPC (port 50051) inference server
Requirements
- Linux host with NVIDIA driver 595 or newer
- Turing or newer GPU (RTX 20-series / GTX 16-series and up)
- NVIDIA Container Toolkit installed and configured for Docker
Environment Variables
| Variable Name | Description | Default Value |
|---|---|---|
TURBOOCR_VERSION |
TurboOCR image version | v2.1.1 |
TURBOOCR_LANG |
Language bundle: latin, chinese, greek, eslav, arabic, korean, thai |
"" (latin) |
TURBOOCR_SERVER |
With chinese, set to 1 for the 84 MB server rec |
"" |
TURBOOCR_PIPELINE_POOL_SIZE |
Concurrent GPU pipelines (~1.4 GB VRAM each); empty = auto | "" |
TURBOOCR_DISABLE_LAYOUT |
Disable layout detection model (saves ~300-500 MB VRAM) | 0 |
TURBOOCR_PDF_MODE |
Default PDF mode: ocr / geometric / auto / auto_verified |
ocr |
TURBOOCR_DISABLE_ANGLE_CLS |
Skip angle classifier (~0.4 ms savings) | 0 |
TURBOOCR_DET_MAX_SIDE |
Max detection input size in pixels | 960 |
TURBOOCR_PDF_DAEMONS |
PDF render daemons | 16 |
TURBOOCR_PDF_WORKERS |
PDF worker threads | 4 |
TURBOOCR_MAX_PDF_PAGES |
Maximum pages per PDF request | 2000 |
TURBOOCR_LOG_LEVEL |
Log level: debug / info / warn / error |
info |
TURBOOCR_LOG_FORMAT |
Log format: json / text |
json |
TURBOOCR_HTTP_PORT_OVERRIDE |
Host port for HTTP API | 8000 |
TURBOOCR_GRPC_PORT_OVERRIDE |
Host port for gRPC API | 50051 |
TURBOOCR_CPU_LIMIT |
CPU limit | 8.0 |
TURBOOCR_MEMORY_LIMIT |
Memory limit | 12G |
TURBOOCR_GPU_COUNT |
Number of NVIDIA GPUs to reserve | 1 |
TURBOOCR_SHM_SIZE |
Shared memory size | 2g |
Copy .env.example to .env and override only the variables you need to change.
Volumes
turboocr_trt_cache: Caches TensorRT engines built from ONNX on first start. Must be a named volume — a bind-mount of an empty host directory would shadow the baked-in language bundles and the server would fail to load models.
Usage
Start TurboOCR
docker compose up -d
The first start builds 4 TensorRT engines from ONNX. Measured build times on an RTX 3070 Laptop: det (~5 min) + rec (~30 min) + cls (~4 min) + layout (~28 min) = ~67–90 minutes total. High-end desktop GPUs finish in ~15 minutes. The container reports unhealthy while compilation is in progress — this is expected. Once all engines are built the server starts and the container transitions to healthy. Subsequent restarts reuse the cached engines and start in seconds.
Tip — faster first boot: Set
TURBOOCR_DISABLE_LAYOUT=1to skip the layout detection engine (~28 min on laptop GPUs). Only do this if you don't need the?layout=1PDF endpoint.
Endpoints
- HTTP API: http://localhost:8000
- gRPC API:
localhost:50051 - Health: http://localhost:8000/health
- Readiness: http://localhost:8000/health/ready
- Metrics (Prometheus): http://localhost:8000/metrics
Test the API
# Image — raw bytes (fastest path)
curl -X POST http://localhost:8000/ocr/raw \
--data-binary @document.png \
-H "Content-Type: image/png"
# Image — base64 JSON
curl -X POST http://localhost:8000/ocr \
-H "Content-Type: application/json" \
-d '{"image":"'$(base64 -w0 document.png)'"}'
# PDF — raw bytes
curl -X POST http://localhost:8000/ocr/pdf \
--data-binary @document.pdf
# PDF with layout detection enabled
curl -X POST "http://localhost:8000/ocr/pdf?layout=1&mode=auto" \
--data-binary @document.pdf
Important: Use HTTP keep-alive. Sending many short-lived connections (e.g. one
curlper request in a loop) can overwhelm the server. Standard HTTP client libraries (requests.Session,aiohttp, Gohttp.Client, etc.) reuse connections by default.
Switching Languages
Edit .env and restart:
TURBOOCR_LANG=chinese
TURBOOCR_SERVER=1 # optional: use the 84 MB Chinese server rec
docker compose up -d
All language bundles are baked into the image at build time (SHA256-verified from the pinned PP-OCRv5 release). No runtime downloads.
Performance Tuning
- GPU pipelines — set
TURBOOCR_PIPELINE_POOL_SIZEbased on available VRAM (~1.4 GB each) - Layout overhead —
?layout=1reduces throughput by ~20%; setTURBOOCR_DISABLE_LAYOUT=1to skip loading the model entirely - Shared memory — increase
TURBOOCR_SHM_SIZEif you process very large PDFs
Security Notes
- The API has no authentication by default. Put a reverse proxy (nginx, Caddy) in front for production.
- The default PDF mode is
ocr, which only trusts pixel data and is safe for untrusted PDF uploads. - Do not set
TURBOOCR_PDF_MODEtogeometricorautoglobally if you accept PDFs from untrusted sources — a malicious PDF can embed invisible text or remap glyphs to inject arbitrary strings into the text layer. - Use
auto_verifiedfor higher accuracy on trusted documents; it cross-checks the native text layer against OCR results.
License
TurboOCR is licensed under the MIT License. See the TurboOCR GitHub repository for details.