OCR engines
Aglaïa runs OCR through a pluggable engine abstraction: every backend
implements the same recognize(image, languages) contract, so you can
switch engines per document without changing anything else. Five ship in
the box, and you can drop in your own.
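
To make the shared contract concrete, here is a minimal Python sketch of what such an abstraction might look like. Only the `recognize(image, languages)` signature comes from the text above. The `Line` type, the registry, and the toy `EchoEngine` are hypothetical illustrations, not Aglaïa's actual API.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class Line:
    text: str
    confidence: float  # assumed 0.0-1.0 score; the real scale is not documented here


class OCREngine(Protocol):
    """Hypothetical shape of the contract every backend implements."""

    name: str

    def recognize(self, image: bytes, languages: Sequence[str]) -> list[Line]: ...


class EchoEngine:
    """Toy engine for illustration only: treats the image bytes as UTF-8 text."""

    name = "echo"

    def recognize(self, image: bytes, languages: Sequence[str]) -> list[Line]:
        return [Line(text=t, confidence=1.0) for t in image.decode("utf-8").splitlines()]


# Hypothetical registry: switching engines is just a different lookup key.
REGISTRY: dict[str, OCREngine] = {}


def register(engine: OCREngine) -> None:
    REGISTRY[engine.name] = engine


def run_ocr(engine_name: str, image: bytes, languages: Sequence[str]) -> list[Line]:
    return REGISTRY[engine_name].recognize(image, languages)


register(EchoEngine())
lines = run_ocr("echo", b"hello\nworld", ["en"])
```

Because callers only depend on `recognize`, choosing a different engine for a document changes one string and nothing else in the pipeline.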
Background
No single OCR engine wins everywhere. Apple Vision is fast and on-device but weaker on non-Latin scripts; vision-language models (Surya, PaddleOCR-VL) are far more accurate on hard pages but slower and heavier; a cloud service reads anything but sends your page over the network. Aglaïa exposes all of them behind one interface and lets you choose. You can also pair a fast primary engine with a slow complement that re-reads only the low-confidence lines.
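
The primary-plus-complement idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Aglaïa's implementation: the `Line` type, the `gate` threshold of 0.8, and the rule of keeping the re-read only when it scores higher are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Line:
    text: str
    confidence: float  # assumed 0.0-1.0


def merge_with_complement(
    primary_lines: Sequence[Line],
    reread: Callable[[int], Line],
    gate: float = 0.8,  # hypothetical confidence gate
) -> list[Line]:
    """Keep confident primary lines; send only the weak ones to the slow engine."""
    merged = []
    for index, line in enumerate(primary_lines):
        if line.confidence < gate:
            candidate = reread(index)  # slow engine re-reads just this line
            # Assumption: keep the re-read only if it is more confident.
            if candidate.confidence > line.confidence:
                line = candidate
        merged.append(line)
    return merged


# Usage with a fake slow engine that records which lines it was asked to re-read.
primary = [Line("Hello world", 0.97), Line("qu1ck br0wn f0x", 0.41)]
calls = []


def slow_reread(index: int) -> Line:
    calls.append(index)
    return Line("quick brown fox", 0.93)


result = merge_with_complement(primary, slow_reread)
```

The slow engine is invoked once here, for the second line only, which is the point of the design: the heavy model's cost scales with the number of doubtful lines rather than with the whole page.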
Use cases
- Clean Latin text → Apple Vision (or Apple Document), near-instant.
- Mixed or non-Latin scripts → a VLM engine for accuracy.
- Tricky historical type → Mistral Document AI in the cloud.
- Bulk, offline → keep everything on-device; nothing leaves the Mac.
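
The use cases above amount to a small routing rule. A hedged sketch, with hypothetical engine identifiers and a hypothetical `pick_engine` helper (Aglaïa's real configuration keys may differ), and with Surya standing in for "a VLM engine":

```python
ON_DEVICE_VLM = "surya"  # hypothetical identifier; PaddleOCR-VL would fit equally


def pick_engine(script: str, historical: bool = False, offline_only: bool = False) -> str:
    """Map document traits to an engine name, following the use cases listed above."""
    if historical:
        # The cloud engine is suggested for tricky historical type, but it is
        # ruled out when nothing may leave the machine. Falling back to the
        # on-device VLM is an assumption, not documented behaviour.
        return ON_DEVICE_VLM if offline_only else "mistral-document-ai"
    if script in ("mixed", "non-latin"):
        return ON_DEVICE_VLM
    return "apple-vision"  # clean Latin text
```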
Comparison of OCR engines
| Engine | Where | Speed | Accuracy | Notes |
|---|---|---|---|---|
| Apple Document | on-device | fast | gold (mixed script) | recovers page structure (headings, blocks, reading order); the choice for Markdown |
| Apple Vision | on-device | fast | good | default; line-based, Latin-first, no page structure; for the searchable-PDF text layer, not Markdown |
| Surya | on-device (llama.cpp) | slow | gold | VLM via bundled llama-server |
| PaddleOCR-VL | on-device | slow | high | VLM alternative |
| Mistral Document AI | cloud | network-bound | gold | reads any script; key in the OS keychain |
A unified OCR DPI knob downsamples the page to a sweet spot (≈150 dpi) before inference, regardless of engine.
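
The arithmetic behind the downsampling step is simple. A sketch, assuming the page's source resolution is known and that pages already at or below the target are left untouched (the text says "downsamples", so never upsampling is an assumption); `downsample_size` is a hypothetical helper:

```python
def downsample_size(
    width_px: int,
    height_px: int,
    source_dpi: float,
    target_dpi: float = 150.0,  # the approximate sweet spot named above
) -> tuple[int, int]:
    """Return the pixel size of the page after scaling it down to target_dpi."""
    if source_dpi <= target_dpi:
        return width_px, height_px  # assumption: never upsample
    scale = target_dpi / source_dpi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# An A4 page scanned at 300 dpi is 2480 x 3508 px; at 150 dpi it is half that.
a4_at_150 = downsample_size(2480, 3508, source_dpi=300)
```

Halving the resolution quarters the pixel count, which is where most of the inference-time saving would come from.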
Related resources
- Processors — add a drop-in OCR engine plugin
- Markdown export — structured output from OCR
- Configuration — the OCR DPI and confidence-gate keys