OCR engines

Aglaïa runs OCR through a pluggable engine abstraction: every backend implements the same recognize(image, languages) contract, so you can switch engines per document without changing anything else. Five ship in the box, and you can drop in your own.
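As a rough illustration of what such a contract can look like, here is a minimal Python sketch. Only the recognize(image, languages) shape comes from the text above; the `Line` type, the `EchoEngine` toy backend, the registry, and `run_ocr` are hypothetical names invented for this example and are not Aglaïa's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence


@dataclass
class Line:
    text: str
    confidence: float  # hypothetical field, assumed to lie in [0.0, 1.0]


class OCREngine(Protocol):
    """Anything with this method counts as an engine (structural typing)."""

    def recognize(self, image: bytes, languages: Sequence[str]) -> List[Line]:
        ...


class EchoEngine:
    """Toy backend for illustration only; it does not read the image."""

    def __init__(self, name: str) -> None:
        self.name = name

    def recognize(self, image: bytes, languages: Sequence[str]) -> List[Line]:
        return [Line(text=f"{self.name}:{','.join(languages)}", confidence=1.0)]


# Hypothetical registry: engines are looked up by name, per document.
ENGINES: Dict[str, OCREngine] = {}


def register(name: str, engine: OCREngine) -> None:
    ENGINES[name] = engine


def run_ocr(engine_name: str, image: bytes, languages: Sequence[str]) -> List[Line]:
    # The caller changes only the engine name; everything else stays the same.
    return ENGINES[engine_name].recognize(image, languages)
```

Because the caller depends only on the shared method, switching engines for a given document comes down to passing a different name to `run_ocr`.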

Background

No single OCR engine wins everywhere. Apple Vision is fast and on-device but weaker on non-Latin scripts; vision-language models (Surya, PaddleOCR-VL) are far more accurate on hard pages but slower and heavier; a cloud service reads anything but sends your page over the wire. Aglaïa exposes all of them behind one interface and lets you choose, even mixing a fast primary with a slow complement that only re-reads the low-confidence lines.
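The primary-plus-complement idea can be sketched as follows. This is a simplified Python illustration under stated assumptions, not Aglaïa's implementation: the `Line` type, its `crop` field, the 0.8 threshold, and the rule of keeping whichever reading has the higher confidence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Line:
    text: str
    confidence: float  # assumed to lie in [0.0, 1.0]
    crop: bytes = b""  # hypothetical: the pixels of just this line


def merge_with_complement(
    primary_lines: List[Line],
    complement: Callable[[bytes], Line],
    threshold: float = 0.8,  # assumed confidence gate, not a documented default
) -> List[Line]:
    """Re-read only the lines the fast engine was unsure about."""
    merged = []
    for line in primary_lines:
        if line.confidence >= threshold:
            merged.append(line)  # confident enough: skip the slow engine
            continue
        second = complement(line.crop)  # the slow engine sees one line only
        # Assumption: keep whichever reading reports the higher confidence.
        merged.append(second if second.confidence > line.confidence else line)
    return merged
```

The slow engine's cost then scales with the number of uncertain lines rather than with the size of the page.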

Use cases

Clean Latin text → Apple Vision (or Apple Document), near-instant.
Mixed or non-Latin scripts → a VLM engine for accuracy.
Tricky historical type → Mistral Document AI in the cloud.
Bulk, offline → keep everything on-device; nothing leaves the Mac.
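The use cases above amount to a small decision rule, which could be written down roughly like this. The engine identifiers, the function name, and the order in which the rules are applied (the offline constraint wins over the cloud engine) are assumptions made for the example, not documented behaviour.

```python
def choose_engine(
    latin_only: bool,
    historical_type: bool = False,
    offline_only: bool = False,
) -> str:
    """Map coarse document traits to a (hypothetical) engine identifier."""
    # Tricky historical type goes to the cloud engine, unless the job
    # must stay on-device.
    if historical_type and not offline_only:
        return "mistral-document-ai"
    # Mixed or non-Latin scripts: an on-device VLM engine for accuracy.
    if not latin_only:
        return "surya"
    # Clean Latin text: the fast on-device engine.
    return "apple-vision"
```

For instance, `choose_engine(latin_only=False, historical_type=True, offline_only=True)` falls through to the on-device VLM, so nothing leaves the machine.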

Comparison of OCR engines

Engine	Where	Speed	Accuracy	Notes
Apple Document	on-device	fast	gold (mixed script)	recovers page structure (headings, blocks, reading order); the choice for Markdown
Apple Vision	on-device	fast	good	line-based, Latin-first, no page structure; suited to the searchable-PDF text layer, not Markdown; the default engine
Surya	on-device (llama.cpp)	slow	gold	VLM via bundled `llama-server`
PaddleOCR-VL	on-device	slow	high	VLM alternative
Mistral Document AI	cloud	network-bound	gold	reads any script; key in the OS keychain

A unified OCR DPI knob downsamples the page to a sweet spot (≈150 dpi) before inference, regardless of engine.
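The arithmetic behind such a knob is simple enough to sketch. The function names below are hypothetical, and the choice never to upsample a page that is already below the target is an assumption; only the ≈150 dpi target comes from the text above.

```python
from typing import Tuple


def ocr_scale(source_dpi: float, target_dpi: float = 150.0) -> float:
    """Scale factor that brings a page down to the target DPI."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    # Assumption: pages already at or below the target are left untouched.
    return min(1.0, target_dpi / source_dpi)


def target_size(
    width_px: int,
    height_px: int,
    source_dpi: float,
    target_dpi: float = 150.0,
) -> Tuple[int, int]:
    """Pixel dimensions of the page after downsampling."""
    scale = ocr_scale(source_dpi, target_dpi)
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))
```

An A4 page scanned at 300 dpi (2480 × 3508 px) would come out at 1240 × 1754 px, a quarter of the pixels handed to the engine.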

Related resources

Processors: add a drop-in OCR engine plugin
Markdown export: structured output from OCR
Configuration: the OCR DPI and confidence-gate keys
