OCR engines

Aglaïa runs OCR through a pluggable engine abstraction: every backend implements the same recognize(image, languages) contract, so you can switch engines per document without changing anything else. Five ship in the box, and you can drop in your own.
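
To illustrate what a contract of this shape could look like, here is a minimal Python sketch. Only the recognize(image, languages) signature comes from the text above. The Line type, the OCREngine protocol name, the registry helpers, and the toy EchoEngine are hypothetical illustrations, not Aglaïa's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Line:
    text: str
    confidence: float  # assumed convention: 0.0 (no trust) to 1.0 (certain)


class OCREngine(Protocol):
    """Hypothetical shape of the shared contract: one method, same arguments."""

    def recognize(self, image: bytes, languages: list[str]) -> list[Line]: ...


class EchoEngine:
    """Toy backend for illustration: treats the image bytes as UTF-8 text."""

    def recognize(self, image: bytes, languages: list[str]) -> list[Line]:
        return [Line(text, 1.0) for text in image.decode("utf-8").splitlines()]


# Hypothetical registry: engines are looked up by name, so switching
# engines for a document only means passing a different name.
ENGINES: dict[str, OCREngine] = {}


def register(name: str, engine: OCREngine) -> None:
    ENGINES[name] = engine


def run_ocr(engine_name: str, image: bytes, languages: list[str]) -> list[Line]:
    return ENGINES[engine_name].recognize(image, languages)


register("echo", EchoEngine())
```

Because callers only depend on the protocol, a custom backend in this sketch needs nothing beyond a recognize method with the same arguments and a call to register.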

Background

No single OCR engine wins everywhere. Apple Vision is fast and on-device but weaker on non-Latin scripts; vision-language models (Surya, PaddleOCR-VL) are far more accurate on hard pages but slower and heavier; a cloud service reads anything but sends your page over the wire. Aglaïa exposes all of them behind one interface and lets you choose, even mixing a fast primary with a slow complement that only re-reads the low-confidence lines.
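
The primary-plus-complement idea can be sketched as follows. This is an assumption-laden illustration, not Aglaïa's implementation: the 0.8 threshold, the reread callable, and the rule of keeping whichever result has the higher confidence are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Line:
    text: str
    confidence: float  # assumed range: 0.0 to 1.0


def refine(
    primary_lines: list[Line],
    reread: Callable[[Line], Line],
    threshold: float = 0.8,  # hypothetical cut-off for "low confidence"
) -> list[Line]:
    """Send only low-confidence lines from the fast engine to the slow one."""
    refined = []
    for line in primary_lines:
        if line.confidence < threshold:
            candidate = reread(line)  # the slow engine sees this line only
            # Assumed policy: keep the complement's result only if it is
            # more confident than what the primary produced.
            if candidate.confidence > line.confidence:
                line = candidate
        refined.append(line)
    return refined
```

The cost of the slow engine then scales with the number of doubtful lines rather than with the page count, which is the point of the mixed setup.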

Use cases

  • Clean Latin text → Apple Vision (or Apple Document), near-instant.
  • Mixed or non-Latin scripts → a VLM engine for accuracy.
  • Tricky historical type → Mistral Document AI in the cloud.
  • Bulk, offline → keep everything on-device; nothing leaves the Mac.
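
The use cases above amount to a small decision rule, which could be written down roughly like this. The engine identifiers, the parameter names, and the fallback for historical pages when offline are hypothetical choices for the sketch. The source does not specify them.

```python
def pick_engine(script: str = "latin", historical: bool = False,
                offline: bool = False) -> str:
    """Map a document's traits to an engine name (identifiers are made up)."""
    if historical and not offline:
        # Tricky historical type: the cloud engine, if the page may leave the Mac.
        return "mistral-document-ai"
    if historical or script != "latin":
        # Mixed or non-Latin scripts (and, as an assumption, historical pages
        # that must stay offline): an on-device VLM engine.
        return "surya"
    # Clean Latin text: the fast on-device engine.
    return "apple-vision"
```

For example, pick_engine(script="arabic", offline=True) returns "surya" in this sketch, keeping everything on-device.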

Comparison of OCR engines

| Engine | Where | Speed | Accuracy | Notes |
|---|---|---|---|---|
| Apple Document | on-device | fast | gold (mixed script) | recovers page structure (headings, blocks, reading order); the choice for Markdown |
| Apple Vision | on-device | fast | good | line-based, Latin-first, no page structure; for the searchable-PDF text layer, not Markdown; default |
| Surya | on-device (llama.cpp) | slow | gold | VLM via bundled llama-server |
| PaddleOCR-VL | on-device | slow | high | VLM alternative |
| Mistral Document AI | cloud | network-bound | gold | reads any script; key in the OS keychain |

A unified OCR DPI knob downsamples the page to a sweet spot (≈150 dpi) before inference, regardless of engine.
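
The arithmetic behind such a knob is simple, and a hedged sketch of it follows. The function name and parameters are hypothetical, and the sketch assumes the page is never upsampled, since the text only speaks of downsampling.

```python
def target_size(width_px: int, height_px: int, source_dpi: float,
                target_dpi: float = 150.0) -> tuple[int, int]:
    """Pixel size of a page after scaling it down to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    # Cap the scale at 1.0 so low-resolution pages are left untouched
    # (assumption: the knob only downsamples).
    scale = min(1.0, target_dpi / source_dpi)
    return round(width_px * scale), round(height_px * scale)
```

Under these assumptions, an A4 page scanned at 300 dpi (2480 × 3508 px) comes out at 1240 × 1754 px, a quarter of the pixels the engine would otherwise process.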