Import

Import is the first stage: it turns whatever you feed Aglaïa into raw scans inside the project. A raw scan is the untouched source image plus its DPI; it becomes the root of a page’s processing tree, and everything downstream (pipeline, OCR, export) refers back to it.

What you can import

Source | How it enters | Notes
--- | --- | ---
Image files | .jpg / .png / … given to the CLI, or dropped in the GUI import panel | one raw scan per file, sorted by filename
PDF pages | a .pdf given to the CLI or the import panel | each page is rendered to an image (via pypdfium2) at render_dpi, default 200
Live webcam | the capture GUI’s shutter (or voice command) | each captured frame becomes a raw scan

PDF import renders each page to a raster image, because Aglaïa is a page-image pipeline, not a text extractor. A born-digital PDF that already has selectable text is better read directly; import is for scanned or photographed pages.
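As a rough illustration of the rendering step, the sketch below rasterises each page of a PDF with pypdfium2 at a given DPI. PDF page sizes are measured in points (1/72 inch), so the render scale is the DPI divided by 72. The function names and the way render_dpi is passed around are assumptions made for this example, not Aglaïa's actual code; only PdfDocument, page.render(scale=...) and bitmap.to_pil() are pypdfium2's own calls (to_pil also needs Pillow installed).

```python
PDF_POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def dpi_to_scale(render_dpi=200):
    """Scale factor that maps PDF points to pixels at the requested DPI."""
    return render_dpi / PDF_POINTS_PER_INCH


def rendered_size_px(width_pt, height_pt, render_dpi=200):
    """Approximate pixel size of a page of the given point size."""
    return (round(width_pt * render_dpi / PDF_POINTS_PER_INCH),
            round(height_pt * render_dpi / PDF_POINTS_PER_INCH))


def render_pdf_pages(pdf_path, render_dpi=200):
    """Yield (page_index, PIL image) for every page of the PDF.

    Hypothetical helper: Aglaïa's real import code may differ.
    """
    import pypdfium2 as pdfium  # third-party; imported lazily on first use

    pdf = pdfium.PdfDocument(pdf_path)
    try:
        scale = dpi_to_scale(render_dpi)
        for index in range(len(pdf)):
            page = pdf[index]
            bitmap = page.render(scale=scale)  # rasterise the page
            yield index, bitmap.to_pil()
    finally:
        pdf.close()
```

Under these assumptions a US Letter page (612 × 792 pt) rendered at the default 200 DPI comes out at roughly 1700 × 2200 pixels, which is what rendered_size_px(612, 792) computes.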

What import does

Each input is persisted as one scans row per page, with a raw root row in nodes that points at the decoded COLOR image. The page is then enqueued into the processing chain. Images are content-hashed, so re-importing an identical file does not store the blob a second time.
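The deduplication behaviour can be pictured with a small content-addressed store: the blob's hash is its key, so identical bytes always land on the same entry. This is a minimal sketch only. The BlobStore class, the choice of SHA-256, and hashing the raw bytes (rather than, say, the decoded pixels) are all assumptions; the documentation above does not say which hash or which input Aglaïa uses.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory content-addressed store (not Aglaïa's API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data under its SHA-256 digest and return the digest."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the existing blob if the digest is already known
        self._blobs.setdefault(digest, data)
        return digest

    def __len__(self):
        return len(self._blobs)


store = BlobStore()
first = store.put(b"fake image bytes")
second = store.put(b"fake image bytes")       # re-import of the same content
other = store.put(b"different image bytes")   # genuinely new content
```

Here first and second are the same key and the store holds two blobs, not three: the repeated import adds a reference to existing data instead of a copy.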

In the GUI, imported pages appear in the scans column immediately and begin processing; in headless mode the positional arguments to aglaia.py are the import set.

Use cases

  • Digitise phone photos of a book: import the JPEGs and let the chain dewarp and clean them.
  • Re-process an existing PDF scan: import the PDF; each page is rendered and run through the pipeline.
  • Reopen a project: passing an existing .agl file is not an import; it loads the scans already stored.
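The three cases above amount to sorting the positional arguments into buckets. The sketch below shows one way that split could look; classify_inputs, the suffix set, and the error on unknown files are hypothetical and only illustrate the rules stated on this page (a .agl file is reopened rather than imported, PDFs are rendered page by page, images are ordered by filename).

```python
from pathlib import Path

# Assumed subset for illustration; the real list of image types is longer.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def classify_inputs(args):
    """Split positional arguments into (project, pdfs, images)."""
    project, pdfs, images = None, [], []
    for arg in args:
        path = Path(arg)
        suffix = path.suffix.lower()
        if suffix == ".agl":
            project = path            # existing project: loaded, not imported
        elif suffix == ".pdf":
            pdfs.append(path)         # each page becomes a raw scan
        elif suffix in IMAGE_SUFFIXES:
            images.append(path)       # one raw scan per file
        else:
            raise ValueError(f"unsupported input: {arg}")
    images.sort(key=lambda p: p.name)  # plain string sort on the filename
    return project, pdfs, images


project, pdfs, images = classify_inputs(
    ["book.agl", "p10.jpg", "p02.jpg", "scan.pdf"]
)
```

Note that a plain string sort places p10.jpg before p2.jpg, so zero-padded names such as p02.jpg are the safer choice. Whether Aglaïa itself sorts lexicographically or naturally is not stated here.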