
Getting started with Arkindex

Arkindex is designed as a modular and scalable platform to help you organise, process and extract information from large collections of digitised documents and objects. Here are the key steps and features you’ll use to build your workflow in Arkindex.

1. Understand the Data Model

Start by learning how Arkindex structures your data. Its multi-level data modelling system allows you to represent your collections, documents and their internal structure - from entire archives to individual layout zones.

→ Learn more: Data modeling
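
To make the idea of a multi-level hierarchy concrete, here is a minimal sketch in Python. It is not Arkindex's API or schema: the `Element` class, its fields, and the type labels ("archive", "document", "page", "zone") are hypothetical names chosen only to illustrate how a collection can nest down to individual layout zones.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Element:
    """Hypothetical node in a multi-level document hierarchy (not an Arkindex class)."""

    name: str
    type: str  # illustrative labels only, e.g. "archive", "document", "page", "zone"
    children: List["Element"] = field(default_factory=list)

    def add(self, child: "Element") -> "Element":
        """Attach a child element and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Element"]]:
        """Yield (depth, element) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def build_example() -> Element:
    """Build a small sample tree: archive > document > page > zones."""
    archive = Element("Town archive", "archive")
    register = archive.add(Element("Register 1", "document"))
    page = register.add(Element("Page 1", "page"))
    page.add(Element("Text zone 1", "zone"))
    page.add(Element("Text zone 2", "zone"))
    return archive


if __name__ == "__main__":
    # Print the tree with two spaces of indentation per level.
    for depth, element in build_example().walk():
        print("  " * depth + f"{element.type}: {element.name}")
```

Sketching your own hierarchy this way before importing can help you decide which levels you actually need to represent.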

2. Import Your Documents

Once your data model is in place, you can import content from a variety of sources: IIIF manifests, PDF files, standard image formats and more. Arkindex also supports large batch imports and flexible metadata handling.

→ Learn more: Importing images

3. Run Machine Learning Workflows

With your content in Arkindex, you can apply machine learning workflows to analyse and enrich it:

  • Textual documents: layout analysis, transcription, structuring, named entity recognition, and information extraction
  • Photographs and objects: object segmentation, visual description, and metadata extraction

These processes run on a scalable architecture, which can be deployed on a single machine, across cloud environments, or on HPC clusters using SLURM.

→ Learn more: Machine Learning workers · Processing architecture

4. Export the Results

Once processed and optionally annotated, your enriched data can be exported in a variety of formats, including PDF, DOCX, CSV, METS/ALTO, PAGE XML, and SQLite.

→ Learn more: Exporting data
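
If you choose the SQLite format, a first look at the exported file can be done with Python's standard `sqlite3` module. The sketch below makes no assumption about the export's schema: it only lists whatever tables the file contains, with their row counts. The function name `summarize_sqlite` and the file name `export.sqlite` are placeholders, not names defined by Arkindex.

```python
import sqlite3
import sys
from typing import Dict


def summarize_sqlite(path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file.

    Note: sqlite3.connect() creates an empty database if the path does not
    exist, so check the path before calling this on real data.
    """
    connection = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        # Skip SQLite's internal bookkeeping tables.
        tables = [name for (name,) in rows if not name.startswith("sqlite_")]
        summary = {}
        for table in tables:
            # Quote the identifier; double any embedded quote characters.
            quoted = '"' + table.replace('"', '""') + '"'
            count = connection.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
            summary[table] = count
        return summary
    finally:
        connection.close()


if __name__ == "__main__":
    # Placeholder default path; pass the real export file as the first argument.
    export_path = sys.argv[1] if len(sys.argv) > 1 else "export.sqlite"
    for table, count in summarize_sqlite(export_path).items():
        print(f"{table}: {count} rows")
```

Once you know which tables are present, consult the export documentation for the meaning of each one before writing queries against them.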


To Go Further

  • User and rights management: Define fine-grained access controls for users and groups, including permissions to view, edit, or process content (Enterprise only). → User and rights management

  • Annotation tools: Annotate within Arkindex, or use Callico for managing large-scale collaborative annotation projects.

  • Advanced usage: Automate and extend Arkindex using its REST API, command-line tools, and Python client. → API documentation