Skip to content

Database exports

Projects can be exported as SQLite databases. This page describes its structure and provides usage examples.

To learn how to export a project as a database, see Export a project as a database.

A Python library is available to help in using these exports within Python code.

Info

Database exports are a more advanced format intended for developers and data engineers.

To export a project in a simpler format better suited for other users, see Export a project as a document.

Database structure

graph LR
    element_path --> element
    element --> image
    element --> worker_run
    transcription --> element
    transcription --> worker_run
    classification --> element
    classification --> worker_run
    entity --> worker_run
    entity --> entity_type
    transcription_entity --> transcription
    transcription_entity --> entity
    transcription_entity --> worker_run
    metadata --> element
    metadata --> entity
    metadata --> worker_run
    worker_run --> worker_version
    image --> image_server
    dataset_element --> element
    dataset_element --> dataset

export_version

Name Type Description
version INTEGER Version number of the database structure. Currently set to 10.

classification

Name Type Description
id VARCHAR(37) UUID of the classification.
element_id VARCHAR(37) UUID of the element the classification applies to.
class_name VARCHAR(1024) Name of the class applied by the classification.
state VARCHAR(16) Moderation state of the classification. Either pending, validated or rejected.
moderator VARCHAR(255) Email of a user that validated or rejected the classification. Null when there has been no moderation.
confidence REAL Confidence level, between 0 and 1.
high_confidence INTEGER Whether the worker states this class is the correct class for the element. Either 0 or 1.
worker_run_id VARCHAR(37) UUID of a worker run that created this classification. Nullable.

dataset

Name Type Description
id VARCHAR(37) UUID of the dataset.
name VARCHAR(100) Name of the dataset.
description TEXT Description of the dataset.
state VARCHAR(50) State of the dataset. Either open, building, complete or error.
sets TEXT Comma-separated list of allowed set names.

dataset_element

Name Type Description
id VARCHAR(37) UUID of the dataset-element relationship.
dataset_id VARCHAR(37) UUID of the dataset the element belongs to.
element_id VARCHAR(37) UUID of the element contained in the dataset.
set_name VARCHAR(50) Name of the set within the dataset where the element is.

element

Name Type Description
id VARCHAR(37) UUID of the element.
created REAL Creation date of the element, as a UNIX timestamp.
updated REAL Last update date of the element, as a UNIX timestamp.
name VARCHAR(250) Name of the element.
type VARCHAR(50) Slug of the element’s type.
image_id VARCHAR(37) UUID of the image the element is on. Nullable.
polygon TEXT Coordinates of the element’s polygon on the image, as a JSON array of arrays of floats. Null if the element is not on an image.
rotation_angle INTEGER Clockwise angle of rotation to apply on the image after cropping and mirroring.
mirrored INTEGER Whether or not to mirror the image along its vertical axis after cropping and before rotating.
worker_run_id VARCHAR(37) UUID of a worker run that created this element. Nullable.
confidence REAL Confidence level reported by the worker, between 0 and 1. Nullable.

element_path

Name Type Description
id VARCHAR(37) UUID of the element path.
parent_id VARCHAR(37) UUID of the parent element.
child_id VARCHAR(37) UUID of the child element.
ordering INTEGER Position of the child within the parent.

entity_type

Name Type Description
id VARCHAR(37) UUID of the entity type.
name TEXT Name of the entity type.
color VARCHAR(6) Color used for the entity type in the frontend.

entity

Name Type Description
id VARCHAR(37) UUID of the entity.
name TEXT Name of the entity.
type_id VARCHAR(37) UUID of the entity type of the entity.
validated INTEGER Validation state of the entity (0 or 1).
moderator VARCHAR(255) Email of a user that validated the entity. Null if the entity was not validated.
metas TEXT Arbitrary metadata about the entity, as a JSON object with string values. Nullable.
worker_run_id VARCHAR(37) UUID of a worker run that created this entity. Nullable.

image

Name Type Description
id VARCHAR(37) UUID of the image.
url TEXT URL leading to the image’s IIIF identifier.
width INTEGER Width of the image in pixels. Can be 0 if it is unknown.
height INTEGER Height of the image in pixels. Can be 0 if it is unknown.
server_id INTEGER Integer ID of the server that hosts this image.

image_server

Name Type Description
id INTEGER Integer ID of the IIIF image server.
display_name VARCHAR(250) Human-readable name of the server.
url TEXT Base URL for all images in this server.
max_width INTEGER Maximum width of any image that the server will return. Nullable.
max_height INTEGER Maximum height of any image that the server will return. Nullable.

metadata

Name Type Description
id VARCHAR(37) UUID of the metadata.
element_id VARCHAR(37) UUID of the element this metadata belongs to.
name VARCHAR(250) Name of the metadata.
type VARCHAR(50) Type of the metadata: text, date, reference, etc.
value TEXT Value of the metadata.
entity_id VARCHAR(37) UUID of an entity linked to this metadata. Nullable.
worker_run_id VARCHAR(37) UUID of a worker run that created this metadata. Nullable.

transcription

Name Type Description
id VARCHAR(37) UUID of the transcription.
element_id VARCHAR(37) UUID of the element this transcription is attached to.
text TEXT Text of the transcription.
confidence REAL Confidence level reported by the worker, between 0 and 1. Nullable.
orientation TEXT Text orientation: horizontal-lr, horizontal-rl, vertical-lr or vertical-rl.
worker_run_id VARCHAR(37) UUID of a worker run that created this transcription. Nullable.

transcription_entity

Name Type Description
id VARCHAR(37) UUID of the transcription-entity link.
transcription_id VARCHAR(37) UUID of the transcription.
entity_id VARCHAR(37) UUID of the entity.
offset INTEGER Zero-based index of the first character in the transcription’s text that maps to the entity.
length INTEGER Length of the portion of the transcription’s text that maps to the entity.
worker_run_id VARCHAR(37) UUID of a worker run that created this transcription-entity link. Nullable.
confidence REAL Confidence level reported by the worker, between 0 and 1. Nullable.

worker_version

Name Type Description
id VARCHAR(37) UUID of the worker version.
name VARCHAR(100) Human-readable name of the worker.
slug VARCHAR(100) Slug of the worker.
type VARCHAR(50) Type of the worker (recognizer, classifier, etc.).
version INTEGER Version number for this worker version, when it does not have a Git commit. Nullable.
revision VARCHAR(200) Full URL of a Git commit for this version, when the worker is based on a repository. Nullable.
repository_url TEXT URL of the Git repository that hosts this worker version. Nullable.

Either version, or both revision and repository_url must be set. All three fields cannot be set at once.

worker_run

Name Type Description
id VARCHAR(37) UUID of the worker run.
worker_version_id VARCHAR(37) UUID of the worker version of this worker run.
model_version_id VARCHAR(37) UUID of the selected model version. Nullable.
model_id VARCHAR(37) UUID of the model of the selected model version. Nullable.
model_name VARCHAR(100) Name of the model of the selected model version. Nullable.
configuration_id VARCHAR(37) UUID of the selected worker configuration. Nullable.
configuration TEXT Contents of the selected worker configuration, as JSON. Nullable.

Sample queries

Below are some sample SQL queries you can run on this database.

Listing elements by type

SELECT *
FROM element
WHERE type = 'page';

Listing child elements recursively

WITH RECURSIVE child_ids (id) AS (
        SELECT child_id
        FROM element_path
        WHERE parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
    UNION
        SELECT child_id
        FROM element_path
        INNER JOIN child_ids ON (element_path.parent_id = child_ids.id)
)
SELECT *
FROM element
INNER JOIN child_ids USING (id);

To learn more about the recursive query syntax used here, see the official SQLite documentation on Recursive Common Table Expressions.

List all transcriptions on direct children

This assumes that all the transcriptions are on children with a text_line type.

SELECT transcription.*
FROM transcription
INNER JOIN element ON (element.id = transcription.element_id)
INNER JOIN element_path ON (element_path.child_id = element.id)
WHERE element_path.parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
AND element.type = 'text_line';