Exports

    Projects can be exported as SQLite databases. This page describes its structure and provides usage examples.

    To learn how to export a project, see Export a project.

    A Python library is available to help in using these exports within Python code.

    Database structure🔗

    Relationships within the database

    Relationships within the database

    export_version🔗

    NameTypeDescription
    versionINTEGERVersion number of the database structure. Currently set to 8.

    classification🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the classification.
    element_idVARCHAR(37)UUID of the element the classification applies to.
    class_nameVARCHAR(1024)Name of the class applied by the classification.
    stateVARCHAR(16)Moderation state of the classification. Either pending, validated or rejected.
    moderatorVARCHAR(255)Email of a user that validated or rejected the classification. Null when there has been no moderation.
    confidenceREALConfidence level, between 0 and 1. Nullable.
    high_confidenceINTEGERWhether the worker states this class is the correct class for the element. Either 0 or 1.
    worker_version_idVARCHAR(37)UUID of a worker version that created this classification. Nullable.
    worker_run_idVARCHAR(37)UUID of a worker run that created this classification. Nullable.

    dataset🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the dataset.
    nameVARCHAR(100)Name of the dataset.
    stateVARCHAR(50)State of the dataset. Either open, building, complete or error.
    setsTEXTComma-separated list of allowed set names.

    dataset_element🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the dataset-element relationship.
    dataset_idVARCHAR(37)UUID of the dataset the element belongs to.
    element_idVARCHAR(37)UUID of the element contained in the dataset.
    set_nameVARCHAR(50)Name of the set within the dataset where the element is.

    element🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the element.
    createdREALCreation date of the element, as a UNIX timestamp.
    updatedREALLast update date of the element, as a UNIX timestamp.
    nameVARCHAR(250)Name of the element.
    typeVARCHAR(50)Slug of the element's type.
    image_idVARCHAR(37)UUID of the image the element is on. Nullable.
    polygonTEXTCoordinates of the element's polygon on the image, as a JSON array of arrays of floats. Null if the element is not on an image.
    rotation_angleINTEGERClockwise angle of rotation to apply on the image after cropping and mirroring.
    mirroredINTEGERWhether or not to mirror the image along its vertical axis after cropping and before rotating.
    worker_version_idVARCHAR(37)UUID of a worker version that created this element. Nullable.
    worker_run_idVARCHAR(37)UUID of a worker run that created this element. Nullable.
    confidenceREALConfidence level reported by the worker, between 0 and 1. Nullable.

    element_path🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the element path.
    parent_idVARCHAR(37)UUID of the parent element.
    child_idVARCHAR(37)UUID of the child element.
    orderingINTEGERPosition of the child within the parent.

    entity_type🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the entity type.
    nameTEXTName of the entity type.
    colorVARCHAR(6)Color used for the entity type in the frontend.

    entity🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the entity.
    nameTEXTName of the entity.
    type_idVARCHAR(37)UUID of the entity type of the entity.
    validatedINTEGERValidation state of the entity (0 or 1).
    moderatorVARCHAR(255)Email of a user that validated the entity. Null if the entity was not validated.
    metasTEXTArbitrary metadata about the entity, as a JSON object with string values. Nullable.
    worker_version_idVARCHAR(37)UUID of a worker version that created this entity. Nullable.
    worker_run_idVARCHAR(37)UUID of a worker run that created this entity. Nullable.
    NameTypeDescription
    idVARCHAR(37)UUID of the entity link.
    parent_idVARCHAR(37)UUID of the parent entity.
    child_idVARCHAR(37)UUID of the child entity.
    role_idVARCHAR(37)UUID of the entity role used to link both entities.

    entity_role🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the entity role.
    parent_nameVARCHAR(250)Name of the parent entity in a link, such as "employer".
    child_nameVARCHAR(250)Name of the child entity in a link, such as "employee".
    parent_type_idVARCHAR(37)UUID of the entity type of the parent entities that can have this role in a link.
    child_type_idVARCHAR(37)UUID of the entity type of the child entities that can have this role in a link.

    image🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the image.
    urlTEXTURL leading to the image's IIIF identifier.
    widthINTEGERWidth of the image in pixels. Can be 0 if it is unknown.
    heightINTEGERHeight of the image in pixels. Can be 0 if it is unknown.
    server_idINTEGERInteger ID of the server that hosts this image.

    image_server🔗

    NameTypeDescription
    idINTEGERInteger ID of the IIIF image server.
    display_nameVARCHAR(250)Human-readable name of the server.
    urlTEXTBase URL for all images in this server.
    max_widthINTEGERMaximum width of any image that the server will return. Nullable.
    max_heightINTEGERMaximum height of any image that the server will return. Nullable.

    metadata🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the metadata.
    element_idVARCHAR(37)UUID of the element this metadata belongs to.
    nameVARCHAR(250)Name of the metadata.
    typeVARCHAR(50)Type of the metadata: text, date, reference, etc.
    valueTEXTValue of the metadata.
    entity_idVARCHAR(37)UUID of an entity linked to this metadata. Nullable.
    worker_version_idVARCHAR(37)UUID of a worker version that created this metadata. Nullable.
    worker_run_idVARCHAR(37)UUID of a worker run that created this metadata. Nullable.

    transcription🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the transcription.
    element_idVARCHAR(37)UUID of the element this transcription is attached to.
    textTEXTText of the transcription.
    confidenceREALConfidence level reported by the worker, between 0 and 1. Nullable.
    orientationTEXTText orientation: horizontal-lr, horizontal-rl, vertical-lr or vertical-rl.
    worker_version_idVARCHAR(37)UUID of a worker version that created this transcription. Nullable.
    worker_run_idVARCHAR(37)UUID of a worker run that created this transcription. Nullable.

    transcription_entity🔗

    NameTypeDescription
    idINTEGERUUID of the transcription-entity link.
    transcription_idVARCHAR(37)UUID of the transcription.
    entity_idVARCHAR(37)UUID of the entity.
    offsetINTEGERZero-based index of the first character in the transcription's text that maps to the entity.
    lengthINTEGERLength of the portion of the transcription's text that maps to the entity.
    worker_version_idVARCHAR(37)UUID of the worker version that created this transcription-entity link. Nullable.
    worker_run_idVARCHAR(37)UUID of a worker run that created this transcription-entity link. Nullable.
    confidenceREALConfidence level reported by the worker, between 0 and 1. Nullable.

    worker_version🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the worker version.
    nameVARCHAR(100)Human-readable name of the worker.
    slugVARCHAR(100)Slug of the worker.
    typeVARCHAR(50)Type of the worker (recognizer, classifier, etc.).
    versionINTEGERVersion number for this worker version, when it does not have a Git commit. Nullable.
    revisionVARCHAR(50)Hash of the Git commit that created this worker version. Nullable.
    repository_urlTEXTURL of the Git repository that hosts this worker version. Nullable.

    Either version, or both revision and repository_url must be set. All three fields cannot be set at once.

    worker_run🔗

    NameTypeDescription
    idVARCHAR(37)UUID of the worker run.
    worker_version_idVARCHAR(37)UUID of the worker version of this worker run.
    model_version_idVARCHAR(37)UUID of the selected model version. Nullable.
    model_idVARCHAR(37)UUID of the model of the selected model version. Nullable.
    model_nameVARCHAR(100)Name of the model of the selected model version. Nullable.
    configuration_idVARCHAR(37)UUID of the selected worker configuration. Nullable.
    configurationTEXTContents of the selected worker configuration, as JSON. Nullable.

    Sample queries🔗

    Below are some sample SQL queries you can run on this database.

    Listing elements by type🔗

    SELECT *
    FROM element
    WHERE type = 'page';
    

    Listing child elements recursively🔗

    WITH RECURSIVE child_ids (id) AS (
            SELECT child_id
            FROM element_path
            WHERE parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
        UNION
            SELECT child_id
            FROM element_path
            INNER JOIN child_ids ON (element_path.parent_id = child_ids.id)
    )
    SELECT *
    FROM element
    INNER JOIN child_ids USING (id);
    

    To learn more about the recursive query syntax used here, see the official SQLite documentation on Recursive Common Table Expressions.

    List all transcriptions on direct children🔗

    This assumes that all the transcriptions are on children with a text_line type.

    SELECT transcription.*
    FROM transcription
    INNER JOIN element ON (element.id = transcription.element_id)
    INNER JOIN element_path ON (element_path.child_id = element.id)
    WHERE element_path.parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
    AND element.type = 'text_line';