Exports
Projects can be exported as SQLite databases. This page describes its structure and provides usage examples.
To learn how to export a project, see Export a project.
A Python library is available to help in using these exports within Python code.
Database structure
graph LR
element_path --> element
element --> image
element --> worker_run
transcription --> element
transcription --> worker_run
classification --> element
classification --> worker_run
entity --> worker_run
entity --> entity_type
transcription_entity --> transcription
transcription_entity --> entity
transcription_entity --> worker_run
metadata --> element
metadata --> entity
metadata --> worker_run
worker_run --> worker_version
image --> image_server
dataset_element --> element
dataset_element --> dataset
export_version
Name |
Type |
Description |
version |
INTEGER |
Version number of the database structure. Currently set to 10 . |
classification
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the classification. |
element_id |
VARCHAR(37) |
UUID of the element the classification applies to. |
class_name |
VARCHAR(1024) |
Name of the class applied by the classification. |
state |
VARCHAR(16) |
Moderation state of the classification. Either pending , validated or rejected . |
moderator |
VARCHAR(255) |
Email of a user that validated or rejected the classification. Null when there has been no moderation. |
confidence |
REAL |
Confidence level, between 0 and 1. |
high_confidence |
INTEGER |
Whether the worker states this class is the correct class for the element. Either 0 or 1. |
worker_run_id |
VARCHAR(37) |
UUID of a worker run that created this classification. Nullable. |
dataset
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the dataset. |
name |
VARCHAR(100) |
Name of the dataset. |
description |
TEXT |
Description of the dataset. |
state |
VARCHAR(50) |
State of the dataset. Either open , building , complete or error . |
sets |
TEXT |
Comma-separated list of allowed set names. |
dataset_element
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the dataset-element relationship. |
dataset_id |
VARCHAR(37) |
UUID of the dataset the element belongs to. |
element_id |
VARCHAR(37) |
UUID of the element contained in the dataset. |
set_name |
VARCHAR(50) |
Name of the set within the dataset where the element is. |
element
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the element. |
created |
REAL |
Creation date of the element, as a UNIX timestamp. |
updated |
REAL |
Last update date of the element, as a UNIX timestamp. |
name |
VARCHAR(250) |
Name of the element. |
type |
VARCHAR(50) |
Slug of the element’s type. |
image_id |
VARCHAR(37) |
UUID of the image the element is on. Nullable. |
polygon |
TEXT |
Coordinates of the element’s polygon on the image, as a JSON array of arrays of floats. Null if the element is not on an image. |
rotation_angle |
INTEGER |
Clockwise angle of rotation to apply on the image after cropping and mirroring. |
mirrored |
INTEGER |
Whether or not to mirror the image along its vertical axis after cropping and before rotating. |
worker_run_id |
VARCHAR(37) |
UUID of a worker run that created this element. Nullable. |
confidence |
REAL |
Confidence level reported by the worker, between 0 and 1. Nullable. |
element_path
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the element path. |
parent_id |
VARCHAR(37) |
UUID of the parent element. |
child_id |
VARCHAR(37) |
UUID of the child element. |
ordering |
INTEGER |
Position of the child within the parent. |
entity_type
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the entity type. |
name |
TEXT |
Name of the entity type. |
color |
VARCHAR(6) |
Color used for the entity type in the frontend. |
entity
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the entity. |
name |
TEXT |
Name of the entity. |
type_id |
VARCHAR(37) |
UUID of the entity type of the entity. |
validated |
INTEGER |
Validation state of the entity (0 or 1). |
moderator |
VARCHAR(255) |
Email of a user that validated the entity. Null if the entity was not validated. |
metas |
TEXT |
Arbitrary metadata about the entity, as a JSON object with string values. Nullable. |
worker_run_id |
VARCHAR(37) |
UUID of a worker run that created this entity. Nullable. |
image
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the image. |
url |
TEXT |
URL leading to the image’s IIIF identifier. |
width |
INTEGER |
Width of the image in pixels. Can be 0 if it is unknown. |
height |
INTEGER |
Height of the image in pixels. Can be 0 if it is unknown. |
server_id |
INTEGER |
Integer ID of the server that hosts this image. |
image_server
Name |
Type |
Description |
id |
INTEGER |
Integer ID of the IIIF image server. |
display_name |
VARCHAR(250) |
Human-readable name of the server. |
url |
TEXT |
Base URL for all images in this server. |
max_width |
INTEGER |
Maximum width of any image that the server will return. Nullable. |
max_height |
INTEGER |
Maximum height of any image that the server will return. Nullable. |
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the metadata. |
element_id |
VARCHAR(37) |
UUID of the element this metadata belongs to. |
name |
VARCHAR(250) |
Name of the metadata. |
type |
VARCHAR(50) |
Type of the metadata: text , date , reference , etc. |
value |
TEXT |
Value of the metadata. |
entity_id |
VARCHAR(37) |
UUID of an entity linked to this metadata. Nullable. |
worker_run_id |
VARCHAR(37) |
UUID of a worker run that created this metadata. Nullable. |
transcription
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the transcription. |
element_id |
VARCHAR(37) |
UUID of the element this transcription is attached to. |
text |
TEXT |
Text of the transcription. |
confidence |
REAL |
Confidence level reported by the worker, between 0 and 1. Nullable. |
orientation |
TEXT |
Text orientation: horizontal-lr , horizontal-rl , vertical-lr or vertical-rl . |
worker_run_id |
VARCHAR(37) |
UUID of a worker run that created this transcription. Nullable. |
transcription_entity
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the transcription-entity link. |
transcription_id |
VARCHAR(37) |
UUID of the transcription. |
entity_id |
VARCHAR(37) |
UUID of the entity. |
offset |
INTEGER |
Zero-based index of the first character in the transcription’s text that maps to the entity. |
length |
INTEGER |
Length of the portion of the transcription’s text that maps to the entity. |
worker_run_id |
VARCHAR(37) |
UUID of a worker run that created this transcription-entity link. Nullable. |
confidence |
REAL |
Confidence level reported by the worker, between 0 and 1. Nullable. |
worker_version
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the worker version. |
name |
VARCHAR(100) |
Human-readable name of the worker. |
slug |
VARCHAR(100) |
Slug of the worker. |
type |
VARCHAR(50) |
Type of the worker (recognizer, classifier, etc.). |
version |
INTEGER |
Version number for this worker version, when it does not have a Git commit. Nullable. |
revision |
VARCHAR(200) |
Full URL of a Git commit for this version, when the worker is based on a repository. Nullable. |
repository_url |
TEXT |
URL of the Git repository that hosts this worker version. Nullable. |
Either version
, or both revision
and repository_url
must be set. All three fields cannot be set at once.
worker_run
Name |
Type |
Description |
id |
VARCHAR(37) |
UUID of the worker run. |
worker_version_id |
VARCHAR(37) |
UUID of the worker version of this worker run. |
model_version_id |
VARCHAR(37) |
UUID of the selected model version. Nullable. |
model_id |
VARCHAR(37) |
UUID of the model of the selected model version. Nullable. |
model_name |
VARCHAR(100) |
Name of the model of the selected model version. Nullable. |
configuration_id |
VARCHAR(37) |
UUID of the selected worker configuration. Nullable. |
configuration |
TEXT |
Contents of the selected worker configuration, as JSON. Nullable. |
Sample queries
Below are some sample SQL queries you can run on this database.
Listing elements by type
SELECT *
FROM element
WHERE type = 'page';
Listing child elements recursively
WITH RECURSIVE child_ids (id) AS (
SELECT child_id
FROM element_path
WHERE parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
UNION
SELECT child_id
FROM element_path
INNER JOIN child_ids ON (element_path.parent_id = child_ids.id)
)
SELECT *
FROM element
INNER JOIN child_ids USING (id);
To learn more about the recursive query syntax used here, see the official SQLite documentation on Recursive Common Table Expressions.
List all transcriptions on direct children
This assumes that all the transcriptions are on children with a text_line
type.
SELECT transcription.*
FROM transcription
INNER JOIN element ON (element.id = transcription.element_id)
INNER JOIN element_path ON (element_path.child_id = element.id)
WHERE element_path.parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
AND element.type = 'text_line';