Database exports
Projects can be exported as SQLite databases. This page describes its structure and provides usage examples.
To learn how to export a project as a database, see Export a project as a database.
A Python library is available to help in using these exports within Python code.
Database exports are a more advanced format intended for developers and data engineers. |
To export a project in a simpler format better suited for other users, see Export a project as a document.
Database structure
export_version
Name | Type | Description |
---|---|---|
version |
|
Version number of the database structure. Currently set to |
classification
Name | Type | Description |
---|---|---|
id |
|
UUID of the classification. |
element_id |
|
UUID of the element the classification applies to. |
class_name |
|
Name of the class applied by the classification. |
state |
|
Moderation state of the classification. Either |
moderator |
|
Email of a user that validated or rejected the classification. Null when there has been no moderation. |
confidence |
|
Confidence level, between 0 and 1. |
high_confidence |
|
Whether the worker states this class is the correct class for the element. Either 0 or 1. |
worker_run_id |
|
UUID of a worker run that created this classification. Nullable. |
dataset
Name | Type | Description |
---|---|---|
id |
|
UUID of the dataset. |
name |
|
Name of the dataset. |
description |
|
Description of the dataset. |
state |
|
State of the dataset. Either |
sets |
|
Comma-separated list of allowed set names. |
dataset_element
Name | Type | Description |
---|---|---|
id |
|
UUID of the dataset-element relationship. |
dataset_id |
|
UUID of the dataset the element belongs to. |
element_id |
|
UUID of the element contained in the dataset. |
set_name |
|
Name of the set within the dataset where the element is. |
element
Name | Type | Description |
---|---|---|
id |
|
UUID of the element. |
created |
|
Creation date of the element, as a UNIX timestamp. |
updated |
|
Last update date of the element, as a UNIX timestamp. |
name |
|
Name of the element. |
type |
|
Slug of the element’s type. |
image_id |
|
UUID of the image the element is on. Nullable. |
polygon |
|
Coordinates of the element’s polygon on the image, as a JSON array of arrays of floats. Null if the element is not on an image. |
rotation_angle |
|
Clockwise angle of rotation to apply on the image after cropping and mirroring. |
mirrored |
|
Whether or not to mirror the image along its vertical axis after cropping and before rotating. |
worker_run_id |
|
UUID of a worker run that created this element. Nullable. |
confidence |
|
Confidence level reported by the worker, between 0 and 1. Nullable. |
element_path
Name | Type | Description |
---|---|---|
id |
|
UUID of the element path. |
parent_id |
|
UUID of the parent element. |
child_id |
|
UUID of the child element. |
ordering |
|
Position of the child within the parent. |
entity_type
Name | Type | Description |
---|---|---|
id |
|
UUID of the entity type. |
name |
|
Name of the entity type. |
color |
|
Color used for the entity type in the frontend. |
image
Name | Type | Description |
---|---|---|
id |
|
UUID of the image. |
url |
|
URL leading to the image’s IIIF identifier. |
width |
|
Width of the image in pixels. Can be 0 if it is unknown. |
height |
|
Height of the image in pixels. Can be 0 if it is unknown. |
server_id |
|
Integer ID of the server that hosts this image. |
image_server
Name | Type | Description |
---|---|---|
id |
|
Integer ID of the IIIF image server. |
display_name |
|
Human-readable name of the server. |
url |
|
Base URL for all images in this server. |
max_width |
|
Maximum width of any image that the server will return. Nullable. |
max_height |
|
Maximum height of any image that the server will return. Nullable. |
metadata
Name | Type | Description |
---|---|---|
id |
|
UUID of the metadata. |
element_id |
|
UUID of the element this metadata belongs to. |
name |
|
Name of the metadata. |
type |
|
Type of the metadata: |
value |
|
Value of the metadata. |
worker_run_id |
|
UUID of a worker run that created this metadata. Nullable. |
transcription
Name | Type | Description |
---|---|---|
id |
|
UUID of the transcription. |
element_id |
|
UUID of the element this transcription is attached to. |
text |
|
Text of the transcription. |
confidence |
|
Confidence level reported by the worker, between 0 and 1. Nullable. |
orientation |
|
Text orientation: |
worker_run_id |
|
UUID of a worker run that created this transcription. Nullable. |
transcription_entity
Name | Type | Description |
---|---|---|
id |
|
UUID of the transcription-entity link. |
transcription_id |
|
UUID of the transcription. |
type_id |
|
UUID of the entity type. |
offset |
|
Zero-based index of the first character in the transcription’s text that maps to the entity. |
length |
|
Length of the portion of the transcription’s text that maps to the entity. |
worker_run_id |
|
UUID of a worker run that created this transcription-entity link. Nullable. |
confidence |
|
Confidence level reported by the worker, between 0 and 1. Nullable. |
worker_version
Name | Type | Description |
---|---|---|
id |
|
UUID of the worker version. |
name |
|
Human-readable name of the worker. |
slug |
|
Slug of the worker. |
type |
|
Type of the worker (recognizer, classifier, etc.). |
version |
|
Version number for this worker version, when it does not have a Git commit. Nullable. |
revision |
|
Full URL of a Git commit for this version, when the worker is based on a repository. Nullable. |
repository_url |
|
URL of the Git repository that hosts this worker version. Nullable. |
Either version
, or both revision
and repository_url
must be set. All three fields cannot be set at once.
worker_run
Name | Type | Description |
---|---|---|
id |
|
UUID of the worker run. |
worker_version_id |
|
UUID of the worker version of this worker run. |
model_version_id |
|
UUID of the selected model version. Nullable. |
model_id |
|
UUID of the model of the selected model version. Nullable. |
model_name |
|
Name of the model of the selected model version. Nullable. |
configuration_id |
|
UUID of the selected worker configuration. Nullable. |
configuration |
|
Contents of the selected worker configuration, as JSON. Nullable. |
Sample queries
Below are some sample SQL queries you can run on this database.
Listing child elements recursively
WITH RECURSIVE child_ids (id) AS (
SELECT child_id
FROM element_path
WHERE parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
UNION
SELECT child_id
FROM element_path
INNER JOIN child_ids ON (element_path.parent_id = child_ids.id)
)
SELECT *
FROM element
INNER JOIN child_ids USING (id);
To learn more about the recursive query syntax used here, see the official SQLite documentation on Recursive Common Table Expressions.
List all transcriptions on direct children
This assumes that all the transcriptions are on children with a text_line
type.
SELECT transcription.*
FROM transcription
INNER JOIN element ON (element.id = transcription.element_id)
INNER JOIN element_path ON (element_path.child_id = element.id)
WHERE element_path.parent_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94'
AND element.type = 'text_line';
List transcription entities on an element
SELECT entity_type.name, SUBSTR(transcription.text, transcription_entity.offset + 1, transcription_entity.length)
FROM transcription
INNER JOIN transcription_entity ON transcription_entity.transcription_id = transcription.id
INNER JOIN entity_type ON entity_type.id = transcription_entity.type_id
WHERE transcription.element_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94';
This will return the entity type and the portion of text within the transcription, for each TranscriptionEntity in any transcription of the given element.
Our Python library provides this text splitting through a simpler helper named TranscriptionEntity.text
:
from pathlib import Path
from arkindex_export import open_database, EntityType, TranscriptionEntity, Transcription
open_database(Path("./database.sqlite"))
query = (
TranscriptionEntity
.select(EntityType.name, TranscriptionEntity.text)
.join(Transcription)
.join(EntityType)
.where(Transcription.element_id = '0ac0a631-dddd-4fbe-9c9c-a5d803b3bc94')
)
for type_name, text in query.tuples():
print(f"{type_name}: {text}")