Transcription
Arkindex stores the text recognized from an image as a Transcription. Each Transcription is directly linked to one Element.
Rules
- An Element can have multiple Transcriptions:
  - Different tools may provide different results
  - Different human annotators can provide different results
- A Machine Learning worker may provide a confidence score (between 0.0 and 1.0, or as a percentage)
- There is no size limit on the transcription content
- The transcription content cannot be formatted: it is raw text
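The rules above can be pictured as a small data model. The sketch below is purely illustrative: the `Transcription` and `Element` classes, the `source` field, and the `best()` helper are hypothetical names, not part of Arkindex or its API client. It only shows that one element can hold several transcriptions from different sources, that an optional confidence must lie between 0.0 and 1.0, and that the content is a plain string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcription:
    """Hypothetical model of a transcription: raw text, no formatting, no size limit."""
    text: str
    source: str                         # hypothetical: a tool name or a human annotator
    confidence: Optional[float] = None  # optional score, e.g. from a Machine Learning worker

    def __post_init__(self) -> None:
        # When present, the confidence score must lie between 0.0 and 1.0.
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be between 0.0 and 1.0, got {self.confidence}")

    @property
    def confidence_percent(self) -> Optional[float]:
        # The same score expressed as a percentage, or None if no score was given.
        return None if self.confidence is None else self.confidence * 100


@dataclass
class Element:
    """Hypothetical element that can carry any number of transcriptions."""
    name: str
    transcriptions: List[Transcription] = field(default_factory=list)

    def add_transcription(self, transcription: Transcription) -> None:
        self.transcriptions.append(transcription)

    def best(self) -> Optional[Transcription]:
        # Illustrative helper: the scored transcription with the highest confidence.
        # Transcriptions without a score (e.g. from human annotators) are skipped.
        scored = [t for t in self.transcriptions if t.confidence is not None]
        return max(scored, key=lambda t: t.confidence, default=None)


# Usage: one line element transcribed by two tools and one human annotator.
line = Element("line 1")
line.add_transcription(Transcription("Arkindex", source="ocr-tool-a", confidence=0.92))
line.add_transcription(Transcription("Arkindcx", source="ocr-tool-b", confidence=0.61))
line.add_transcription(Transcription("Arkindex", source="annotator-1"))
```

Keeping every result side by side, rather than overwriting one with another, mirrors the rule that different tools and annotators may disagree on the same element.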
Web interface
Through the Arkindex web interface, a Project contributor can:
- View all existing transcriptions.
- Create new elements of any type defined in the project (usually lines, words or paragraphs, …).
- Add a transcription to these elements.
To read all existing transcriptions of an element, select that element in the Children tree on the left side of the screen: its transcriptions appear in the Transcriptions panel on the right side.
Text orientation
When creating a transcription on an element, you can choose its text orientation. This text orientation can be set either by a human manually creating a transcription, or by a Machine Learning worker, for example depending on the language of the recognized text.
The text orientation is used to display the transcription, as well as by Machine Learning worker processes.
Using the API (see the API documentation), a transcription’s text orientation can be set to:
- horizontal left-to-right: for text where each line should be read from left to right, and text lines are organized from top to bottom. This is the orientation for most Western languages, and the default text orientation if none is specified when creating a transcription. In the frontend, the text is aligned to the left.
- horizontal right-to-left: for text where each line should be read from right to left, and text lines are organized from top to bottom. For example, this can be used for Arabic. In the frontend, the text is aligned to the right.
- vertical left-to-right: for text where each line should be read from top to bottom, and text lines are organized from left to right.
- vertical right-to-left: for text where each line should be read from top to bottom, and text lines are organized from right to left. For example, this can be used for Japanese.
From the frontend, the only available text orientations are horizontal right-to-left and horizontal left-to-right. Displaying vertically oriented text is not supported, so transcriptions that have a vertical text orientation are displayed horizontally.
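The four orientations and their frontend rendering can be summarized in a short sketch. The string values such as `"horizontal-lr"` are an assumption made for illustration; check the API documentation for the exact values it accepts. `parse_orientation`, `READING`, and `frontend_alignment` are hypothetical helpers, not Arkindex functions. Since the frontend alignment is only documented for the two horizontal orientations, the sketch returns `None` for vertical ones rather than guessing.

```python
from enum import Enum
from typing import NamedTuple, Optional


class TextOrientation(Enum):
    # Assumed string values; the API documentation is authoritative.
    HORIZONTAL_LR = "horizontal-lr"
    HORIZONTAL_RL = "horizontal-rl"
    VERTICAL_LR = "vertical-lr"
    VERTICAL_RL = "vertical-rl"


class ReadingDirection(NamedTuple):
    within_line: str  # direction in which the characters of a line are read
    line_order: str   # direction in which successive lines follow each other


# Direct encoding of the four orientation definitions.
READING = {
    TextOrientation.HORIZONTAL_LR: ReadingDirection("left-to-right", "top-to-bottom"),
    TextOrientation.HORIZONTAL_RL: ReadingDirection("right-to-left", "top-to-bottom"),
    TextOrientation.VERTICAL_LR: ReadingDirection("top-to-bottom", "left-to-right"),
    TextOrientation.VERTICAL_RL: ReadingDirection("top-to-bottom", "right-to-left"),
}

# Horizontal left-to-right is used when no orientation is specified.
DEFAULT_ORIENTATION = TextOrientation.HORIZONTAL_LR


def parse_orientation(value: Optional[str]) -> TextOrientation:
    # Fall back to the default when no orientation is given; unknown values raise ValueError.
    return DEFAULT_ORIENTATION if value is None else TextOrientation(value)


def frontend_alignment(orientation: TextOrientation) -> Optional[str]:
    # The frontend displays every transcription horizontally, including vertical ones.
    # Alignment is only documented for the two horizontal orientations.
    if orientation is TextOrientation.HORIZONTAL_LR:
        return "left"
    if orientation is TextOrientation.HORIZONTAL_RL:
        return "right"
    return None
```

For example, `parse_orientation("vertical-rl")` yields `TextOrientation.VERTICAL_RL`, whose `READING` entry says lines are read top to bottom and follow each other from right to left.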
API Endpoints
These endpoints are the most useful to handle Element transcriptions: