In this tutorial, you will learn how to create ground truth data to train an HTR model, using the Callico collaborative annotation platform.
Follow this section only after you have completed the steps for generating ground truth data to train an image segmentation model, described in the segmentation tutorial.
Since you learned how to create image segmentation data in the previous sections, you should have Text line elements, annotated on Callico and exported back to Arkindex, available in your dataset.
The first step of this tutorial is to import the data to transcribe into Callico. You can log in on Callico's demonstration instance and access the details page of your project from the homepage by clicking on it.
Let's start by importing the data to be transcribed. You can click on the Import from Arkindex action in the Elements section of the menu on the left side of the project details page:
Then, fill in the import form as presented below, to import all of the Text line elements along with their Page parent from your dataset containing data from Europeana:
- Replace the aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee value with yours, which can be copied from your Arkindex dataset details page, just below its name.
- Select the Page and Text line element types from your dataset. Hold the CTRL key on your keyboard while clicking to select multiple types.
Once you have started the data import, you will be redirected to a new page where you can track its progress. Note that this page is not dynamically refreshed: you will need to reload it manually to see the updated status and logs. When the import is complete, its status will be updated to Completed.
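Before pasting the dataset identifier into the import form, it can help to check that the placeholder was really replaced. The following is a minimal sketch, assuming the identifier has the same UUID shape as the placeholder shown above; `check_dataset_id` is a hypothetical helper and is not part of Callico or Arkindex.

```python
import uuid

# Placeholder value shown in the import form of this tutorial.
PLACEHOLDER = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"


def check_dataset_id(value: str) -> str:
    """Return the identifier in canonical lowercase form, or raise ValueError.

    Assumption: the dataset identifier is a UUID, like the placeholder.
    """
    candidate = value.strip().lower()
    if candidate == PLACEHOLDER:
        raise ValueError("the placeholder identifier was not replaced")
    try:
        parsed = uuid.UUID(candidate)
    except ValueError as exc:
        raise ValueError(f"not a valid UUID: {value!r}") from exc
    return str(parsed)
```

Calling `check_dataset_id` on the value copied from the Arkindex dataset details page raises an error if the value is still the placeholder or was truncated while copying.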
While Arkindex elements are being imported into your Callico project, you can start setting up your annotation campaign.
First, navigate back to your Callico project details page by using the navbar at the top of the page and clicking on your project name.
From there, you can click on the Create action in the Campaigns section of the menu on the left side of the project details page:
Then, fill in the creation form as presented below:
- Select the Transcription option to follow this tutorial.
Once your transcription campaign is created, you will be redirected to its configuration page. Fill in the configuration form as presented below:
- Enter 10 so that annotators can request tasks in batches of 10 pages.
- Select the Text line type and uncheck all others.
After configuring your campaign, you will be redirected to its details page. From there, you can access the form to create annotation tasks by clicking the Create action in the Tasks section of the menu on the left:
Please make sure your import process is complete before creating your annotation tasks, otherwise you may miss pages while annotating.
Then, fill in the creation form as presented below:
- Select Pages.
Once the tasks are created, you will be redirected to the task list, which should contain many items, one for each page to be annotated from your dataset.
You can navigate back to your Callico project details page by using the navbar at the top of the page and clicking on your project name.
After completing the first tutorial on creating image segmentation data for training, you should already know how to invite contributors to your project and/or have a contributor account at your disposal. If you need a quick reminder, you can read the dedicated section on the other tutorial page.
In this section, we will put ourselves in the shoes of a Contributor user whose role is to annotate tasks from one or more campaigns.
As a contributor to the project, you can request tasks as explained in the dedicated section of the segmentation tutorial. You can either click on the blue My tasks button to select 1 task to annotate or the grey Request tasks button to receive a batch of 10 tasks at once.
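To estimate how many times a contributor has to request tasks to cover a whole dataset, a ceiling division is enough. This is a rough sketch, assuming a single contributor and that each request returns a full batch while enough tasks remain; `requests_needed` is a hypothetical helper, not a Callico function.

```python
def requests_needed(total_tasks: int, batch_size: int = 10) -> int:
    """Number of batch requests required to receive every task once."""
    if total_tasks < 0:
        raise ValueError("total_tasks must not be negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Ceiling division using integers only, so large counts stay exact.
    return -(-total_tasks // batch_size)
```

For instance, with the batch size of 10 configured earlier, a dataset of 95 pages takes 10 requests, the last one returning only 5 tasks.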
Now that you know how to request tasks, you will learn how to annotate transcription tasks. Here is an annotation page:
When working on transcription campaigns, you need to transcribe the elements highlighted in green on the image.
In our case, all displayed elements are Text lines that we have previously segmented ourselves. While transcribing, a blue visual aid is displayed to map each annotation input to an element from the image.
If you are not completely sure about one of your transcriptions, you can mark your answer as uncertain by clicking on the yellow ! square button displayed next to the input you are working in.
A few other tools are available to ease the annotation process:
- the Zoom in and Zoom out tools, to scale the image being worked on,
- the Open in a new tab tool, to better visualize large images,
- the Rotate left and Rotate right tools, to pivot your image.
Do not forget to validate your task by clicking the green Submit button when you are done annotating.
If you have submitted a task without finishing your annotation or want to correct transcribed lines, you can edit it by going to the Annotated tab in your task list and clicking the green Change annotation button:
You will be redirected to the task annotation page, pre-filled with the last annotation you made:
In this case, we can correct the transcriptions marked as uncertain, remove the associated markers by clicking the red ! square button, and submit a new version for our task:
Only the latest version of an annotation task is exported to the provider; in this tutorial, that is the version published back to Arkindex.
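The "latest version wins" rule can be pictured with a few lines of Python. This is only an illustration under stated assumptions: the record layout (`task_id`, `version`, `text`) and the `latest_per_task` helper are hypothetical and do not reflect Callico's internal data model or API.

```python
from typing import Dict, Iterable, TypedDict


class AnnotationVersion(TypedDict):
    # Hypothetical record layout, used only for this illustration.
    task_id: str
    version: int
    text: str


def latest_per_task(
    versions: Iterable[AnnotationVersion],
) -> Dict[str, AnnotationVersion]:
    """Keep, for each task, the annotation with the highest version number."""
    latest: Dict[str, AnnotationVersion] = {}
    for item in versions:
        kept = latest.get(item["task_id"])
        if kept is None or item["version"] > kept["version"]:
            latest[item["task_id"]] = item
    return latest


history = [
    {"task_id": "page-1", "version": 1, "text": "Deer Sir"},
    {"task_id": "page-1", "version": 2, "text": "Dear Sir"},
    {"task_id": "page-2", "version": 1, "text": "Yours truly"},
]
exported = latest_per_task(history)
```

Here the corrected second version of `page-1` is the one that would be kept, while `page-2` keeps its only version.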
If necessary, log out from your Contributor account and log in with your first email address.
Back in your Manager account, you can track the progress of your transcription campaign from its details page:
Once it is completed, i.e. when all tasks from this tutorial are annotated, you can proceed with the export to Arkindex.
To export your results back to Arkindex, you will need to click on the To Arkindex action in the Export results section of the menu on the left of the campaign details page.
Then, fill in the export form as presented below:
- Select the Annotated value to export your tasks.
Once you have started the results export, you will be redirected to a new page where you can track its progress. Note that this page is not dynamically refreshed: you will need to reload it manually to see the updated status and logs. When the export is complete, its status will be updated to Completed.
Once the export process is complete, you should check that the annotations for your transcription tasks have been properly published to Arkindex by browsing your dataset elements:
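If you prefer to double-check the result programmatically, the idea is simply to look for Text line elements that ended up without any transcription. The sketch below works on plain dictionaries; the `id` and `transcription` keys and the `lines_without_text` helper are assumptions made for illustration, and fetching the elements from Arkindex is deliberately left out.

```python
from typing import Iterable, List, Mapping, Optional


def lines_without_text(lines: Iterable[Mapping[str, Optional[str]]]) -> List[str]:
    """Return the ids of lines whose transcription is missing or blank."""
    missing = []
    for line in lines:
        # A missing key, None, or whitespace-only text all count as empty.
        text = line.get("transcription") or ""
        if not text.strip():
            missing.append(line["id"])
    return missing


sample = [
    {"id": "line-1", "transcription": "Dear Sir"},
    {"id": "line-2", "transcription": "   "},
    {"id": "line-3"},
]
to_review = lines_without_text(sample)
```

Any identifier left in `to_review` points to a line worth opening in Arkindex to see whether its task was skipped or submitted empty.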
Congratulations, you have successfully transcribed lines in Callico and exported the annotations back to Arkindex!
Now that the ground truth has been annotated on Callico and collected in Arkindex, you are ready to train a Machine Learning transcription model.