Creating ground truth for transcription

In this tutorial, you will learn how to create ground truth data to train an HTR model, using the Callico collaborative annotation platform.

Follow this tutorial only after you have completed the steps for generating ground truth data to train an image segmentation model, described in the segmentation tutorial.

Import data to transcribe in Callico

Since you learned how to create image segmentation data in the previous sections, you should have:

  • Accounts on both the Arkindex and the Callico instances,
  • An Arkindex project along with a dataset containing data from the Pellet corpus,
  • Text line elements, annotated on Callico and exported back to Arkindex, available in your dataset,
  • A project containing one completed segmentation campaign on Callico.

The first step of this tutorial is to import the data to transcribe into Callico. Log in to Callico’s demonstration instance and, from the homepage, click on your project to access its details page.

Callico's homepage

Import an Arkindex dataset

Let’s start by importing the data to be transcribed. You can click on the Import from Arkindex action in the Elements section of the menu on the left side of the project details page:

Callico's project details page

Then, fill in the import form as presented below to import all of the Text line elements, along with their parent Page elements, from your dataset containing data from Europeana:

Callico's pre-filled Arkindex import form
  • Process name - Name of your process to import elements from Arkindex.
  • Element - Not relevant when importing an Arkindex dataset.
  • Dataset - UUID of your Arkindex dataset. Replace the aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee value with your own, which can be copied from your Arkindex dataset details page, just below its name.
Find your dataset's UUID on Arkindex
  • Filter sets to import - Restricts which sets from your Arkindex dataset are imported to Callico. It is not used in this tutorial.
  • Filter types to import - Restricts the Arkindex elements to import by type. Here, we want to import the Page and Text line elements from your dataset. Hold the CTRL key while clicking to select multiple types.
  • All subsequent fields in this form can be ignored in this tutorial. You can learn more about them on a dedicated page of Callico’s documentation.
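
A mistyped or un-replaced UUID is an easy mistake in the Dataset field. The sketch below is one way to sanity-check the value before pasting it into the form. It uses only Python's standard library and does not call Callico or Arkindex; the function name is hypothetical, and the placeholder is the one shown in the form above.

```python
import uuid

# Placeholder value shown in the tutorial's import form.
PLACEHOLDER = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"


def is_usable_dataset_uuid(value: str) -> bool:
    """Return True if `value` is a canonical UUID string other than the placeholder.

    Hypothetical helper for illustration only; it checks the format,
    not whether the dataset actually exists on Arkindex.
    """
    candidate = value.strip().lower()
    try:
        parsed = uuid.UUID(candidate)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and hex without
    # hyphens, so compare against the canonical hyphenated form.
    if str(parsed) != candidate:
        return False
    return candidate != PLACEHOLDER
```

A value that passes this check is only well-formed; the import will still fail if the UUID does not match a dataset you can access.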

Track your import’s progress

Once you have started the data import, you will be redirected to a new page where you can track its progress. Note that this page is not dynamically refreshed. You will need to reload it manually to see updated status and logs. When the import is complete, its status will be updated to Completed.

Track the Arkindex import progress in Callico

Create transcription tasks to annotate

While Arkindex elements are being imported into your Callico project, you can start setting up your annotation campaign.

Create a transcription campaign

First, navigate back to your Callico project’s details page by clicking on your project name in the navbar at the top of the page.

Navigate back to your project from the process page

From there, you can click on the Create action in the Campaigns section of the menu on the left side of the project details page:

Callico's project details page

Then, fill in the creation form as presented below:

Callico's pre-filled campaign creation form
  • Name - Name of your campaign. You can copy the name from the screenshot above.
  • Mode - Mode of your campaign. Pick Transcription to follow this tutorial.
  • Description (optional) - Description of your campaign. It supports Markdown.

Configure the newly created campaign

Once your transcription campaign is created, you will be redirected to its configuration page. Fill in the configuration form as presented below:

Callico's pre-filled campaign configuration form
  • Name - Keep it unchanged.
  • Description (optional) - Keep it unchanged.
  • Number of tasks to assign per volunteer - Set this field to 10. This allows annotators to request tasks in batches of 10 pages.
  • Number of allowed assignments for available tasks - Ignore this field.
  • Group the transcription inputs for a lighter display during annotation - Check this option for a prettier display during annotation.
  • Element types to annotate - The element types that will be transcribed in your tasks. Keep the Text line type and uncheck all others.
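
To get a feel for the batch size chosen above, the following sketch estimates how many task requests are needed to cover a campaign. It is plain arithmetic, not part of Callico; the function name and the page count in the usage note are hypothetical.

```python
import math


def requests_needed(total_tasks: int, batch_size: int = 10) -> int:
    """Number of batch requests required to hand out `total_tasks` tasks.

    `batch_size` mirrors the "Number of tasks to assign per volunteer"
    field, set to 10 in this tutorial.
    """
    if total_tasks < 0:
        raise ValueError("total_tasks must not be negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(total_tasks / batch_size)
```

For instance, a hypothetical dataset of 95 pages would be covered by `requests_needed(95)` requests of at most 10 tasks each, shared among all contributors.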

Create annotation tasks

After configuring your campaign, you will be redirected to its details page. From there, you can access the form to create annotation tasks by clicking the Create action in the Tasks section of the menu on the left:

Callico's campaign details page

Warning

Please make sure your import process is complete before creating your annotation tasks, otherwise you may miss pages while annotating.

Then, fill in the creation form as presented below:

Callico's pre-filled task creation form
  • Element type - The element type to create your tasks on. Here, we are annotating Pages.
  • Users - Ignore this field. Rather than being assigned tasks directly, contributors will be able to request the number of tasks they want.
  • Sequential - Keep it unchanged. Pages will be annotated in the order they were imported into Callico.
  • Elements to use - Keep it unchanged. We want to annotate all the imported pages.
  • Maximum number of tasks per user - Ignore this field.
  • Create unassigned tasks - This option creates annotation tasks that annotators request as they go. In this tutorial, you must check it.

Once the tasks are created, you will be redirected to the task list which should contain many items, one for each page to be annotated from your dataset.
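
The relationship between tasks, pages and text lines can be pictured with a small sketch: each task targets one page and presents every text line of that page for transcription. The code below only illustrates that grouping under assumed inputs. The `(page_id, line_id)` pairs and the function name are hypothetical and do not reflect Callico's internal data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_lines_by_page(lines: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (page_id, line_id) pairs into one entry per page.

    Each key stands for one annotation task; its value lists the text
    lines to transcribe in that task, in their original order.
    """
    tasks: Dict[str, List[str]] = defaultdict(list)
    for page_id, line_id in lines:
        tasks[page_id].append(line_id)
    return dict(tasks)
```

With this picture in mind, the number of items in the task list should match the number of imported pages, not the number of text lines.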

You can navigate back to your Callico project’s details page by clicking on your project name in the navbar at the top of the page.

Navigate back to your project from the task list

Invite collaborators

After completing the first tutorial on creating image segmentation training data, you should already know how to invite contributors to your project and/or have a contributor account at your disposal. If you need a quick reminder, you can read the dedicated section of the segmentation tutorial.

Annotate the transcription tasks

In this section, we will put ourselves in the shoes of a Contributor user whose role is to annotate tasks from one or more campaigns.

Request annotation tasks

As a contributor to the project, you can request tasks as explained in the dedicated section of the segmentation tutorial. You can either click on the blue My tasks button to select 1 task to annotate or on the grey Request tasks button to receive a batch of 10 tasks at once.

Callico's project details page for contributors

Annotate your tasks

Now that you know how to request tasks, you will learn how to annotate transcription tasks. Here is an annotation page:

Callico's annotation page for a transcription campaign

Transcribe all elements

When working on transcription campaigns, you need to transcribe the elements highlighted in green on the image.

In our case, all displayed elements are Text lines that we have previously segmented ourselves. While transcribing, a blue visual aid is displayed to map each annotation input to an element from the image.

Visual blue aid displayed when annotating a specific line

Mark a transcription as uncertain

If you are not completely sure about one of your transcriptions, you can mark your answer as uncertain by clicking on the yellow square ! button displayed next to the input you are working in.

Mark a transcription as uncertain when unsure about the annotation

Other tools on the image component

A few other tools are available to ease the annotation process:

Image component extra tools
  • A slider to Zoom in or Zoom out the image being worked on,
  • An Open in a new tab tool to better visualize large images,
  • Rotate left and Rotate right tools to rotate your image.

Warning

Do not forget to validate your task by clicking the green Submit button when you are done annotating.

Correct an annotated task

If you have submitted a task without finishing your annotation or want to correct transcribed lines, you can edit it by going to the Annotated tab in your task list and clicking the green Change annotation button:

Correct an annotated task

You will be redirected to the task annotation page, pre-filled with the last annotation you made:

Annotation page pre-filled with a previous version

In this case, we can correct the transcriptions marked as uncertain, remove the associated markers by clicking the red square ! button, and submit a new version of our task:

Edit uncertain transcriptions and submit a new version

The last version of an annotation task is the one that is exported to the provider, which in this tutorial means published back to Arkindex.
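
The "last version wins" rule can be illustrated with a minimal sketch. The `AnnotationVersion` class, its fields and the selection function are hypothetical names chosen for this example; they are not Callico's actual data model or export code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnnotationVersion:
    """One submitted version of a task (hypothetical structure)."""
    number: int                     # 1 for the first submission, 2 for the next...
    transcriptions: Dict[str, str]  # text line identifier -> transcribed text


def version_to_export(versions: List[AnnotationVersion]) -> AnnotationVersion:
    """Return the most recent version, i.e. the one with the highest number."""
    if not versions:
        raise ValueError("the task has no submitted version")
    return max(versions, key=lambda version: version.number)
```

Under this assumption, earlier versions are kept only as history: correcting a task and submitting it again is enough to change what gets exported.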

Track and export annotations back to Arkindex

Info

If necessary, log out of your Contributor account and log in with the email address of your Manager account.

Back in your Manager account, you can track the progress of your transcription campaign from its details page:

Callico's campaign details page showing the ongoing progress

Once it is completed, i.e. when all tasks from this tutorial are annotated, you can proceed with the export to Arkindex.

Export results to Arkindex

To export your results back to Arkindex, you will need to click on the To Arkindex action in the Export results section of the menu on the left of the campaign details page.

Then, fill in the export form as presented below:

Callico's pre-filled Arkindex export form
  • Process name - Name of your process to export annotations to Arkindex.
  • Status of tasks to be exported - Pick the Annotated value to export your tasks.
  • Force the republication of annotations - Ignore this field.
  • Publish each annotation separately - Ignore this field.

Track your export’s progress

Once you have started the results export, you will be redirected to a new page where you can track its progress. Note that this page is not dynamically refreshed. You will need to reload it manually to see updated status and logs. When the export is complete, its status will be updated to Completed.

Track the progress of the export to Arkindex in Callico

Check that your Arkindex export went smoothly

Once the export process is complete, you should check that the annotations for your transcription tasks have been properly published to Arkindex by browsing your dataset elements:

Arkindex dataset details page showing its elements
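
If you prefer a systematic check over browsing, the idea is simply to list the text lines that still have no transcription. The sketch below works on a plain dictionary that you would have to build yourself from your dataset, for example by hand or with a client of your choice. It does not call the Arkindex API, and the function name and input shape are assumptions made for this example.

```python
from typing import Dict, List


def lines_missing_transcription(transcriptions_by_line: Dict[str, List[str]]) -> List[str]:
    """Return the identifiers of text lines without any non-empty transcription.

    `transcriptions_by_line` maps each text line identifier to the list of
    transcription texts found on it (hypothetical input format).
    """
    return sorted(
        line_id
        for line_id, texts in transcriptions_by_line.items()
        # A line counts as transcribed if at least one text is not blank.
        if not any(text.strip() for text in texts)
    )
```

An empty result suggests that every line you checked received a transcription; any identifier returned points to a line worth opening in Arkindex.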

Congratulations, you have successfully transcribed lines in Callico and exported the annotations back to Arkindex!

Transcriptions on text lines are available in Arkindex

Next step

Now that the ground truth has been annotated on Callico and collected in Arkindex, you are ready to train a Machine Learning transcription model.