
Running your models in production

In this tutorial, you will learn how to run the models you trained using Arkindex.

Follow this section only after you have annotated the Pellet dataset and trained your segmentation (YOLO) and transcription (PyLaia) models.

By the end, you will have used YOLO to segment text_line elements on all the pages you wish to process, and PyLaia to transcribe them.

Optional step - Import more images to process

If you also want to try your newly trained models on your own images, you can import some by following this section, or even upload a large image set by following this one.

Start your process

Create the process

Once all the images you want to process are available in your Arkindex project, you can create a process.

Browse to the page of the Europeana | Pellet project and click Create inference process in the Process dropdown menu.

Create a new process

Select elements

You will be redirected to a new page allowing you to filter the elements to process.

Processing folder elements from your project

The segmentation model you trained works on page elements and detects their text_line and illustration children. The transcription model then looks for text_line elements on those pages and creates transcriptions for them.

Therefore, on this page, activate the Load children toggle and filter the elements by the Page type:

Processing pages from your project, listed recursively

Once your elements are properly filtered (if you are processing only the Pellet dataset, 471 pages should be listed), proceed to the worker configuration by clicking the blue Configure workers button.
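Conceptually, turning on Load children and filtering by the Page type amounts to walking the project tree recursively and keeping only page elements, whatever their depth. The sketch below illustrates this with a hypothetical in-memory `Element` class and a made-up tree; it does not call Arkindex or its API, and all names in it are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for an Arkindex element (not the real API model)."""
    name: str
    type: str
    children: list["Element"] = field(default_factory=list)


def list_elements_recursive(root: Element, element_type: str) -> list[Element]:
    """Depth-first walk keeping only descendants of the given type.

    Mimics "Load children" plus a type filter: descendants at any depth
    are listed, not only the direct children of `root`.
    """
    matches = []
    stack = list(reversed(root.children))
    while stack:
        element = stack.pop()
        if element.type == element_type:
            matches.append(element)
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(element.children))
    return matches


# Hypothetical project tree: a folder containing pages, which contain lines.
project = Element("project", "project", [
    Element("PELLET casimir marius", "folder", [
        Element("page 1", "page", [Element("line 1", "text_line")]),
        Element("page 2", "page"),
    ]),
])

pages = list_elements_recursive(project, "page")
print([p.name for p in pages])  # ['page 1', 'page 2']
```

Without the recursive walk, filtering the project's direct children by type would find no pages at all, since they sit inside a folder.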

Add YOLO and PyLaia workers

Click the Select workers button, search for YOLO Segmenter and press Enter.

Search for the YOLO Segmenter worker

Click on the name of the worker on the left and select the first version listed by clicking on the button in the Actions column.

Add the YOLO Segmenter worker to the process

Repeat the same steps for the PyLaia worker: reset the worker search, type PyLaia Generic instead and press Enter.

Search for the PyLaia Generic worker

Click on the name of the worker on the left and select the first version listed by clicking on the button in the Actions column.

Add the PyLaia Generic worker to the process

Close the modal by clicking the Done button in the bottom right corner.

Use your own models

Now it is time to select the models you trained.

Click on the button in the Model version column of the YOLO Segmenter worker. In the modal that opens:

  1. Look for the name of your trained segmentation model,
  2. Add the model version by clicking on Use in the Actions column,
  3. Close the modal by clicking on Ok, in the bottom right corner.
Add your trained YOLO model to the process

Repeat the same steps for the PyLaia Generic worker, this time searching for the transcription model you trained.

Add your trained PyLaia model to the process

Configure the workers

Workers can be configured to adjust how they run. In our case, the default configuration of the YOLO Segmenter already suits our needs, so we will not configure it. The PyLaia Generic worker, on the other hand, requires a few adjustments.

First, we need to find the UUID of the YOLO Segmenter worker version. To do so:

  1. Click the YOLO Segmenter name, which opens a new tab,
Go to the YOLO Segmenter worker version
  2. From there, copy the UUID displayed next to Version, just below the worker name,
Copy the worker version UUID
  3. Save the copied UUID, as we will use it in a few moments,
  4. Close the tab that was just opened to return to the Process configuration page.
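Before pasting the copied value later on, it can help to check that you copied a complete UUID rather than a truncated string. A minimal sketch using Python's standard `uuid` module; the sample values are made up and do not correspond to a real YOLO Segmenter version.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True if `value` is a UUID in the canonical 8-4-4-4-12 form."""
    value = value.strip()
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts unhyphenated or "{...}" / "urn:uuid:" forms;
    # comparing with the canonical string requires the usual hyphenated form.
    return str(parsed) == value.lower()


print(is_valid_uuid("12345678-1234-5678-1234-567812345678"))  # True
print(is_valid_uuid("12345678-1234-5678"))  # False (truncated copy)
```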

Configure the PyLaia Generic worker by clicking the button in its Configuration column, which opens a new modal.

Select New configuration in the left column to create a new configuration, and name it after the dataset you are using.

Configure the PyLaia Generic worker

The most important parameters are:

  • Batch size (optional): a higher value speeds up inference but also increases memory usage,
  • Line element type: the default value is text_line, which is already the slug of the element type we want to transcribe,
  • The worker version id that created the lines that Pylaia will be run on: paste here the UUID you copied from the YOLO Segmenter version. This ensures PyLaia only processes the lines segmented by YOLO and ignores any other lines.

When you are done filling in the fields, click Create, then Select.
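The effect of the line element type, the worker version filter and the batch size can be pictured as follows: only text_line elements created by the pasted worker version are kept, then they are processed in groups of at most batch size lines, so a larger batch holds more lines in memory at once. This is a hypothetical sketch on plain dictionaries, not the PyLaia Generic worker's actual code; the keys `type` and `worker_version_id` and the sample data are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator


def select_lines(elements: Iterable[dict], line_type: str, worker_version_id: str) -> list[dict]:
    """Keep only elements of `line_type` created by the given worker version."""
    return [
        e for e in elements
        if e["type"] == line_type and e["worker_version_id"] == worker_version_id
    ]


def batched(items: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield successive groups of at most `batch_size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


YOLO_VERSION = "12345678-1234-5678-1234-567812345678"  # hypothetical UUID
elements = [
    {"id": "a", "type": "text_line", "worker_version_id": YOLO_VERSION},
    # Hypothetical line without a worker version, e.g. annotated by hand.
    {"id": "b", "type": "text_line", "worker_version_id": None},
    {"id": "c", "type": "illustration", "worker_version_id": YOLO_VERSION},
    {"id": "d", "type": "text_line", "worker_version_id": YOLO_VERSION},
    {"id": "e", "type": "text_line", "worker_version_id": YOLO_VERSION},
]

lines = select_lines(elements, "text_line", YOLO_VERSION)
print([[line["id"] for line in batch] for batch in batched(lines, 2)])  # [['a', 'd'], ['e']]
```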

Set the dependencies

As explained above, we want the PyLaia Generic worker to process the text_line elements produced by the YOLO Segmenter worker. This makes the YOLO worker a dependency of the PyLaia one: segmentation must run before transcription.

To enforce this, click the button in the Dependencies column of the PyLaia Generic worker. In the modal that opens:

  1. Add the YOLO Segmenter worker by clicking on the green + button,
  2. Close the modal by clicking on Ok, in the bottom right corner.
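Declaring the YOLO Segmenter as a dependency of PyLaia Generic gives the process a small dependency graph, and any valid execution order is a topological order of that graph. The sketch below uses Python's standard `graphlib` module to illustrate the idea; the task names echo the ones shown in the logs, but this is not how Arkindex schedules tasks internally, and the edge from the initialisation task is an assumption based on it listing the elements first.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dependencies = {
    "initialisation": set(),
    "yolo-segmenter": {"initialisation"},
    # The dependency configured in the modal: segmentation before transcription.
    "pylaia-generic": {"initialisation", "yolo-segmenter"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['initialisation', 'yolo-segmenter', 'pylaia-generic']
```

Without "yolo-segmenter" in the PyLaia set, both workers would become ready at the same time after initialisation, and nothing would guarantee that the text lines exist before transcription starts.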

Run your process

Fully configured process for segmentation and transcription

Your process is now fully configured and ready to run! You can launch it using the Run process button.

While it runs, the task logs are displayed. Several things happen during the process:

  1. Elements to process are listed by the initialisation task.
  2. The yolo-segmenter_xxxxxx task:
      • browses the provided elements,
      • segments them using your trained model,
      • produces text_line elements, published back to Arkindex.
  3. The pylaia_generic_xxxxxx task:
      • lists all text_line elements from the provided elements that were segmented by the previous task,
      • predicts a transcription for them using your trained model,
      • publishes the transcriptions directly on the text_line elements.

Wait for the process to complete before moving to the next step.

Check the results

To see the predictions of your two models, browse back to the PELLET casimir marius folder in your project. Click one of the displayed pages to see how well your YOLO model segmented the image.

View the segmented text lines on a page

To visualize PyLaia predictions, you can highlight a text line by selecting it from the children tree displayed on the left.

Highlight a text line element on a page

Once the text line is highlighted, its transcriptions are displayed on the right, in a dedicated section.

Display the transcriptions of a text line

If your text line has multiple transcriptions:

  • Callico is mentioned on the transcriptions annotated by humans,
  • PyLaia is mentioned on the predicted transcriptions.

As you can see, your models work, but not perfectly. Segmented elements are sometimes slightly off, and YOLO may have missed some zones entirely. PyLaia transcriptions range from near perfect to far off the mark. A dedicated section explains why, and how to train better models using Arkindex and Teklia’s software.

Next step

Now that you have produced segmentation and transcription results in Arkindex, you may want to extract them from the platform to use elsewhere. The next page explains how to export your data to the PAGE XML format.