In this tutorial, you will learn how to run the models you trained using Arkindex.
Follow this section after annotating the Pellet dataset and training both a segmentation model and a transcription model.
As a result, you will produce segmented text_line elements using YOLO, which will then be transcribed using PyLaia, on all the pages you wish to process.
If you wish to also try out your newly trained models on your own images, you can import some by following this section or even upload a large image set by following this one.
Once all the images you want to process are available in your Arkindex project, you can create a process.
Browse to the page of the Europeana | Pellet project and click Create process in the Actions dropdown menu.
You will be redirected to a new page allowing you to filter the elements to process.
The segmentation model you trained will work on page elements and detect text_line and illustration children. The transcription one will search for any text_line element appearing on the pages and create transcriptions.
Therefore, from this page, you can activate the Load children toggle and filter the elements by the Page type.
Once your elements are properly filtered (when processing only the Pellet dataset, 471 pages should be listed), you can proceed to worker configuration by clicking the blue Configure workers button.
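To make the effect of the Load children toggle and the Page type filter more concrete, here is a minimal Python sketch. It does not use the Arkindex API: the Element class, the toy project layout and the select_elements helper are hypothetical stand-ins written for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for an Arkindex element (not the real API)."""
    name: str
    type: str
    children: list["Element"] = field(default_factory=list)


def select_elements(root, wanted_type, load_children=True):
    """Return the elements of `wanted_type` found under `root`.

    With load_children=False only direct children are inspected;
    with load_children=True the whole tree is walked.
    """
    selected = []
    for child in root.children:
        if child.type == wanted_type:
            selected.append(child)
        if load_children:
            selected.extend(select_elements(child, wanted_type, load_children))
    return selected


# Toy project: one folder holding two pages, one already containing a line.
project = Element("project", "project", [
    Element("folder", "folder", [
        Element("page 1", "page", [Element("line 1", "text_line")]),
        Element("page 2", "page"),
    ]),
])

pages = select_elements(project, "page")
print([p.name for p in pages])
```

In this toy tree the pages sit inside a folder, so they are only reached when children are loaded; without it, the selection made from the project root would be empty.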
Press the Select workers button, search for YOLO V8 Segmenter and press the Enter key.
Click on the name of the worker on the left and select the first version listed by clicking on the button in the Actions column.
We need to repeat the same steps for the PyLaia worker. Reset the worker search, input PyLaia Generic instead and press the Enter key.
Click on the name of the worker on the left and select the first version listed by clicking on the button in the Actions column.
Close the modal by clicking on the Done button on the bottom right.
Now it is time to select the models you trained.
Click on the button in the Model version column of the YOLO V8 Segmenter worker. In the modal that opens, search for the segmentation model you trained and select it.
Repeat the exact same steps for the PyLaia Generic worker, this time searching for the transcription model you trained.
Workers can be configured to adjust how they run. In our case, the YOLO V8 Segmenter already suits our needs with its default configuration, so we will not have to configure it. The PyLaia Generic worker, by contrast, requires a few adjustments.
First, we need to find the YOLO V8 Segmenter version UUID. To do so:
- click on the YOLO V8 Segmenter name, this will open a new tab from which you can retrieve the version UUID.
Configure the PyLaia Generic worker by clicking on the button in the Configuration column, this will open a new modal.
Select New configuration in the left column to create a new configuration. Name it after the dataset you are using.
The most important parameters are:
- the element type to process, already set to text_line, which is the slug of the element type we want to transcribe,
- the worker version filter, to be set to the UUID of the YOLO V8 Segmenter version. It will prevent PyLaia from seeing lines other than those segmented by YOLO.
Click on Create then Select when you are done filling the fields.
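The effect of restricting PyLaia to a single worker version can be sketched as follows. This is an illustration only, not PyLaia's or Arkindex's actual code: the dictionary keys, the placeholder UUIDs and the lines_to_transcribe helper are all assumptions.

```python
# Placeholder identifiers; a real worker version UUID is copied from Arkindex.
YOLO_VERSION = "00000000-0000-0000-0000-000000000001"
OTHER_VERSION = "00000000-0000-0000-0000-000000000002"

# Hypothetical configuration mirroring the two parameters described above.
config = {
    "element_type": "text_line",
    "worker_version_filter": YOLO_VERSION,
}

# Hypothetical children of one page, created by different sources.
children = [
    {"name": "line A", "type": "text_line", "worker_version": YOLO_VERSION},
    {"name": "line B", "type": "text_line", "worker_version": OTHER_VERSION},
    {"name": "drawing", "type": "illustration", "worker_version": YOLO_VERSION},
    {"name": "line C", "type": "text_line", "worker_version": None},  # manual annotation
]


def lines_to_transcribe(elements, config):
    """Keep elements of the configured type created by the configured worker version."""
    return [
        e for e in elements
        if e["type"] == config["element_type"]
        and e["worker_version"] == config["worker_version_filter"]
    ]


print([e["name"] for e in lines_to_transcribe(children, config)])
```

Only the line created by the configured YOLO version is kept: lines from another worker version, manual annotations and illustrations are all ignored.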
As explained just above, we want the PyLaia Generic worker to process the text_line elements produced by the YOLO V8 Segmenter worker. It means that the YOLO worker is a dependency of the PyLaia one: the segmentation step must run before the transcription.
To meet this requirement, click on the button in the Dependencies column of the PyLaia Generic worker. In the modal that opens:
- add the YOLO V8 Segmenter worker by clicking on the green + button.
Your process is now fully configured and ready to run! You can launch it using the Run process button.
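The dependency declared between the two workers can be pictured as a small graph from which a run order is derived. The sketch below is not how Arkindex schedules tasks internally; it only uses graphlib from Python's standard library (3.9 or later) to illustrate the idea.

```python
from graphlib import TopologicalSorter

# Map each worker to the set of workers it depends on.
dependencies = {
    "YOLO V8 Segmenter": set(),
    "PyLaia Generic": {"YOLO V8 Segmenter"},
}

# static_order() yields every worker after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Because PyLaia Generic lists the segmenter as a dependency, the segmenter always comes first in the resulting order, whatever the order of the dictionary entries.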
While it is running, the logs of the tasks are displayed. Multiple things happen during this process:
- the initialisation task,
- the yolo-segmenter_xxxxxx task: the pages are segmented into text_line elements, published back to Arkindex,
- the pylaia_generic_xxxxxx task: the text_lines are retrieved from the provided elements that were segmented by the previous task, then transcriptions are created on these text_line elements.
Wait for the process completion before moving to the next step.
To see the predictions of your two models, browse back to the PELLET casimir marius folder in your project. There you can click on one of the displayed pages and you will be able to tell how well your YOLO model segmented the image.
To visualize PyLaia predictions, you can highlight a text line by selecting it from the children tree displayed on the left.
Once the text line is highlighted, its transcriptions are displayed on the right, in a dedicated section.
If your text line has multiple transcriptions:
As you might see, your models work, but not perfectly. Segmented elements are sometimes a bit off, or zones may simply have been missed by YOLO. Transcriptions from PyLaia can approach perfection or be far off the mark. A dedicated section explains why and how to train better models using Arkindex and Teklia's products.
Now that you have produced segmentation and transcription results in Arkindex, you may want to extract them from the platform to use elsewhere. We will explain how to export your data to PAGE XML format on the next page.