Running your models in production

    In this tutorial, you will learn how to run the models you trained using Arkindex.

    Follow this section after annotating the Pellet dataset and training a YOLO segmentation model and a PyLaia transcription model.

    As a result, you will produce segmented text_line elements using YOLO, which will then be transcribed using PyLaia, on all the pages you wish to process.

    Optional step - Import more images to process

    If you also wish to try your newly trained models on your own images, you can import some, or even upload a large image set, by following the dedicated import sections of this documentation.

    Start your process

    Create the process

    Once all the images you want to process are available in your Arkindex project, you can create a process.

    Browse to the page of the Europeana | Pellet project and click Create process in the Actions dropdown menu.

    Create a new process

    Select elements

    You will be redirected to a new page allowing you to filter the elements to process.

    Processing folder elements from your project

    The segmentation model you trained works on page elements and detects text_line and illustration children. The transcription model then looks for every text_line element on the pages and creates a transcription for each of them.

    Therefore, from this page, you can activate the Load children toggle and filter the elements by Page type:

    Processing pages from your project, listed recursively

    Once your elements are properly filtered (when processing only the Pellet dataset, 471 pages should be listed), you can proceed to the worker configuration by clicking the blue Configure workers button.
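
    The combined effect of the Load children toggle and the Page type filter can be pictured as a recursive walk over the element tree that keeps only page elements. The sketch below works on a hypothetical in-memory tree, not on the Arkindex API: the `Element` class, the `walk` and `select_pages` helpers and the sample elements are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Element:
    """Hypothetical stand-in for an Arkindex element (illustration only)."""
    name: str
    type: str
    children: List["Element"] = field(default_factory=list)


def walk(element: Element) -> Iterator[Element]:
    """Yield every descendant of `element`, depth first (like Load children)."""
    for child in element.children:
        yield child
        yield from walk(child)


def select_pages(root: Element) -> List[Element]:
    """Keep only the descendants whose type is `page` (like the type filter)."""
    return [el for el in walk(root) if el.type == "page"]


# Made-up tree: a folder holding a sub-folder with two pages.
root = Element("root folder", "folder", [
    Element("sub-folder", "folder", [
        Element("page 1", "page", [Element("line 1", "text_line")]),
        Element("page 2", "page"),
    ]),
])

selected = select_pages(root)
print([el.name for el in selected])
```

    Folders and existing text lines are skipped, so only the two pages would be sent to the workers.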

    Add YOLO and PyLaia workers

    Press the Select workers button, search for YOLO V8 Segmenter and press Enter.

    Search for the YOLO V8 Segmenter worker

    Click on the name of the worker on the left and select the first version listed by clicking on the button in the Actions column.

    Add the YOLO V8 Segmenter worker to the process

    Repeat the same steps for the PyLaia worker: reset the worker search, type PyLaia Generic instead and press Enter.

    Search for the PyLaia Generic worker

    Click on the name of the worker on the left and select the first version listed by clicking on the button in the Actions column.

    Add the PyLaia Generic worker to the process

    Close the modal by clicking on the Done button on the bottom right.

    Use your own models

    Now it is time to select the models you trained.

    Click on the button in the Model version column of the YOLO V8 Segmenter worker. In the modal that opens:

    1. Look for the name of your trained segmentation model,
    2. Add the model version by clicking on Use in the Actions column,
    3. Close the modal by clicking on Ok, in the bottom right corner.
    Add your trained YOLO model to the process

    Repeat the exact same steps but for the PyLaia Generic worker, searching for the transcription model you trained this time.

    Add your trained PyLaia model to the process

    Configure the workers

    Workers can be configured to parametrize their execution. In our case, the YOLO V8 Segmenter already suits our needs with its default configuration, so we will not have to configure it. The PyLaia Generic worker, on the other hand, requires a few adjustments.

    First, we need to find the YOLO V8 Segmenter version UUID. To do so:

    1. Click on the YOLO V8 Segmenter name; this will open a new tab.
    Go to the YOLO V8 Segmenter worker version
    2. From there, copy the UUID displayed next to Version, just below the worker name.
    Copy the worker version UUID
    3. Save the copied UUID; we will use it shortly.
    4. Close the tab that was just opened to return to the Process configuration page.
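
    Because this UUID is pasted by hand into a configuration field later on, it can be worth checking that the value you saved is well formed. A minimal sketch using Python's standard `uuid` module; the helper name `is_valid_uuid` and the sample values are made up for illustration and are not real worker version identifiers.

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True when `value` parses as a UUID (surrounding spaces ignored)."""
    try:
        uuid.UUID(value.strip())
    except ValueError:
        return False
    return True


# Placeholder values, not real worker version identifiers.
complete = is_valid_uuid("12345678-1234-5678-1234-567812345678")
truncated = is_valid_uuid("12345678-1234-5678-1234")  # partial copy
print(complete, truncated)
```

    This only checks the format of the value; it cannot tell whether the UUID actually belongs to the YOLO V8 Segmenter version.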

    Configure the PyLaia Generic worker by clicking on the button in the Configuration column; this will open a new modal.

    Select New configuration in the left column to create a new configuration. Name it after the dataset you are using.

    Configure the PyLaia Generic worker

    The most important parameters are:

    • Batch size (optional): a higher value makes inference faster but also increases memory usage.
    • Line element type: the default value is text_line, which is already the slug of the element type we want to transcribe.
    • The worker version id that created the lines that Pylaia will be run on: this is where you paste the UUID you copied from the YOLO V8 Segmenter version. It prevents PyLaia from seeing lines other than those segmented by YOLO.

    Click on Create then Select when you are done filling the fields.
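
    To make the role of these parameters more concrete, the sketch below mimics them on plain Python data. It is not PyLaia's or Arkindex's code: the dictionary keys, the `worker_version_id` field on each element and the helper names are hypothetical, chosen only to mirror the three fields described above.

```python
from typing import Dict, Iterator, List

# Hypothetical configuration mirroring the three fields of the form.
config = {
    "batch_size": 2,
    "line_element_type": "text_line",
    "worker_version_id": "yolo-version-uuid",  # placeholder, not a real UUID
}

# Made-up elements: three lines from YOLO, one from another source, one illustration.
elements: List[Dict[str, str]] = [
    {"name": "line 1", "type": "text_line", "worker_version_id": "yolo-version-uuid"},
    {"name": "line 2", "type": "text_line", "worker_version_id": "other-uuid"},
    {"name": "figure", "type": "illustration", "worker_version_id": "yolo-version-uuid"},
    {"name": "line 3", "type": "text_line", "worker_version_id": "yolo-version-uuid"},
    {"name": "line 4", "type": "text_line", "worker_version_id": "yolo-version-uuid"},
]


def select_lines(items: List[Dict[str, str]], cfg: dict) -> List[Dict[str, str]]:
    """Keep elements of the configured type created by the configured version."""
    return [
        el for el in items
        if el["type"] == cfg["line_element_type"]
        and el["worker_version_id"] == cfg["worker_version_id"]
    ]


def batches(items: list, size: int) -> Iterator[list]:
    """Split `items` into consecutive chunks of at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


lines = select_lines(elements, config)
grouped = list(batches(lines, config["batch_size"]))
print([[el["name"] for el in batch] for batch in grouped])
```

    In this model, a larger `batch_size` gives fewer but bigger chunks: fewer model calls, at the cost of more line images held in memory at once.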

    Set the dependencies

    As explained just above, we want the PyLaia Generic worker to process the text_line elements produced by the YOLO V8 Segmenter worker. It means that the YOLO worker is a dependency of the PyLaia one. The segmentation step must run before the transcription.

    To follow this requirement, click on the button in the Dependencies column of the PyLaia Generic worker. In the modal that opens:

    1. Add the YOLO V8 Segmenter worker by clicking on the green + button,
    2. Close the modal by clicking on Ok, in the bottom right corner.
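
    The dependency set here amounts to an ordering constraint between the two workers. As a rough model, assuming nothing about how Arkindex schedules tasks internally, the run order can be derived with a topological sort from Python's standard `graphlib` module (Python 3.9 or later). The worker names come from this tutorial; the dictionary structure is only illustrative.

```python
from graphlib import TopologicalSorter

# Each key maps a worker to the set of workers that must finish before it.
dependencies = {
    "YOLO V8 Segmenter": set(),
    "PyLaia Generic": {"YOLO V8 Segmenter"},
}

# static_order() yields every worker after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

    With this constraint, the segmenter always comes first in the resulting order, followed by the transcription worker.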

    Run your process

    Fully configured process for segmentation and transcription

    Your process is now fully configured and ready to run! You can launch it using the Run process button.

    While it is running, the logs of the tasks are displayed. Multiple things happen during this process:

    1. Elements to process are listed by the initialisation task.
    2. The yolo-segmenter_xxxxxx task:
      • browses provided elements,
      • segments them using your trained model,
      • produces text_line elements, published back to Arkindex.
    3. The pylaia_generic_xxxxxx task:
      • lists all text_lines from the provided elements that were segmented by the previous task,
      • predicts a transcription on them using your trained model,
      • publishes transcriptions directly on the text_line elements.
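
    The data flow between these three tasks can be sketched with stub functions. Nothing below calls YOLO, PyLaia or Arkindex: `initialise`, `segment` and `transcribe` are hypothetical stand-ins that only produce placeholder values, to show which task creates what.

```python
from typing import List


def initialise(names: List[str]) -> List[dict]:
    """Task 1: list the elements to process."""
    return [{"name": name, "type": "page", "children": []} for name in names]


def segment(page: dict) -> None:
    """Task 2: stand-in for YOLO; attaches two placeholder text_line children."""
    page["children"] = [
        {"name": f"{page['name']} / line {i}", "type": "text_line", "transcriptions": []}
        for i in (1, 2)
    ]


def transcribe(page: dict) -> None:
    """Task 3: stand-in for PyLaia; adds a placeholder text to each line."""
    for line in page["children"]:
        line["transcriptions"].append("<predicted text>")


pages = initialise(["page 1", "page 2"])
for page in pages:
    segment(page)      # creates the text_line elements
for page in pages:
    transcribe(page)   # runs afterwards, on the lines created above

line_count = sum(len(page["children"]) for page in pages)
print(line_count)
```

    Running `transcribe` before `segment` would find no lines to work on, which is why the dependency between the two workers matters.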

    Wait for the process completion before moving to the next step.

    Check the results

    To see the predictions of your two models, browse back to the PELLET casimir marius folder in your project. There you can click on one of the displayed pages and you will be able to tell how well your YOLO model segmented the image.

    View the segmented text lines on a page

    To visualize PyLaia predictions, you can highlight a text line by selecting it from the children tree displayed on the left.

    Highlight a text line element on a page

    Once the text line is highlighted, its transcriptions are displayed on the right, in a dedicated section.

    Display the transcriptions of a text line

    If your text line has multiple transcriptions:

    • Callico is mentioned on the transcriptions annotated by humans,
    • PyLaia is mentioned on the predicted transcriptions.
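
    If you want to compare a human transcription with a predicted one outside the interface, a rough similarity score is enough for a first impression. The sketch below uses made-up transcriptions and a hypothetical `source` key; it groups the texts by origin and compares them with `difflib.SequenceMatcher` from the standard library, which is a generic string similarity and not an official evaluation metric of PyLaia or Arkindex.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Made-up transcriptions of a single text line (illustration only).
transcriptions = [
    {"text": "Cher ami", "source": "Callico"},
    {"text": "Cher arni", "source": "PyLaia"},
]

# Group the texts by the tool that produced them.
by_source = defaultdict(list)
for transcription in transcriptions:
    by_source[transcription["source"]].append(transcription["text"])

human = by_source["Callico"]
predicted = by_source["PyLaia"]

# Ratio in [0, 1]: 1.0 means the two strings are identical.
similarity = SequenceMatcher(None, human[0], predicted[0]).ratio()
print(f"human={human[0]!r} predicted={predicted[0]!r} similarity={similarity:.2f}")
```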

    As you might see, your models are working, but not perfectly. Segmented elements are sometimes a bit off, or zones may simply have been missed by YOLO. Transcriptions from PyLaia can approach perfection or be far off the mark. A dedicated section explains why, and how to train better models using Arkindex and Teklia's products.

    Next step

    Now that you have produced segmentation and transcription results in Arkindex, you may want to extract them from the platform to use them elsewhere. We will explain how to export your data to PAGE XML format on the next page.