Export predicted results in PAGE XML format

In this tutorial, you will learn how to export, in the PAGE XML format, the results predicted by your own segmentation (YOLO) and transcription (PyLaia) models.

Follow this section only after you have run both of your models in production.

Generate an export of your project

The PAGE XML exporter works on an SQLite export of your project, which avoids overloading Arkindex instances. This means that you have to generate a fresh export containing your recently predicted results, as you have already done before.

Browse to the page of the Europeana | Pellet project. Instructions to start an SQLite export are detailed in our guide to export a project. Once started, read how to monitor the status of your export.

Wait for your export to be Done before going further in this tutorial.

Create your export process

Now that a fresh SQLite export of your project is available, we can create the export process.

From the root of your project, click on Create inference process in the Processes menu.

Create a process on your whole project

This time, you do not need to filter elements as the PAGE XML exporter will look recursively for pages in your project.

Filter the elements to process to list top folders

Click on Configure workers to move on to worker selection. Press the Select workers button, search for PAGE XML export and press Enter. As in the previous sections, click on the name of the worker on the left and select the first version listed by clicking on the button in the Actions column.

Add the PAGE XML export worker to the process

Close the modal by clicking on the Done button on the bottom right.

The process is ready and you can launch it using the Run process button. Wait for its completion before moving to the next step.

Retrieve the generated PAGE XML files

The generated PAGE XML files will be made available as an Arkindex artifact when the process is finished. To download them, click on the Artifacts button to list artifacts and click on pagexml.tar.zst.

Info

This type of archive is not natively supported by Windows. If you are on Windows, you will need an external archive manager such as 7-Zip.

Download the artifact containing generated PAGE XML files

This archive contains one directory named after the UUID of the PELLET casimir marius folder in Arkindex. In this directory, you will find one file for each page of the exported folder.
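
Once the archive is extracted, a quick way to check this layout is to count the files in each top-level directory. The snippet below is a minimal sketch, not part of the exporter: the function name `list_page_xml_files` is hypothetical, and it assumes that each page is written as a file with an `.xml` extension.

```python
from pathlib import Path


def list_page_xml_files(extracted_root: Path) -> dict[str, list[Path]]:
    """Map each top-level directory (named after a folder UUID) to its XML files."""
    files_by_folder: dict[str, list[Path]] = {}
    for folder in sorted(p for p in extracted_root.iterdir() if p.is_dir()):
        # Assumption: each page is exported as one file with an .xml extension.
        files_by_folder[folder.name] = sorted(folder.glob("*.xml"))
    return files_by_folder
```

Calling `list_page_xml_files(Path("pagexml"))`, where `pagexml` stands for whatever directory you extracted the archive into, returns one entry per folder. The length of each list should match the number of pages in that folder.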

Here is an example PAGE XML file generated for the page displayed in this section:

<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15/pagecontent.xsd">
    <Metadata>
        <Creator>Arkindex CLI</Creator>
        <Created>2024-07-25T17:20:38</Created>
        <LastChange>2024-07-25T17:20:38</LastChange>
    </Metadata>
    <Page imageFilename="1258737.jpg" imageWidth="832" imageHeight="1250">
        <TextRegion id="text_line-21" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="209,27 209,111 574,111 574,27" />
            <TextEquiv conf="1.0">
                <Unicode>Le 28 Juin 1919</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-19" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="193,184 193,275 613,275 613,184" />
            <TextEquiv conf="1.0">
                <Unicode>Bien eher Cousin</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-23" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="193,215 193,268 605,268 605,215" />
            <TextEquiv conf="1.0">
                <Unicode>Bien cher Cousin</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-18" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="256,332 256,377 801,377 801,332" />
            <TextEquiv conf="1.0">
                <Unicode>Je m'empresse de fuire réponse</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-8" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="225,371 225,432 781,432 781,420" />
            <TextEquiv conf="1.0">
                <Unicode>a carte lettre que je viens de</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-6" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="123,418 123,486 777,486 777,418" />
            <TextEquiv conf="1.0">
                <Unicode>recevoir u l'instant dutéi du 24</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-13" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="131,473 131,533 760,533 760,525" />
            <TextEquiv conf="1.0">
                <Unicode>qui m'a bien fait plaisir de te</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-1" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="123,520 123,580 777,580 777,549" />
            <TextEquiv conf="1.0">
                <Unicode>savvir toujours en bonne santé</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-17" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="123,566 123,627 785,627 785,566" />
            <TextEquiv conf="1.0">
                <Unicode>et aussi que tu sois toujous à ton</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-4" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,613 115,674 793,674 793,613" />
            <TextEquiv conf="1.0">
                <Unicode>dipst. Cu tu ne crainds rien pou</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-14" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,660 115,721 785,721 785,660" />
            <TextEquiv conf="1.0">
                <Unicode>le moment des balles ni des cus</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-3" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,707 115,768 777,768 777,732" />
            <TextEquiv conf="1.0">
                <Unicode>de ces messieurs les sales boctes.</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-20" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="100,754 100,814 785,814 785,754" />
            <TextEquiv conf="1.0">
                <Unicode>Et je soutaite aque tu reste le ple</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-16" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,762 115,807 770,807 770,762" />
            <TextEquiv conf="1.0">
                <Unicode>Et je soutaite yue tue reste le pu</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-2" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="107,801 107,861 770,861 770,801" />
            <TextEquiv conf="1.0">
                <Unicode>passible a ton dipot. Cur cu</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-11" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="107,855 107,908 312,908 312,902" />
            <TextEquiv conf="1.0">
                <Unicode>n'a toupous rien de bien qai la</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-9" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="107,895 107,955 777,955 777,895" />
            <TextEquiv conf="1.0">
                <Unicode>yuerre de trunchéis.a</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-22" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="100,934 100,1010 793,1010 793,934" />
            <TextEquiv conf="1.0">
                <Unicode>Guand a moi bien cher Cousin</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-15" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,949 115,994 785,994 785,949" />
            <TextEquiv conf="1.0">
                <Unicode>Guand er moi bien cher Cousis</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-5" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,988 115,1049 773,1049 773,1041" />
            <TextEquiv conf="1.0">
                <Unicode>je me porte touprurs bien et je</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-24" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,1035 115,1104 785,1104 785,1057" />
            <TextEquiv conf="1.0">
                <Unicode>pense que ca continun.</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-12" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,1043 115,1096 723,1096 723,1043" />
            <TextEquiv conf="1.0">
                <Unicode>pense que ca continun.</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-7" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="115,1090 115,1150 770,1150 770,1090" />
            <TextEquiv conf="1.0">
                <Unicode>Je te dinais que voil deuse jou</Unicode>
            </TextEquiv>
        </TextRegion>
        <TextRegion id="text_line-10" type="paragraph" orientation="0" readingDirection="left-to-right" textLineOrder="top-to-bottom">
            <Coords points="107,1137 107,1189 770,1189 770,1137" />
            <TextEquiv conf="1.0">
                <Unicode>it fiit que pleuvoir, et ce n'est</Unicode>
            </TextEquiv>
        </TextRegion>
    </Page>
</PcGts>
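
To reuse these results outside Arkindex, you can read such a file with any XML parser. The snippet below is a minimal sketch using Python's standard library: the function name `read_text_regions` and the returned dictionary layout are hypothetical choices made for illustration, and the namespace is the one declared on the `PcGts` element of the example above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <PcGts> root element of the exported file.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}


def read_text_regions(page_xml: str) -> list[dict]:
    """Return the id, polygon and transcription of every TextRegion."""
    root = ET.fromstring(page_xml)
    regions = []
    for region in root.iterfind(".//pc:TextRegion", PAGE_NS):
        coords = region.find("pc:Coords", PAGE_NS)
        polygon = []
        if coords is not None:
            # "points" is a space-separated list of "x,y" pairs.
            for point in coords.get("points", "").split():
                x, y = point.split(",")
                polygon.append((int(x), int(y)))
        # Missing transcriptions are returned as an empty string.
        text = region.findtext(
            "pc:TextEquiv/pc:Unicode", default="", namespaces=PAGE_NS
        )
        regions.append({"id": region.get("id"), "polygon": polygon, "text": text})
    return regions
```

Passing the content of one exported file to `read_text_regions` yields one dictionary per region, in document order. The regions are returned exactly as exported, so overlapping lines such as `text_line-19` and `text_line-23` in the example both appear in the result.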

Next step

This tutorial is now complete. For a summary of what you have learned and additional tips on how to enhance your models, check out the conclusion page.