Lexicon
This page contains a glossary for the technical terms used in this tutorial. The defined words are organized in alphabetical order.
Agent¶
In Arkindex, an agent is an instance of Ponos, our proprietary software that executes workers for intensive document processing tasks.
ALTO XML¶
Analysed Layout and Text Object (ALTO) is an XML standard for encoding digitized documents by organizing and structuring a page and its contents. It is similar to the PAGE XML format. It most commonly serves as an extension schema used within a METS section but can also exist as a standalone document.
Learn more by reading the ALTO XML representation.
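As a rough illustration of how a page and its contents are structured in ALTO, the sketch below parses a small hand-written fragment with Python's standard library. The fragment is simplified and is not an actual Arkindex export; the `read_lines` helper is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written ALTO v4 fragment used for illustration only.
ALTO_SAMPLE = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="page_1" WIDTH="1000" HEIGHT="1500">
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1" HPOS="100" VPOS="200" WIDTH="400" HEIGHT="50">
            <String CONTENT="Dear" HPOS="100" VPOS="200" WIDTH="120" HEIGHT="50"/>
            <SP/>
            <String CONTENT="Sir" HPOS="240" VPOS="200" WIDTH="90" HEIGHT="50"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
""".strip()

NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}


def read_lines(alto_xml):
    """Return (line ID, text) pairs, joining the words of each TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.findall(".//alto:TextLine", NS):
        words = [s.get("CONTENT", "") for s in line.findall("alto:String", NS)]
        lines.append((line.get("ID"), " ".join(words)))
    return lines


print(read_lines(ALTO_SAMPLE))
```

Each `TextLine` carries its position on the page (`HPOS`, `VPOS`, `WIDTH`, `HEIGHT`), and each `String` holds one word in its `CONTENT` attribute.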
Confidence score¶
The confidence score is a measure provided by the model, for each prediction, that reflects the model’s certainty. Confidence scores help to understand how much trust can be placed in a given prediction. They are numeric values, between 0 and 1, often expressed as a percentage. A high confidence score indicates that the model is more confident about its prediction.
Confidence scores and evaluation scores are often correlated. On unannotated data, we must rely on the confidence score since there is no ground truth available.
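One common way to use confidence scores on unannotated data is to keep confident predictions and send the others to manual review. The sketch below assumes predictions are plain dictionaries with a `confidence` key; the data, the `split_by_confidence` helper and the 0.8 threshold are all hypothetical choices made for this example.

```python
# Hypothetical predictions; real models expose confidence in their own format.
predictions = [
    {"text": "Paris", "confidence": 0.97},
    {"text": "Par1s", "confidence": 0.41},
    {"text": "Lyon", "confidence": 0.88},
]


def split_by_confidence(predictions, threshold=0.8):
    """Separate predictions at or above the threshold from the ones below it."""
    accepted = [p for p in predictions if p["confidence"] >= threshold]
    to_review = [p for p in predictions if p["confidence"] < threshold]
    return accepted, to_review


accepted, to_review = split_by_confidence(predictions)
for prediction in accepted:
    # The ".0%" format expresses the score as a percentage.
    print(f"{prediction['text']}: {prediction['confidence']:.0%}")
```

The right threshold depends on the model and the task, so it is usually chosen by looking at how confidence and evaluation scores relate on annotated data.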
CPU/GPU¶
A CPU, or central processing unit, is a hardware component that is the core computational unit in a server. It handles all types of computing tasks required for the operating system and applications to run. A graphics processing unit (GPU) is a similar hardware component but more specialized and performant for demanding tasks, such as training Machine Learning models.
Data leakage¶
In Machine Learning, data leakage is the use of information in the model training process which would not be expected to be available at prediction time, causing the predictive scores (metrics) to overestimate the model’s utility when run in a production environment.
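A minimal sketch of one frequent source of leakage: computing a preprocessing statistic on the whole dataset instead of on the training set only. The numbers and the `center` helper are made up for this example.

```python
from statistics import mean


def center(values, reference_mean):
    """Subtract a previously computed mean from every value."""
    return [v - reference_mean for v in values]


# Hypothetical measurements; the last one is held out as the test set.
samples = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = samples[:4], samples[4:]

# Leaky: the statistic is computed on all samples, so the test value
# influences how the training data is prepared.
leaky_mean = mean(samples)

# Clean: the statistic only uses data that is available at training time.
clean_mean = mean(train)

# The same training-time statistic is then applied to both sets.
train_centered = center(train, clean_mean)
test_centered = center(test, clean_mean)
```

Here the leaky mean is pulled far from the training values by the held-out sample, which is exactly the kind of information that would not be available at prediction time.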
Epoch¶
An epoch corresponds to one complete pass of the training dataset through the algorithm. The performance of the model generally increases with the number of epochs. However, the model eventually stops improving, so specifying a very high number of epochs may waste time.
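To avoid wasting epochs, training tools often provide early stopping, a technique that is not specific to this tutorial: training halts once the validation loss has stopped improving for a few epochs. The sketch below simulates this with a hard-coded list of losses; the function name, the `patience` parameter and the loss values are assumptions made for this example.

```python
def train_with_early_stopping(validation_losses, max_epochs, patience=2):
    """Return (epochs run, best loss), stopping after `patience` epochs without improvement."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    epochs_run = 0
    # Each item stands for the validation loss measured after one full epoch.
    for epoch, loss in enumerate(validation_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_loss


# Simulated losses: the model improves for four epochs, then plateaus.
simulated_losses = [0.9, 0.6, 0.45, 0.44, 0.46, 0.47, 0.5]
print(train_with_early_stopping(simulated_losses, max_epochs=100))
```

With these simulated values, training stops after 6 epochs even though 100 were requested, and the best loss (0.44) was reached at epoch 4.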
Evaluation score¶
Evaluation scores measure the actual performance of a model by comparing how close its predictions are to the ground truth annotations. These scores are used to evaluate how well the model performs on a given task. They are computed on a test set, which is a separate set of data not used during training or validation. This helps ensure that the score is unbiased and reflects the model’s performance on unseen data. They are numeric values, usually between 0 and 1, often expressed as a percentage.
Common document analysis and recognition metrics include:
- The mean Average Precision (mAP), which is used to evaluate detection models such as YOLO. mAP measures how closely the model’s detected polygons and their classes match the ground truth. A higher mAP value indicates better performance.
- Character Error Rate (CER) and Word Error Rate (WER) are used to evaluate Automatic Text Recognition models such as PyLaia. CER and WER measure the percentage of incorrect characters or words in the model’s predictions compared to the ground truth. Lower CER and WER values indicate better performance.
Evaluation scores and confidence scores are often correlated. On unannotated data, we must rely on the confidence score since there is no ground truth available.
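The error rates described above can be sketched with the Levenshtein edit distance, which counts the insertions, deletions and substitutions needed to turn the prediction into the ground truth. This is a minimal illustration, not the exact implementation used by any particular evaluation tool; the function names are chosen for this example.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by the reference length."""
    if not reference:
        raise ValueError("The reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by the number of reference words."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("The reference must contain at least one word")
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


ground_truth = "the cat sat"
prediction = "the bat sat"
print(f"CER: {cer(ground_truth, prediction):.1%}")
print(f"WER: {wer(ground_truth, prediction):.1%}")
```

In this example, a single wrong character out of 11 also makes one word out of three wrong, so the WER is higher than the CER. With this definition, an error rate can exceed 1 when the prediction is much longer than the ground truth.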
Farm¶
A farm is also a specific Arkindex concept. It refers to a group of computing resources on which a Ponos agent is available to run workers.
Fine-tuning¶
Fine-tuning is a process in machine learning where a pre-trained model is further trained on a new, typically smaller, dataset to adapt it to a specific task. This involves adjusting the weights of the model’s layers to leverage existing knowledge while refining it for the new application.
Frozen layer¶
A frozen layer in a neural network is a layer whose parameters (weights and biases) are not updated during the training process. This technique is often used in transfer learning to preserve previously learned features while focusing training efforts on other layers that are more relevant to the new task.
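The idea can be sketched without any deep learning library: during an update step, frozen layers are simply skipped. The `Layer` class, the `sgd_step` function and all values below are hypothetical and only meant to illustrate the principle; real frameworks expose their own mechanisms for freezing parameters.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list[float]
    frozen: bool = False


def sgd_step(layers, gradients, learning_rate=0.5):
    """Apply one gradient descent update, leaving frozen layers untouched."""
    for layer in layers:
        if layer.frozen:
            continue  # Parameters of a frozen layer are never updated.
        layer_gradients = gradients[layer.name]
        layer.weights = [
            weight - learning_rate * gradient
            for weight, gradient in zip(layer.weights, layer_gradients)
        ]


# A pre-trained feature extractor is frozen; only the new head is trained.
encoder = Layer("encoder", [0.5, -0.2], frozen=True)
head = Layer("head", [1.0, 2.0])

sgd_step([encoder, head], {"encoder": [0.3, 0.3], "head": [1.0, -1.0]})
print(encoder.weights, head.weights)
```

After the step, the encoder weights are unchanged while the head weights have moved against their gradients.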
HTR¶
Handwritten Text Recognition (HTR) is the ability of a computer to take handwriting as input, from sources such as paper documents, pictures and other devices, and to interpret it as text.
Layer¶
A layer in a neural network is a fundamental building block consisting of neurons that process input data by applying weights, biases, and activation functions to produce an output. Layers are stacked sequentially in a way that enables the network to learn complex patterns and make accurate predictions.
Markdown¶
Markdown is a plain text format for writing structured documents, based on conventions which indicate the formatting. It is widely used for blogging, instant messaging, in collaborative software, etc. It easily enhances readability and allows formatting text as titles, subtitles, lists, links and so on.
Learn more by reading the Markdown specification.
METS¶
Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative and structural metadata related to digitized documents. Those documents are expressed using an XML standard such as ALTO.
Learn more by reading the METS representation.
Model¶
A Machine Learning model is a program that has been trained to find patterns or make decisions from a previously unseen dataset.
For example, in Natural Language Processing (NLP), Machine Learning models can parse and correctly recognize the intent behind previously unheard sentences or combinations of words. In image recognition, a Machine Learning model can be taught to recognize objects, such as cats or dogs.
Overfitting¶
Overfitting occurs when a machine learning model learns the details and noise in the training data to the extent that it negatively impacts the model’s performance on unseen data. This means the model performs exceptionally well on the training data but poorly on the validation and test data because it has essentially memorized the training data rather than learning generalizable patterns.
PAGE XML¶
Page Analysis and Ground Truth Elements (PAGE) is an XML standard for encoding digitized documents by organizing and structuring a page and its contents. It is similar to the ALTO XML format. In this tutorial, our exporter outputs a representation of the paragraphs and lines on an image along with their position.
Learn more by reading the PAGE XML representation.
Slug¶
The term slug comes from newspaper publishing. A slug is a string that can only include letters, numbers, dashes, and underscores. It is a unique identifier that refers to a single object, in a human-friendly form.
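A minimal sketch of how such a string could be validated or derived from a free-form name. The exact rules applied by Arkindex may differ: the pattern below, the lowercasing and both helper names are assumptions made for this example.

```python
import re
import unicodedata

# Assumption: only unaccented letters, digits, dashes and underscores are allowed.
SLUG_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_slug(value):
    """Check that the whole string only uses characters allowed in a slug."""
    return SLUG_PATTERN.fullmatch(value) is not None


def slugify(text):
    """Build a slug from a human-readable name."""
    # Split accented letters into base letter + accent, then drop the accents.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of disallowed characters with a single dash.
    slug = re.sub(r"[^a-z0-9_-]+", "-", ascii_text.lower())
    return slug.strip("-")


print(slugify("Mon Projet d'Été 2024"))
print(is_valid_slug("my_project-01"), is_valid_slug("my project"))
```

Note that this sketch does not guarantee uniqueness: two different names can produce the same slug, so a real system still has to check for collisions.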
SQLite¶
SQLite is a library that implements a transactional SQL database engine. In Arkindex, it is used to export projects and all the elements they contain to a single lightweight file. This generated file can be stored, shared and easily accessed by some workers to perform demanding operations on a large number of elements.
In addition, an SQLite export generated in Arkindex can help you restore the current state of your project if you accidentally delete images or transcriptions.
Learn more by reading the SQLite documentation.
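Because an SQLite database is a single file, it can be queried with Python's built-in `sqlite3` module. The sketch below uses an in-memory database with a made-up `element` table to stay self-contained; the table layout and the data are hypothetical and do not describe the actual structure of an Arkindex export.

```python
import sqlite3


def count_elements_by_type(connection):
    """Return a {type: count} mapping for the hypothetical `element` table."""
    rows = connection.execute(
        "SELECT type, COUNT(*) FROM element GROUP BY type ORDER BY type"
    )
    return dict(rows)


# An in-memory database stands in for an export file such as "export.sqlite".
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE element (id TEXT PRIMARY KEY, type TEXT, name TEXT)")
connection.executemany(
    "INSERT INTO element VALUES (?, ?, ?)",
    [
        ("1", "page", "Page 1"),
        ("2", "page", "Page 2"),
        ("3", "text_line", "Line 1"),
        ("4", "text_line", "Line 2"),
        ("5", "text_line", "Line 3"),
    ],
)

counts = count_elements_by_type(connection)
print(counts)
connection.close()
```

To read a real file, `sqlite3.connect` would be given the path to that file instead of `":memory:"`.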
Training¶
Model training in Machine Learning is the process of feeding a Machine Learning algorithm with data to help identify and learn good values for all of its parameters.
Worker¶
A worker is a resource required to run document processing workflows. It is programmed to apply a specialized action to one element at a time, which will produce the desired output. Workers can be chained to perform several successive actions on elements before reporting the results to Arkindex.
For example, various dedicated workers from Arkindex have been developed with the aim of:
- transcribing text from an image,
- translating transcriptions available on Arkindex into another language,
- recognizing objects in an image (cats, dogs and so on),
- etc.
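The way workers process one element at a time and can be chained may be pictured as follows. This is only a conceptual sketch with plain Python functions: the function names and the canned results are invented for this example and do not reflect the actual Arkindex worker API.

```python
def transcribe(element):
    """Hypothetical worker: attach a (canned) transcription to an element."""
    return {**element, "transcription": "Bonjour"}


def translate(element):
    """Hypothetical worker: translate the transcription with a tiny lookup table."""
    dictionary = {"Bonjour": "Hello"}
    text = element["transcription"]
    return {**element, "translation": dictionary.get(text, text)}


def run_chain(elements, workers):
    """Apply each worker in order to every element, one element at a time."""
    results = []
    for element in elements:
        for worker in workers:
            element = worker(element)
        results.append(element)
    return results


pages = [{"id": "page-1"}, {"id": "page-2"}]
for result in run_chain(pages, [transcribe, translate]):
    print(result)
```

The second worker relies on the output of the first one, which is why the order of the chain matters.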
Worker configuration¶
Configuring a worker is the step that allows you to parameterize its execution to adapt the program’s input (elements) or output (results) to your needs.