
Train a new model

Prepare your dataset archive(s)

Training workers require the datasets to meet two conditions:

  • the dataset's state must be Complete (more information about the dataset states),
  • the datasets must have an archive linked.

In order for these conditions to be fulfilled, you need to process your datasets with the Training Dataset Extractor worker, in a dataset process. This is done in two steps, detailed below.
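The two conditions listed above can be sketched as a small check. This is only an illustration: the `Dataset` class, its fields and the `is_ready_for_training` helper are hypothetical names chosen for this example and do not reflect the actual Arkindex API or data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    # Hypothetical local representation of a dataset; not the Arkindex schema.
    name: str
    state: str  # one of "Open", "Building", "Complete", "Error"
    archive: Optional[str] = None  # hypothetical reference to the linked archive


def is_ready_for_training(dataset: Dataset) -> bool:
    """Return True when the dataset is Complete and has an archive linked."""
    return dataset.state == "Complete" and dataset.archive is not None
```

A dataset that is still Open, or that is Complete but has no archive linked, would fail this check and needs to go through the Training Dataset Extractor first.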

SQL export

In order to train a model in Arkindex, you first have to generate an SQLite export of the project(s) your dataset(s) belong to.

Completed project export

Generate the dataset archive(s)

When the export is ready, create a dataset process with the Training Dataset Extractor worker. This worker needs to be run over any dataset before it can be used for training. The dataset needs to be in either the Open or the Error state.

Process a dataset with the Training Dataset Extractor

While this process is running, the dataset’s state is set to Building. When the process finishes, the state becomes either Complete (success) or Error (failure).
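The state changes described in this section can be summarized as a small transition table. This is a sketch under stated assumptions: the `DatasetState` enum and the helper functions are hypothetical names for this example, and the table only covers the transitions mentioned on this page, not every transition Arkindex may allow.

```python
from enum import Enum


class DatasetState(Enum):
    OPEN = "Open"
    BUILDING = "Building"
    COMPLETE = "Complete"
    ERROR = "Error"


# Transitions described on this page only (assumption: others may exist).
TRANSITIONS = {
    DatasetState.OPEN: {DatasetState.BUILDING},
    DatasetState.ERROR: {DatasetState.BUILDING},
    DatasetState.BUILDING: {DatasetState.COMPLETE, DatasetState.ERROR},
    DatasetState.COMPLETE: set(),  # no transition from Complete is described here
}


def can_transition(current: DatasetState, target: DatasetState) -> bool:
    """Check whether the page describes a move from `current` to `target`."""
    return target in TRANSITIONS[current]


def can_start_extraction(state: DatasetState) -> bool:
    """The extractor can be run on datasets in the Open or Error state."""
    return can_transition(state, DatasetState.BUILDING)
```

In this model, a dataset whose extraction failed can be processed again, since Error is one of the two states from which the Training Dataset Extractor may be run.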

Train a Machine Learning model

You can then create a new dataset process, using the prepared datasets you need, with the actual training workers. The list of technologies we support for training in Arkindex is given in the next sections.

At the end of the training process, a new model version is created. It can be downloaded from the My Models menu, and it is available to workers that support models when you create worker processes.

Doc-UFCN

Doc-UFCN is a model designed to perform Document Layout Analysis (DLA). Learn more about Doc-UFCN in our blog post.

To train a new Doc-UFCN model on a dataset, create a process with the Doc-UFCN Training worker.

Learn more about how to use this worker on its description page.

Training a Doc-UFCN model

YOLO v8

YOLOv8 is the latest version of the YOLO object detection and image segmentation model. You can train YOLO models for the following tasks: object detection, instance segmentation, and image classification.

To train a new YOLO model for Object Detection or Instance Segmentation, create a process with the YOLO Training | Detect/Segment worker.

Training a YOLO model for Object detection/segmentation

To train a new YOLO model for Image Classification, create a process with the YOLO Training | Classify worker.

Training a YOLO model for image classification

PyLaia

PyLaia is a model designed to perform Automatic Text Recognition (ATR).

To train a new PyLaia model on a dataset, create a process with the PyLaia Training worker.

Learn more about how to use this worker on its description page.

Training a PyLaia model