Train a new model
Prepare your dataset archive(s)¶
Training workers require the datasets to meet two conditions:
- the dataset's state must be Complete (more information about the dataset states),
- the dataset must have an archive linked.
In order for these conditions to be fulfilled, you need to process your datasets with the Training Dataset Extractor worker, in a dataset process. This is done in two steps, detailed below.
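The two conditions above amount to a simple check per dataset. The following is a minimal sketch for illustration only, not the Arkindex API: the `Dataset` class, its `state` and `archive_linked` fields, and both helper functions are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset; not an Arkindex API object."""

    name: str
    state: str  # one of "Open", "Building", "Complete", "Error"
    archive_linked: bool  # True once an archive is linked to the dataset


def ready_for_training(dataset: Dataset) -> bool:
    """Return True only when both conditions hold for the dataset."""
    return dataset.state == "Complete" and dataset.archive_linked


def not_ready(datasets: list[Dataset]) -> list[str]:
    """Names of the datasets that still need the Training Dataset Extractor."""
    return [d.name for d in datasets if not ready_for_training(d)]
```

Any dataset returned by `not_ready` would have to go through the two steps below before it can be used in a training process.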
SQL export¶
To train a model in Arkindex, you first have to generate an SQLite export of the project(s) your dataset(s) belong to.
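Because the export is a regular SQLite database, it can be inspected with any SQLite client. The sketch below assumes you have a local copy of the export file and uses Python's built-in `sqlite3` module to list the tables it contains. The `list_tables` helper is a hypothetical name, and no particular table names are assumed.

```python
from __future__ import annotations

import sqlite3


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in an SQLite database file, sorted."""
    # Note: sqlite3.connect creates an empty database if the file is missing.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

For example, `list_tables("export.sqlite")` would return the table names of an export saved under that (placeholder) file name.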
Generate the dataset archive(s)¶
When the export is ready, create a dataset process with the Training Dataset Extractor worker. This worker needs to be run over any dataset before it can be used for training. The dataset needs to be in either the Open or the Error state.
While this process is ongoing, the dataset’s state will be updated to Building. Upon process completion, its state will be updated to either Complete (success) or Error.
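The state changes described above can be summarised as a small state machine. This is an illustrative sketch only: the state names come from the text, while the `DatasetState` enum and the two functions are hypothetical and do not reflect how Arkindex implements these transitions.

```python
from enum import Enum


class DatasetState(Enum):
    OPEN = "Open"
    BUILDING = "Building"
    COMPLETE = "Complete"
    ERROR = "Error"


# States from which the Training Dataset Extractor process may be started.
STARTABLE = {DatasetState.OPEN, DatasetState.ERROR}


def start_extraction(state: DatasetState) -> DatasetState:
    """Starting the process moves an Open or Error dataset to Building."""
    if state not in STARTABLE:
        raise ValueError(f"Cannot start extraction from state {state.value}")
    return DatasetState.BUILDING


def finish_extraction(state: DatasetState, success: bool) -> DatasetState:
    """When the process ends, a Building dataset becomes Complete or Error."""
    if state is not DatasetState.BUILDING:
        raise ValueError(f"Dataset is not being built (state {state.value})")
    return DatasetState.COMPLETE if success else DatasetState.ERROR
```

In this model a dataset that ends in the Error state can be sent through `start_extraction` again, which matches the requirement that the dataset be in either the Open or the Error state.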
Train a Machine Learning model¶
You can then create a new dataset process, using the prepared datasets you need, with the actual training workers. The list of technologies we support for training in Arkindex is given in the next sections.
At the end of the training process, a new model version will be created. It will be available for download in the My Models menu, and available to workers that support models when creating worker processes.
Doc-UFCN¶
Doc-UFCN is a model designed to perform Document Layout Analysis (DLA). Learn more about Doc-UFCN in our blog post.
To train a new Doc-UFCN model on a dataset, create a process with the Doc-UFCN Training worker.
Learn more about how to use this worker on its description page.
YOLO v8¶
YOLOv8 is a version of the YOLO object detection and image segmentation model. You can train YOLO models for the following tasks:
- Object detection,
- Instance segmentation,
- Image classification.
To train a new YOLO model for Object detection or Instance Segmentation, create a process with the YOLO Training | Detect/Segment worker.
To train a new YOLO model for Image Classification, create a process with the YOLO Training | Classify worker.
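The choice of worker depends only on the task, which can be written as a lookup table. In this sketch the worker names are taken from the text above, while the task keys and the `yolo_training_worker` helper are hypothetical names chosen for illustration.

```python
# Worker names come from the text above; the lowercase task keys are illustrative.
YOLO_TRAINING_WORKERS = {
    "object detection": "YOLO Training | Detect/Segment",
    "instance segmentation": "YOLO Training | Detect/Segment",
    "image classification": "YOLO Training | Classify",
}


def yolo_training_worker(task: str) -> str:
    """Return the name of the training worker to use for a given YOLO task."""
    key = task.strip().lower()
    if key not in YOLO_TRAINING_WORKERS:
        raise ValueError(f"Unsupported YOLO task: {task!r}")
    return YOLO_TRAINING_WORKERS[key]
```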
PyLaia¶
PyLaia is a model designed to perform Automatic Text Recognition (ATR).
To train a new PyLaia model on a dataset, create a process with the PyLaia Training worker.
Learn more about how to use this worker on its description page.