Train a Machine Learning model

    You can use Arkindex to train machine learning models for Arkindex workers, using annotated data from any Arkindex project you have access to. The project must contain at least one folder: the folder holding the training data.

    You can also use optional validation and test folders.

    To start a training process for a given Model on a given Project, you need contributor access to the Project and admin access to the Model.
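
    The access rule above can be sketched as a small check. This is only an illustration: the `AccessLevel` enum and the `can_start_training` function are hypothetical names, not part of Arkindex, and the sketch assumes that access levels are ordered so that a higher level also satisfies a lower requirement.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical ordered access levels (assumption: higher includes lower)."""
    NONE = 0
    GUEST = 1
    CONTRIBUTOR = 2
    ADMIN = 3


def can_start_training(project_access: AccessLevel, model_access: AccessLevel) -> bool:
    """Return True when both requirements are met:
    at least contributor access on the Project and admin access on the Model."""
    return (
        project_access >= AccessLevel.CONTRIBUTOR
        and model_access >= AccessLevel.ADMIN
    )
```

    Under these assumptions, a contributor on the Project who is only a contributor on the Model would not be able to start a training process.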

    The training interface can be accessed from the Actions dropdown menu on the right of the header of a project.

    'Train a model' in the Actions menu

    The training interface

    In order to train a Machine Learning model, you have to set a number of parameters in the training process configuration form.

    The training process configuration form

    Naming your training process

    First, you have to name your training process. This will be useful to find it again in the processes list, if you navigate away from the process status page.

    Selecting a worker version

    Then, you need to select the worker version that will perform the training. For example, if you want to train a model for Doc-UFCN, you need to select the latest available version of this worker in the worker version selection modal. Once training has finished, the trained model will be available in Machine Learning processes that use this worker.

    Worker version selection

    Configuring the training process

    You can optionally add a training configuration to your training process. You can either select an existing configuration or create a new one, using the configuration modal.

    Training configuration

    Selecting a model to train

    You have to select the model you will be training, among the available models. You can also, optionally, select a model version to start your training from.

    Model selection modal

    Training, validation and test folders

    You have to select a training folder, containing the data you want to train your model on, from the existing folders in the project you have chosen. You can also select a validation folder and a test folder.

    Folder picker modal

    The data contained in the training and validation folders is used to train the model, while the data contained in the test folder is never used during the training process. It only serves to test the model on entirely new data, in order to evaluate its performance.

    GPU usage

    Lastly, you can choose whether or not to use a GPU to train your model, using the GPU toggle.

    You can then click the Start training button.
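
    To summarize which form fields are required and which are optional, the form can be modelled as a plain data structure. This is a sketch for illustration only: `TrainingRequest`, its field names, and `missing_fields` are hypothetical and do not correspond to an Arkindex API type. The default of `use_gpu=False` is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingRequest:
    """Hypothetical mirror of the training form fields described above."""
    # Required fields
    name: str
    worker_version_id: str
    model_id: str
    train_folder_id: str
    # Optional fields
    configuration_id: Optional[str] = None      # existing or newly created configuration
    model_version_id: Optional[str] = None      # model version to start training from
    validation_folder_id: Optional[str] = None
    test_folder_id: Optional[str] = None        # never used during training, only for evaluation
    use_gpu: bool = False                       # state of the GPU toggle (default is an assumption)

    def missing_fields(self) -> List[str]:
        """Return the labels of required fields that are still empty."""
        required = {
            "process name": self.name,
            "worker version": self.worker_version_id,
            "model": self.model_id,
            "training folder": self.train_folder_id,
        }
        return [label for label, value in required.items() if not value.strip()]
```

    With this sketch, a request that leaves the training folder empty would report `["training folder"]`, while the validation folder, test folder, configuration and starting model version can all be left unset.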

    Training Process Status

    This takes you to a process status page, similar to the one used to follow the progress of a Workers workflow. You can leave this status page and find it again in the Processes List (/process). This list can be filtered with various parameters, including the process name, so you can easily find your training process again and monitor it.
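
    The idea of finding a process by name can be illustrated with a small local filter. This is not how the Processes List is implemented: `filter_processes_by_name` and the dictionary shape are hypothetical, and the case-insensitive substring match is an assumption about what a name filter might do.

```python
from typing import Dict, List


def filter_processes_by_name(
    processes: List[Dict[str, str]], query: str
) -> List[Dict[str, str]]:
    """Keep the processes whose name contains the query (case-insensitive)."""
    needle = query.casefold()
    return [p for p in processes if needle in p.get("name", "").casefold()]


# Hypothetical sample data, for illustration only
processes = [
    {"name": "Doc-UFCN training, March", "state": "running"},
    {"name": "Import images", "state": "completed"},
]
matches = filter_processes_by_name(processes, "training")
```

    This is why giving the training process a distinctive name makes it easier to find later.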

    Once the training has been successfully completed, your new model is available to use in Machine Learning processes.