In the context of Machine Learning, a Dataset is a collection of data. Datasets are organized in sets, which may or may not overlap. In Arkindex, datasets are collections of elements.
Datasets go through multiple states during their life cycle.
Open: when a dataset is created, it is in the Open state. You can edit its details, rename its sets and manage the elements it includes.
Building: when a worker tries to generate an archive of a dataset, the dataset goes into the Building state.
Error: when the worker fails to generate the dataset archive, the dataset goes into the Error state.
Complete: when the worker succeeds in generating the archive, the dataset goes into the Complete state. The dataset is now immutable and no element can be added or removed.
If you want to use the dataset outside of Arkindex, using the API Client or the SQLite export, you do not need to change its state.
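The life cycle above can be read as a small state machine. The following is a minimal sketch of that reading in plain Python, not Arkindex code: the names `DatasetState`, `ALLOWED_TRANSITIONS`, `can_transition` and `is_immutable` are hypothetical, and only the transitions stated in the text are modeled.

```python
from enum import Enum


class DatasetState(Enum):
    OPEN = "open"
    BUILDING = "building"
    ERROR = "error"
    COMPLETE = "complete"


# Transitions stated in the text. Whether a dataset can leave the Error
# state (for example to retry a build) is not described there, so Error
# has no outgoing transition in this sketch.
ALLOWED_TRANSITIONS = {
    DatasetState.OPEN: {DatasetState.BUILDING},
    DatasetState.BUILDING: {DatasetState.ERROR, DatasetState.COMPLETE},
    DatasetState.ERROR: set(),
    DatasetState.COMPLETE: set(),
}


def can_transition(current, target):
    """Return True if the text describes a move from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


def is_immutable(state):
    """Only the Complete state is described as immutable."""
    return state is DatasetState.COMPLETE
```

For example, `can_transition(DatasetState.OPEN, DatasetState.BUILDING)` is true, while no transition out of `DatasetState.COMPLETE` is allowed.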
Datasets can be managed from the Datasets tab in a project's details page.
From this interface, if you have contributor access to the project, you are able to:
Complete state,
To create a new dataset, click on the + button, on the bottom right of the dataset list. This opens a dataset creation modal.
To create a new dataset, the following fields are mandatory:
The sets field is optional; if you leave it empty, your dataset will be created with the following default sets: training, validation, test.
To edit an existing dataset, click on the pencil-shaped icon on the far right of a dataset's row, in the Actions column. This opens the same modal as the one described for dataset creation.
Editing is not available for Complete datasets.
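The fallback to default sets when the sets field is left empty can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the Arkindex implementation: `DEFAULT_SETS` and `resolve_set_names` are hypothetical names.

```python
# Default set names listed in the text for an empty sets field.
DEFAULT_SETS = ("training", "validation", "test")


def resolve_set_names(requested=None):
    """Return the set names for a new dataset.

    A missing or empty list falls back to the documented defaults;
    otherwise the requested names are kept as given.
    """
    if not requested:
        return list(DEFAULT_SETS)
    return list(requested)
```

Calling `resolve_set_names()` or `resolve_set_names([])` yields the three default sets, while `resolve_set_names(["train", "dev"])` keeps the custom names.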
To view a dataset's details and its elements, click on the name of a dataset in the list. Cycle through the tabs to see the elements in each set.
To remove an element from a set, use the — button in the bottom-right corner of its thumbnail.
To create a new dataset with the same collection of elements as another, you can use the Clone button, in the top-right corner of a dataset's details page. This will create a new dataset with the same elements and sets, in the Open state. This is helpful when you need to build the v2 of a dataset from a v1 that is in the Complete (thus immutable) state.
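The effect of cloning can be pictured with a small model. This is a sketch in plain Python, not the Arkindex data model: the `Dataset` class, its fields, the `clone` function and its `new_name` parameter are all hypothetical, and how Arkindex names the cloned dataset is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    state: str = "open"
    # Maps each set name to the IDs of the elements it contains.
    sets: dict = field(default_factory=dict)


def clone(dataset, new_name):
    """Copy the sets and elements into a new dataset in the Open state."""
    # Copy each list so that editing the clone leaves the source untouched.
    copied_sets = {name: list(ids) for name, ids in dataset.sets.items()}
    return Dataset(name=new_name, state="open", sets=copied_sets)
```

With this model, cloning a Complete v1 gives an Open v2 holding the same sets and elements, and adding an element to v2 does not alter v1.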
If you are an administrator on a project, you can delete an existing dataset from the datasets list page.
On an element's details page, there is a Datasets section listing all the datasets and sets that include the element. From this list you can remove the element from a dataset's set, if you have contributor access to the project.
These endpoints are the most useful to handle Datasets:
Once your dataset is ready, you can start training in Arkindex. Learn more about: