Create your own worker

    This page will guide you through creating a new Arkindex worker locally and preparing a development environment.

    This guide assumes you are using Ubuntu 18.04 or later and have root access.

    Preparing your environment

    This section will guide you through preparing your system to create a new Arkindex worker from our official template.

    Installing system dependencies

    To retrieve the Arkindex worker template, you will need to have both Git and SSH. Git is a version control system that you will later use to manage multiple versions of your worker. SSH allows secure connections to remote machines, and will be used in our case to retrieve the template from a Git server.

    To install system dependencies

    1. Run the following command:

      sudo apt install git ssh
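
To confirm that both tools are now available, a small check like the following can help. This is only a sketch: the `missing_commands` helper is a hypothetical name introduced here, not part of the Arkindex tooling.

```shell
# Print the name of every listed command that is not available on the PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing="$(missing_commands git ssh)"
if [ -z "$missing" ]; then
    echo "All system dependencies are installed"
else
    # Left unquoted on purpose so that the names are joined on a single line.
    echo "Missing commands:" $missing
fi
```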
      

    Checking your version of Python

    Our Arkindex worker template requires Python 3.6 or later. Checking if a compatible version of Python is installed avoids further issues in the setup process.

    To check your version of Python

    1. Run the following command: python3 --version

    This command will have an output similar to the following:

    Python 3.6.9
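
If you would rather script this check, the version string can be compared against the 3.6 minimum. The sketch below assumes the output has the `Python X.Y.Z` form shown above; `is_compatible_python` is a hypothetical helper name.

```shell
# Succeeds if a "Python X.Y.Z" version string is at least 3.6.
is_compatible_python() {
    version="${1#Python }"   # strip the leading "Python "
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]; }
}

if command -v python3 >/dev/null 2>&1 && is_compatible_python "$(python3 --version)"; then
    echo "Compatible Python found"
else
    echo "Python 3.6 or later is required"
fi
```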
    

    Installing Python

    If you were unable to check your Python version as stated above because python3 was not found, you will need to install Python 3 on your system.

    To install Python on Ubuntu

    1. Run the following command:

      sudo apt install python3 python3-pip python3-virtualenv
      
    2. Check your Python version again, as instructed in the previous section.

    Installing Python dependencies

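    To bootstrap a new Arkindex worker, some Python dependencies will be required:

    • pre-commit will be used to automatically check the syntax of your source code.
    • tox will be used to run unit tests.
    • cookiecutter will be used to generate the worker from our official template.
    • virtualenvwrapper will be used to manage Python virtual environments.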

    To install Python dependencies

    1. Run the following command:

      pip3 install pre-commit tox cookiecutter virtualenvwrapper
      
    2. Follow the official virtualenvwrapper setup instructions until you are able to run workon.

    workon should have an empty output, as no Python virtual environments have been set up yet.
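
As a rough illustration of what that setup usually involves, the lines below could go in your shell startup file (for example `~/.bashrc`). The location of `virtualenvwrapper.sh` is an assumption that depends on how pip installed it; the official virtualenvwrapper instructions remain the reference.

```shell
# Directory where virtualenvwrapper stores virtual environments (its documented default).
export WORKON_HOME="$HOME/.virtualenvs"
# Interpreter that virtualenvwrapper itself should use.
export VIRTUALENVWRAPPER_PYTHON="$(command -v python3)"

# Assumed location for a per-user pip install; adjust it to your system.
wrapper_script="$HOME/.local/bin/virtualenvwrapper.sh"
if [ -f "$wrapper_script" ]; then
    . "$wrapper_script"
else
    echo "virtualenvwrapper.sh not found at $wrapper_script" >&2
fi
```

After opening a new terminal, `workon` should then be available.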

    Creating the project

    This section will guide you through creating a new worker from our official template and making it available on a GitLab instance.

    Creating a GitLab project

    For a worker to be accessible from an Arkindex instance, it needs to be pushed to the repository of a GitLab project. A GitLab project also lets you manage different versions of a worker and run automated checks on your code.

    To create a GitLab project

    1. Open the New project form on GitLab.com or on another GitLab instance

    2. Enter your worker name as the Project name

    3. Define a Project slug related to your worker, e.g.:

      • tesseract for a Tesseract worker
      • opencv-foo for an OpenCV worker related to project Foo
    4. Click on the Create project button

    Bootstrapping the project

    This section guides you through using our official template to get a basic structure for your worker.

    To bootstrap the project

    1. Open a terminal and go to the folder in which you want to create your worker.

    2. Enter this command and fill in the required information:

      cookiecutter git@gitlab.com:arkindex/base-worker.git
      

    Cookiecutter will ask you for several options:

    • slug: A slug for the worker. This should use lowercase alphanumeric characters or underscores to meet the code formatting requirements that the template automatically enforces via black.

    • worker_type: An arbitrary string purely used for display purposes. For example:

      • recognizer,

      • classifier,

      • dla,

      • entity-recognizer, etc.

    • author: A name for the worker's author. Usually your first and last name.

    • email: Your e-mail address. This will be used to contact you if any administrative need arises.
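
The constraint on the worker slug described above (lowercase alphanumeric characters or underscores) can be checked before running cookiecutter. This is a minimal sketch; `is_valid_slug` is a hypothetical helper and is not provided by the template.

```shell
# Succeeds when the slug is non-empty and made only of lowercase letters,
# digits, or underscores. LC_ALL=C keeps the a-z range limited to ASCII.
is_valid_slug() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z0-9_]+'
}

for slug in tesseract opencv_foo OpenCV-foo; do
    if is_valid_slug "$slug"; then
        echo "$slug: valid"
    else
        echo "$slug: invalid"
    fi
done
```

Note that this applies to the slug requested by cookiecutter, not to the GitLab project slug, which may contain hyphens.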

    Pushing to GitLab

    This section guides you through pushing the newly created worker from your system to the GitLab project's repository.

    This section assumes you have Maintainer or Owner access to the GitLab project.

    To push to GitLab

    1. Enter the newly created directory, whose name starts with worker- and ends with your worker's slug.

    2. Add your GitLab project as a Git remote:

      git remote add origin git@my-gitlab-instance.com:path/to/worker.git
      

      You will need to use your own instance's URL and the path to your own project. For example, a project named hello in the teklia group on gitlab.com will use the following command:

      git remote add origin git@gitlab.com:teklia/hello.git
      
    3. Push the new branch to GitLab:

      git push --set-upstream origin master
      
    4. Open your GitLab project in a browser.

    5. Click on the blue icon indicating that CI is running on your repository, and wait for it to turn green to confirm everything worked.
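
The remote URL used in step 2 follows a fixed pattern: the instance's host name, then the project path, then `.git`. The sketch below builds it from those two parts; `gitlab_ssh_url` is a hypothetical helper, and it assumes the instance accepts SSH connections as the `git` user, as in the examples above.

```shell
# Build the SSH remote URL of a GitLab project from its host and project path.
gitlab_ssh_url() {
    printf 'git@%s:%s.git\n' "$1" "$2"
}

# Prints: git@gitlab.com:teklia/hello.git
gitlab_ssh_url gitlab.com teklia/hello
```

Once the remote has been added, `git remote get-url origin` shows the URL that Git will push to.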

    Setting up your development environment

    This section guides you through setting up a Python development environment specifically for your worker.

    Activating the pre-commit hook

    The official template includes code checks, such as detecting trailing whitespace, as well as code formatting checks using black. These checks run on GitLab as soon as you push new code, but the pre-commit hook can also run them automatically each time you create a commit.

    To activate the pre-commit hook

    1. Run pre-commit install.
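
To verify that the hook was installed, you can look for the hook file that `pre-commit install` writes into the repository. The sketch below assumes the default hooks location, `.git/hooks`; `has_pre_commit_hook` is a hypothetical helper name.

```shell
# Succeeds if the repository at the given path contains a pre-commit hook file.
has_pre_commit_hook() {
    [ -f "$1/.git/hooks/pre-commit" ]
}

if has_pre_commit_hook .; then
    echo "pre-commit hook is installed"
else
    echo "pre-commit hook is missing; run: pre-commit install"
fi
```

You can also run all checks once without committing, using `pre-commit run --all-files`.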

    Setting up the Python virtual environment

    To install Python dependencies that are specific to your worker, and prevent other dependencies installed on your system from interfering, it is recommended to use a virtual environment.

    To set up a Python virtual environment

    1. Run mkvirtualenv my_worker, where my_worker is any name of your choice.
    2. Install your worker in editable mode: pip install -e .
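
Before installing anything, it can be worth confirming that the virtual environment is active. The sketch below relies on the `VIRTUAL_ENV` variable that virtual environment activation scripts export; `in_virtualenv` is a hypothetical helper name, and `my_worker` stands for the environment name you chose.

```shell
# Succeeds when a virtual environment is active in the current shell.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "Active virtual environment: $VIRTUAL_ENV"
else
    echo "No virtual environment is active; run: workon my_worker"
fi
```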