YAML configuration

    This page is a reference for version 2 of the YAML configuration file for Git repositories handled by Arkindex. Version 1 is not supported.

    The configuration file is always named .arkindex.yml and should be found at the root of the repository.

    Required attributes

    The following attributes are required in every .arkindex.yml file:

    • version: Version of the configuration file in use. An error will occur if the version number is not set to 2.
    • workers: A list of workers attached to the Git repository.

    Each item of the workers list is one of the following:

    • A path to a YAML file holding the configuration for a single worker
    • A Unix-style pattern matching paths to YAML files, each holding the configuration for a single worker
    • The configuration of a single worker, embedded directly into the file
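
    The three kinds of entries can be told apart mechanically. The following is a minimal sketch, not Arkindex's actual implementation: the function names are hypothetical, and it assumes that any glob metacharacter (*, ? or [) marks an entry as a pattern and that patterns are resolved relative to the repository root.

```python
from pathlib import Path

GLOB_CHARACTERS = "*?["


def classify_worker_entry(entry):
    """Return 'embedded', 'pattern' or 'path' for one item of the workers list."""
    if isinstance(entry, dict):
        return "embedded"
    if isinstance(entry, str):
        # Assumption: any glob metacharacter marks the entry as a pattern
        if any(char in entry for char in GLOB_CHARACTERS):
            return "pattern"
        return "path"
    raise TypeError(f"Unsupported workers entry: {entry!r}")


def worker_files(entry, repo_root="."):
    """List the YAML files that one entry refers to."""
    kind = classify_worker_entry(entry)
    if kind == "embedded":
        return []  # nothing to load, the configuration is already inline
    root = Path(repo_root)
    if kind == "pattern":
        # '**' also matches nested sub-directories
        return sorted(root.glob(entry))
    return [root / entry]
```

    With this sketch, configuration/**/*.yml is classified as a pattern, path/to/worker.yml as a plain path, and a mapping with name, slug and type keys as an embedded configuration.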

    Single worker configuration

    The following describes the attributes of a YAML file configuring one worker, or of the configuration embedded directly in the .arkindex.yml file.

    All attributes are optional unless explicitly specified.

    • name: Mandatory. Name of the worker, for display purposes.

    • slug: Mandatory. Slug of this worker. The slug must be unique across the repository and must only contain alphanumeric characters, underscores or dashes.

    • type: Mandatory. Type of the worker, for display purposes only. Some common values include:

      • classifier
      • recognizer
      • ner
      • dla
      • word-segmenter
      • paragraph-creator
    • docker: Groups Docker-related configuration attributes:

      • build: Path to the Dockerfile used to build this worker, relative to the root of the repository. Defaults to Dockerfile.
      • command: Custom command line to be used when launching the Docker container for this worker. By default, the command specified in the Dockerfile is used.
    • environment: Mapping of string keys and string values to define environment variables to be set when the Docker image runs.

    • configuration: Mapping holding any string keys and values that can be later accessed in the worker's Python code. Can be used to define settings on your own worker, such as a file's location.

    • user_configuration: Mapping defining settings on your worker that can be modified by users. See below for details.

    • secrets: List of required secret names for that specific worker.

    For more information, see Using secrets in workers.
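
    The constraints on the mandatory attributes can be checked before pushing a repository. The sketch below is only an illustration under stated assumptions: check_workers is a hypothetical helper, it expects worker configurations already loaded as Python dictionaries, and it reads "alphanumerical" as ASCII letters and digits.

```python
import re

# Assumption: "alphanumerical" means ASCII letters and digits
SLUG_PATTERN = re.compile(r"[A-Za-z0-9_-]+")
MANDATORY_ATTRIBUTES = ("name", "slug", "type")


def check_workers(workers):
    """Return a list of human-readable problems found in worker configurations."""
    problems = []
    seen_slugs = set()
    for index, worker in enumerate(workers):
        # name, slug and type are the only mandatory attributes
        for attribute in MANDATORY_ATTRIBUTES:
            if not worker.get(attribute):
                problems.append(f"worker #{index}: missing mandatory attribute '{attribute}'")
        slug = worker.get("slug")
        if isinstance(slug, str) and slug:
            if not SLUG_PATTERN.fullmatch(slug):
                problems.append(f"worker #{index}: invalid slug {slug!r}")
            # Slugs must be unique across the whole repository
            if slug in seen_slugs:
                problems.append(f"worker #{index}: slug {slug!r} is used more than once")
            seen_slugs.add(slug)
    return problems
```

    An empty list means that every worker has a name, a slug and a type, and that all slugs are well-formed and distinct.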

    Setting up user-configurable parameters

    The YAML file can define parameters that users will be able to change when they use this worker in a process on Arkindex. These parameters are listed in a user_configuration attribute.

    A parameter is defined using the following settings:

    • title: Mandatory. The parameter's title.
    • type: Mandatory. The parameter's value type. The supported types are:
      • int
      • bool
      • float
      • string
      • enum
      • list
      • dict
    • default: Optional. A default value for the parameter. Must be of the defined parameter type.
    • required: Optional. A boolean that defaults to false. When true, users cannot leave the parameter blank.
    • choices: List of allowed values. Required for enum-type parameters and not usable with any other type.
    • subtype: Type of the list's elements. Required for list-type parameters and not usable with any other type.

    This definition allows for both validation of the input and the display of a form to make configuring workers easy for Arkindex users.

    User configuration form on Arkindex
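
    The rules above can be expressed as a small validator. This is a hedged sketch rather than the validation Arkindex performs: check_parameter is a hypothetical name, definitions are assumed to be already parsed into Python dictionaries, and defaults are type-checked strictly (for instance, an integer default is rejected for a float parameter, following a literal reading of the rules).

```python
PYTHON_TYPES = {"int": int, "bool": bool, "float": float,
                "string": str, "list": list, "dict": dict}
LIST_SUBTYPES = {"int": int, "float": float, "string": str}


def _is_instance(value, expected):
    # bool is a subclass of int in Python, so reject it unless bool is expected
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def check_parameter(definition):
    """Return a list of errors for one user_configuration parameter."""
    errors = []
    param_type = definition.get("type")
    if not definition.get("title"):
        errors.append("title is mandatory")
    if param_type != "enum" and param_type not in PYTHON_TYPES:
        errors.append(f"unsupported type: {param_type!r}")
        return errors
    # choices goes with enum parameters only, subtype with list parameters only
    if ("choices" in definition) != (param_type == "enum"):
        errors.append("choices is required for, and only usable with, enum")
    if ("subtype" in definition) != (param_type == "list"):
        errors.append("subtype is required for, and only usable with, list")
    elif param_type == "list" and definition["subtype"] not in LIST_SUBTYPES:
        errors.append("subtype must be int, float or string")
    if "default" in definition:
        default = definition["default"]
        if param_type == "enum":
            if default not in definition.get("choices", []):
                errors.append("default must be one of the choices")
        elif not _is_instance(default, PYTHON_TYPES[param_type]):
            errors.append(f"default must be of type {param_type}")
        elif param_type == "list":
            subtype = LIST_SUBTYPES.get(definition.get("subtype"))
            if subtype and not all(_is_instance(item, subtype) for item in default):
                errors.append("default items must match the subtype")
        elif param_type == "dict":
            # Dictionary parameters only accept strings as values
            if not all(isinstance(value, str) for value in default.values()):
                errors.append("dict default values must be strings")
    return errors
```

    An empty result means the definition is consistent with the settings listed above. A definition such as {"title": "Input Size", "type": "int", "default": "768"} is reported because its default is a string rather than an integer.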

    String parameters

    String-type parameters must be defined using a title and the string type. You can also set a default value for this parameter, which must be a string, as well as make it a required parameter, which prevents users from leaving it blank.

    For example, a string-type parameter can be defined like this:

    subfolder_name:
      title: Created Subfolder Name
      type: string
      default: My Neat Subfolder
    

    This results in the following display for the user:

    Example string-type parameter.

    Integer parameters

    Integer-type parameters must be defined using a title and the int type. You can also set a default value for this parameter, which must be an integer, as well as make it a required parameter, which prevents users from leaving it blank.

    For example, an integer-type parameter can be defined like this:

    input_size:
      title: Input Size
      type: int
      default: 768
      required: True
    

    This results in the following display for the user:

    Example integer-type parameter.

    Float parameters

    Float-type parameters must be defined using a title and the float type. You can also set a default value for this parameter, which must be a float, as well as make it a required parameter, which prevents users from leaving it blank.

    For example, a float-type parameter can be defined like this:

    wip:
      title: Word Insertion Penalty
      type: float
      required: True
    

    This results in the following display for the user:

    Example float-type parameter.

    Boolean parameters

    Boolean-type parameters must be defined using a title and the bool type. You can also set a default value for this parameter, which must be a boolean, as well as make it a required parameter, which prevents users from leaving it blank.

    In the configuration form, boolean parameters are displayed as toggles.

    For example, a boolean-type parameter can be defined like this:

    score:
      title: Run Worker in Evaluation Mode
      type: bool
      default: False
    

    This results in the following display for the user:

    Example boolean-type parameter.

    Enum (choices) parameters

    Enum-type parameters must be defined using a title, the enum type and at least two choices. You cannot define an enum-type parameter without choices. You can also set a default value for this parameter, which must be one of the available choices, as well as make it a required parameter, which prevents users from leaving it blank. Enum-type parameters should be used when you want to limit the users to a given set of options.

    In the configuration form, enum parameters are displayed as selects.

    For example, an enum-type parameter can be defined like this:

    parent_type:
      title: Target Parent Element Type
      type: enum
      default: paragraph
      choices:
        - paragraph
        - text_zone
        - page
    

    This results in the following display for the user:

    Example enum-type parameter.

    List parameters

    List-type parameters must be defined using a title, the list type and a subtype for the elements inside the list. You can also set a default value for this parameter, which must be a list containing elements of the given subtype, as well as make it a required parameter, which prevents users from leaving it blank.

    The allowed subtypes are int, float and string.

    In the configuration form, list parameters are displayed as rows of input fields.

    For example, a list-type parameter can be defined like this:

    a_list:
      title: A List of Values
      type: list
      subtype: int
      default: [4, 3, 12] 
    

    This results in the following display for the user:

    Example list-type parameter.

    Dictionary parameters

    Dictionary-type parameters must be defined using a title and the dict type. You can also set a default value for this parameter, which must be a dictionary, as well as make it a required parameter, which prevents users from leaving it blank. For example, a dictionary parameter can specify the correspondence between the classes predicted by a worker and the elements created on Arkindex from these predictions.

    Dictionary-type parameters only accept strings as values.

    In the configuration form, dictionary parameters are displayed as a table with one column for keys and one column for values.

    For example, a dictionary-type parameter can be defined like this:

    classes:
      title: Output Classes to Elements Correspondence
      type: dict
      default:
        a: page
        b: text_line
    

    This results in the following display for the user:

    Example dictionary-type parameter.

    Example user_configuration

    user_configuration:
      vertical_padding:
        type: int
        default: 0
        title: Vertical Padding
      element_base_name:
        type: string
        required: true
        title: Element Base Name
      create_confidence_metadata:
        type: bool
        default: false
        title: Create confidence metadata on elements
      some_other_parameter:
        type: enum
        required: true
        default: 23
        choices:
          - 12
          - 23
          - 56
        title: Another Parameter
    

    Fallback to free JSON input

    If you have defined user-configurable parameters using these specifications, Arkindex users can switch between the form and the free JSON input field with the JSON toggle. If the defined user_configuration contains unsupported parameter types, the frontend automatically falls back to the free JSON input field. The same happens if you have not defined any user-configurable parameters using these specifications.
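
    The fallback rule can be summarized as a short decision function. This is a sketch of the behavior described above, not the frontend's actual code: input_mode is a hypothetical name, and the user_configuration is assumed to be a dictionary mapping parameter names to their definitions.

```python
FORM_TYPES = {"int", "bool", "float", "string", "enum", "list", "dict"}


def input_mode(user_configuration):
    """Return 'form' when a form can be offered, 'json' when only free JSON applies."""
    # No user-configurable parameters defined: only the free JSON field applies
    if not user_configuration:
        return "json"
    types = [definition.get("type") for definition in user_configuration.values()]
    # A single unsupported type is enough to fall back to free JSON input
    if all(param_type in FORM_TYPES for param_type in types):
        return "form"  # users may still switch to JSON with the toggle
    return "json"
```

    For instance, a user_configuration holding only an int parameter yields "form", while one that includes a parameter with an unsupported type yields "json".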

    Example configuration

    ---
    version: 2
    workers:
      # Path to a single YAML file
      - path/to/worker.yml
      # Pattern matching any YAML file in the configuration folder
      # or in its sub-directories
      - configuration/**/*.yml
      # Configuration embedded directly into this file
      - name: Book of hours
        slug: book_of_hours
        type: classifier
        docker:
          build: project/Dockerfile
          image: hub.docker.com/project/image:tag
          command: python mysuperscript.py --blabla
          environment:
            TOKEN: deadBeefToken
        configuration:
          model: path/to/model
          anyKey: anyValue
          classes: [X, Y, Z]
        user_configuration:
          vertical_padding:
            type: int
            default: 0
            title: Vertical Padding
        secrets:
          - path/to/secret.json