YAML configuration

    This page is a reference for version 2 of the YAML configuration file for Git repositories handled by Arkindex. Version 1 is not supported.

    The configuration file is always named .arkindex.yml and should be found at the root of the repository.

    Required attributes

    The following attributes are required in every .arkindex.yml file:

    • version: Version of the configuration file in use. An error will occur if the version number is not set to 2.
    • type: Type of the repository. Either iiif for a repository holding IIIF manifests and collections for importing, or worker for a repository holding Arkindex workers.
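
    As a minimal sketch, the two required attributes alone look like this for a worker repository. On its own this is not a complete file, since each repository type also requires its own attributes, described in the sections below.

    ```yaml
    ---
    version: 2    # only version 2 is accepted
    type: worker  # or iiif for a IIIF import repository
    ```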

    IIIF Import repository attributes

    When the type is set to iiif, the following attribute is mandatory:

    • manifests: List of Unix-style patterns matching paths to IIIF 2.x manifests and IIIF 2.x and 3.x collections. The * and ? characters and the [abc] ranges behave as in standard Unix patterns, and ** matches directories recursively.

    Example configuration

    ---
    version: 2
    type: iiif
    manifests:
      - mysuperfolder/mysupermanifest.json
      - mysuperfolder/asubfolder/**/*.json
    

    This would match mysuperfolder/mysupermanifest.json starting at the root of the repository, then any JSON file in mysuperfolder/asubfolder or any of its sub-directories.
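
    The single-character wildcards can be combined with ** in the same way. The sketch below uses hypothetical folder and file names (collections, volume_?.json and so on) that do not come from any real repository, and assumes the patterns follow the standard Unix semantics described above.

    ```yaml
    ---
    version: 2
    type: iiif
    manifests:
      # ? matches exactly one character: volume_1.json or volume_a.json,
      # but not volume_10.json
      - collections/volume_?.json
      # [abc] matches one character among a, b or c:
      # page_a.json, page_b.json or page_c.json
      - collections/page_[abc].json
      # * matches any sequence of characters in the file name, and ** covers
      # collections itself as well as any of its sub-directories
      - collections/**/manifest_*.json
    ```

    A pattern that begins with * has to be quoted, for example "*.json", because YAML otherwise reads a leading * as an alias.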

    Worker repository attributes

    When the type is set to worker, the workers attribute is mandatory.

    The workers attribute is a list of the following:

    • Paths to a YAML file holding the configuration for a single worker
    • Unix-style patterns matching paths to YAML files holding the configuration for a single worker
    • The configuration of a single worker embedded directly into the file

    Single worker configuration

    The following describes the attributes of a YAML file configuring one worker, or of the configuration embedded directly in the .arkindex.yml file.

    All attributes are optional unless marked as mandatory.

    • name: Mandatory. Name of the worker, for display purposes.

    • slug: Mandatory. Slug of this worker. The slug must be unique across the repository and may only contain alphanumeric characters, underscores or dashes.

    • type: Mandatory. Type of the worker, for display purposes only. Some common values include:

      • classifier
      • recognizer
      • ner
      • dla
      • word-segmenter
      • paragraph-creator
    • docker: Groups Docker-related configuration attributes:

      • build: Path to the Dockerfile used to build this worker, relative to the root of the repository. Defaults to Dockerfile.
      • command: Custom command line to use when launching the Docker container for this worker. By default, the command specified in the Dockerfile is used.
    • environment: Mapping of string keys and string values to define environment variables to be set when the Docker image runs.

    • configuration: Mapping of string keys to arbitrary values that can later be accessed in the worker's Python code. Can be used to define settings for your own worker, such as a file's location.

    • user_configuration: Mapping defining settings on your worker that can be modified by users. See below for details.

    • secrets: List of required secret names for that specific worker.

    For more information, see Using secrets in workers.
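
    As an illustration, a standalone worker file referenced from the workers list might look like the sketch below. It assumes the attributes sit at the top level of that file, and it places environment at the worker level, following the attribute list above. The worker name, slug, command, variable name and paths are all hypothetical.

    ```yaml
    ---
    # Mandatory attributes
    name: Line classifier
    slug: line_classifier  # alphanumeric characters, underscores or dashes only
    type: classifier

    # Optional attributes
    docker:
      build: Dockerfile            # the default value, shown for clarity
      command: python classify.py  # hypothetical; overrides the Dockerfile command
    environment:
      LOG_LEVEL: "debug"           # keys and values are both strings
    configuration:
      model_path: path/to/model
    secrets:
      - path/to/secret.json
    ```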

    Setting up user-configurable parameters

    The YAML file can define parameters that users will be able to change when they use this worker in a process on Arkindex. These parameters are listed in a user_configuration attribute.

    A parameter is defined using the following settings:

    • title: mandatory. The parameter's title.
    • type: mandatory. A value type. The supported types are:
      • int
      • bool
      • float
      • string
      • enum
    • default: optional. A default value for the parameter. Must be of the defined parameter type.
    • required: optional. A boolean, defaults to false.
    • choices: optional. A list of options for enum type parameters.

    This definition allows for both validation of the input and the display of a form to make configuring workers easy for Arkindex users.

    User configuration form on Arkindex

    Enum type parameters

    The enum parameter type must be used to define parameters with a closed list of possible values. Those options are contained in the choices setting. If a default value is set, it must be one of the available choices.

    an_enum_parameter:
      title: Some Enum Parameter
      type: enum
      choices: 
        - value_1
        - value_2
        - value_3
      default: value_1
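
    Conversely, a definition like the following sketch would be expected to fail validation, because its default is not listed in choices. The parameter name and values are hypothetical.

    ```yaml
    a_broken_enum_parameter:
      title: Some Broken Enum Parameter
      type: enum
      choices:
        - value_1
        - value_2
      # Invalid: value_3 is not one of the choices above
      default: value_3
    ```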
    

    Example user_configuration

    user_configuration:
      vertical_padding:
        type: int
        default: 0
        title: Vertical Padding
      element_base_name:
        type: string
        required: true
        title: Element Base Name
      create_confidence_metadata:
        type: bool
        default: false
        title: Create confidence metadata on elements
      some_other_parameter:
        type: enum
        required: true
        default: 23
        choices:
          - 12
          - 23
          - 56
        title: Another Parameter
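
    The float type is the only supported type not shown in the example above. A sketch of such a parameter, with a hypothetical name and default value, could look like this:

    ```yaml
    user_configuration:
      confidence_threshold:
        type: float
        default: 0.5     # must itself be a float, matching the declared type
        required: false  # the default, shown for clarity
        title: Confidence Threshold
    ```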
    

    Fallback to free JSON input

    If you have defined user-configurable parameters using these specifications, Arkindex users can switch between the form and the free JSON input field using the JSON toggle. If the defined user_configuration contains unsupported parameter types, the frontend automatically falls back to the free JSON input field. The same happens if you have not defined any user-configurable parameters using these specifications.

    Example configuration

    ---
    version: 2
    type: worker
    
    workers:
      # Path to a single YAML file
      - path/to/worker.yml
      # Pattern matching any YAML file in the configuration folder
      # or in its sub-directories
      - configuration/**/*.yml
      # Configuration embedded directly into this file
      - name: Book of hours
        slug: book_of_hours
        type: classifier
        docker:
          build: project/Dockerfile
          image: hub.docker.com/project/image:tag
          command: python mysuperscript.py --blabla
        environment:
          TOKEN: deadBeefToken
        configuration:
          model: path/to/model
          anyKey: anyValue
          classes: [X, Y, Z]
        user_configuration:
          vertical_padding:
            type: int
            default: 0
            title: Vertical Padding
        secrets:
          - path/to/secret.json