YAML configuration

    This page is a reference for version 2 of the YAML configuration file for Git repositories handled by Arkindex. Version 1 is not supported.

    The configuration file is always named .arkindex.yml and should be found at the root of the repository.

    Required attributes

    The following attributes are required in every .arkindex.yml file:

    • version: Version of the configuration file in use. An error will occur if the version number is not set to 2.
    • type: Type of the repository. Either iiif for a repository holding IIIF manifests and collections for importing, or worker for a repository holding Arkindex workers.
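
    The two required attributes can be checked with a few lines of Python. This is a minimal sketch, not Arkindex's own validation: the function name check_required is hypothetical, and the configuration is assumed to be already parsed into a dictionary.

```python
def check_required(config: dict) -> list[str]:
    """Return a list of problems found in the top-level attributes."""
    problems = []
    # The reference states that any version other than 2 is an error.
    if config.get("version") != 2:
        problems.append("version must be set to 2")
    # Only two repository types are described on this page.
    if config.get("type") not in ("iiif", "worker"):
        problems.append("type must be either 'iiif' or 'worker'")
    return problems


print(check_required({"version": 2, "type": "worker"}))  # prints []
print(check_required({"version": 1, "type": "git"}))
```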

    IIIF Import repository attributes

    When the type is set to iiif, the following attribute is mandatory:

    • manifests: List of Unix-style patterns matching paths to IIIF 2.x manifests and IIIF 2.x and 3.x collections. The * and ? characters and the [abc] ranges behave as in standard Unix patterns, and ** matches directories recursively.

    Example configuration

    ---
    version: 2
    type: iiif
    manifests:
      - mysuperfolder/mysupermanifest.json
      - mysuperfolder/asubfolder/**/*.json
    

    This matches mysuperfolder/mysupermanifest.json, relative to the root of the repository, as well as any JSON file in mysuperfolder/asubfolder or any of its sub-directories.
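
    To get a feel for how such patterns behave, they can be tried against a scratch directory. The sketch below uses Python's pathlib as a stand-in matcher. This is an assumption: this page does not say which matcher Arkindex uses, and edge cases may differ. The helper match_manifests and the file names other than the two patterns are made up for illustration.

```python
import tempfile
from pathlib import Path

PATTERNS = [
    "mysuperfolder/mysupermanifest.json",
    "mysuperfolder/asubfolder/**/*.json",
]


def match_manifests(root: Path, patterns: list[str]) -> list[str]:
    """Return the sorted, de-duplicated relative paths matched by the patterns."""
    found = set()
    for pattern in patterns:
        # Path.glob treats ** as "this directory and all sub-directories".
        for path in root.glob(pattern):
            if path.is_file():
                found.add(path.relative_to(root).as_posix())
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical repository layout used only for this demonstration.
    for name in [
        "mysuperfolder/mysupermanifest.json",
        "mysuperfolder/other.json",
        "mysuperfolder/asubfolder/a.json",
        "mysuperfolder/asubfolder/deep/er/b.json",
        "mysuperfolder/asubfolder/notes.txt",
    ]:
        file = root / name
        file.parent.mkdir(parents=True, exist_ok=True)
        file.write_text("{}")
    matched = match_manifests(root, PATTERNS)

print(matched)
```

    In this layout, mysuperfolder/other.json and notes.txt are left out: the first is outside asubfolder and not listed explicitly, the second is not a JSON file.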

    Worker repository attributes

    When the type is set to worker, the workers attribute is mandatory.

    The workers attribute is a list of the following:

    • Paths to a YAML file holding the configuration for a single worker
    • Unix-style patterns matching paths to YAML files holding the configuration for a single worker
    • The configuration of a single worker embedded directly into the file
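
    The three kinds of entries can be told apart by their shape once the file is parsed. The following is a rough sketch under stated assumptions: classify_entry is a hypothetical helper, and treating any string that contains *, ? or [ as a pattern is a guess at a reasonable rule, not documented Arkindex behaviour.

```python
def classify_entry(entry) -> str:
    """Label one item of the workers list as 'embedded', 'pattern' or 'path'."""
    if isinstance(entry, dict):
        # A mapping is a worker configuration embedded in .arkindex.yml.
        return "embedded"
    if isinstance(entry, str):
        # Assumption: a string with *, ? or [ is meant as a Unix-style pattern.
        if any(char in entry for char in "*?["):
            return "pattern"
        return "path"
    raise TypeError(f"unsupported workers entry: {entry!r}")


entries = [
    "path/to/worker.yml",
    "configuration/**/*.yml",
    {"name": "Book of hours", "slug": "book_of_hours", "type": "classifier"},
]
print([classify_entry(entry) for entry in entries])
# prints ['path', 'pattern', 'embedded']
```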

    Single worker configuration

    The following describes the attributes of a YAML file configuring one worker, or of the configuration embedded directly in the .arkindex.yml file.

    All attributes are optional unless marked as mandatory.

    • name: Mandatory. Name of the worker, for display purposes.

    • slug: Mandatory. Slug of this worker. The slug must be unique across the repository and may only contain alphanumeric characters, underscores or dashes.

    • type: Mandatory. Type of the worker, for display purposes only. Some common values include:

      • classifier
      • recognizer
      • ner
      • dla
      • word-segmenter
      • paragraph-creator
    • docker: Groups Docker-related configuration attributes:

      • build: Path to the Dockerfile used to build this worker, relative to the root of the repository. Defaults to Dockerfile.
      • command: Custom command line to use when launching the Docker container for this worker. By default, the command specified in the Dockerfile is used.
    • environment: Mapping of string keys to string values defining the environment variables to set when the Docker container runs.

    • configuration: Mapping of string keys to arbitrary values that can later be accessed from the worker's Python code. Use it to define settings specific to your worker, such as a file's location.

    • secrets: List of the names of the secrets that this worker requires.

    For more information, see Using secrets in workers.
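
    The constraints listed above (mandatory attributes, slug format, slug uniqueness, default Dockerfile path) lend themselves to a quick pre-commit check. The sketch below is illustrative only: check_workers and dockerfile_path are hypothetical helpers, "alphanumeric" is assumed to mean ASCII letters and digits, and the workers are assumed to be already parsed into dictionaries.

```python
import re

# Assumption: "alphanumeric" is read here as ASCII letters and digits.
SLUG_RE = re.compile(r"[A-Za-z0-9_-]+")
MANDATORY = ("name", "slug", "type")


def check_workers(workers: list[dict]) -> list[str]:
    """Return a list of problems found across worker configurations."""
    problems = []
    seen_slugs = set()
    for index, worker in enumerate(workers):
        for key in MANDATORY:
            if not worker.get(key):
                problems.append(f"worker #{index}: missing mandatory attribute {key}")
        slug = worker.get("slug")
        if isinstance(slug, str) and slug:
            if not SLUG_RE.fullmatch(slug):
                problems.append(f"worker #{index}: invalid slug {slug!r}")
            elif slug in seen_slugs:
                # Slugs must be unique across the repository.
                problems.append(f"worker #{index}: duplicate slug {slug!r}")
            seen_slugs.add(slug)
    return problems


def dockerfile_path(worker: dict) -> str:
    """Return docker.build, falling back to the documented default."""
    return (worker.get("docker") or {}).get("build", "Dockerfile")


workers = [
    {"name": "Book of hours", "slug": "book_of_hours", "type": "classifier"},
    {"name": "Broken", "slug": "book of hours", "type": "ner"},
    {"name": "Copy", "slug": "book_of_hours"},
]
for problem in check_workers(workers):
    print(problem)
print(dockerfile_path(workers[0]))  # prints Dockerfile
```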

    Example configuration

    ---
    version: 2
    type: worker
    
    workers:
      # Path to a single YAML file
      - path/to/worker.yml
      # Pattern matching any YAML file in the configuration folder
      # or in its sub-directories
      - configuration/**/*.yml
      # Configuration embedded directly into this file
      - name: Book of hours
        slug: book_of_hours
        type: classifier
        docker:
          build: project/Dockerfile
          image: hub.docker.com/project/image:tag
          command: python mysuperscript.py --blabla
        environment:
          TOKEN: deadBeefToken
        configuration:
          model: path/to/model
          anyKey: anyValue
          classes: [X, Y, Z]
        secrets:
          - path/to/secret.json