Settings

    This page lists all the configuration settings available for the Arkindex backend. These settings must be stored in a YAML file and exposed to the backend and worker containers through a Docker volume. The configuration path is set through the CONFIG_PATH environment variable.

    Configuration sample

    A minimal file is available here:

    ---
    # This file must be exposed to the backend and worker containers using a Docker volume
    # You can set its path in the container by using the environment variable CONFIG_PATH
    
    # Connection to the PostgreSQL database
    # Here we use a PostgreSQL container on the same network
    database:
      host: ark-database
      port: 5432
      name: arkindex_public
      user: public_user
      password: public_data
    
    
    # Connection to the Redis server to share asynchronous local jobs
    redis:
      host: ark-redis
    
    # Connection to an S3-compatible storage API
    # Here we use a MinIO container on the same network
    s3:
      access_key_id: minio1234
      secret_access_key: minio1234
      endpoint: https://minio.ark.localhost
      region: local
    
    # Random characters used to salt cryptographic hashes
    secret_key: LkX7et2k5yh2muoCiTcpKCpZBXQ8fmJXXdSuR98lQn
    
    # Connection to the search engine
    # This is only needed if the search feature is enabled
    solr:
      api_url: http://ark-solr:8983/solr/
    
    # Cache system to use for performance
    # In production we recommend using Redis
    cache:
      type: memory
    
    # Control the optional features on your instance
    features:
      signup: yes
      search: yes
    
    # Use remote frontend files, hosted by Teklia
    # You need to synchronize the version mentioned here
    # with the one from your backend
    static:
      frontend_version: 1.6.0
      cdn_assets_url: https://assets.teklia.com/arkindex
    
    # Configure the remote worker credentials
    # to allow them to communicate with this Arkindex instance
    ponos:
      private_key: /etc/ponos.key
      default_env:
        ARKINDEX_API_URL: https://ark.localhost/api/v1/
    
    # Root URL of the Arkindex instance
    # Used to build external links (in emails)
    public_hostname: https://ark.localhost
    
    # Configure the Django settings for session & CSRF cookies
    # along with CORS allowed hosts
    # These should match your public hostname
    session:
      cookie_domain: ark.localhost
    csrf:
      cookie_domain: ark.localhost
      trusted_origins:
        - 'https://*.ark.localhost'
    cors:
      origin_whitelist:
        - https://ark.localhost
    
    # HTTP hosts allowed to reach the server
    # This should match your public hostname
    # Note the leading .
    allowed_hosts:
      - .ark.localhost
    
    
    # IIIF Image Server used to expose the locally uploaded images
    # Do not change this setting if you use the bootstrap script
    local_imageserver_id: 12345
    
    # Worker version used by the file import tasks
    # Do not change this setting if you use the bootstrap script
    imports_worker_version: f2bb8dd7-55e9-49ae-9bd9-b1d2e5d491b9
    

    Reference

    All the configuration options available in the YAML file are described here in alphabetical order.

    allowed_hosts

    A list of hosts that are allowed to access the server.

    The following hostnames are always added to the list:

    • 127.0.0.1
    • localhost
    • backend
    • ark-backend

    See the Django documentation on ALLOWED_HOSTS.

    arkindex_env

    Arkindex execution environment. Defaults to dev. Change this in production.

    When set to dev:

    • Django's DEBUG setting is set to True;
    • All errors will show a detailed error page with tracebacks, settings, etc., which is a security issue in production;
    • Default values for ARKINDEX_API_URL compatible with running the backend outside Docker are set on ponos.default_env;
    • Caching is disabled if it was not explicitly configured, instead of falling back to an in-memory cache.

    Error report e-mails, when enabled, are prefixed by [Arkindex $ARKINDEX_ENV] (where $ARKINDEX_ENV is replaced by the value of ARKINDEX_ENV).

    banner

    Since Arkindex 1.1.1

    banner.message

    The Markdown message that will be displayed in the frontend.

    banner.style

    The style used to display the message, among info, success, warning and error. Defaults to info.

    cache

    cache.path

    Path to an existing directory in which requests can be cached. Required only with the filesystem cache and otherwise ignored.

    cache.type

    Required if cache is set. Defines the type of cache; some types require other properties in this configuration block. Possible values are:

    • dummy: Debugging-only cache that does not actually cache anything
    • memory: In-memory caching
    • redis: Cache using a Redis instance (requires url)
    • memcached: Cache using a Memcached instance (requires url)
    • filesystem: Cache using the file system (requires path)

    When unset, this defaults to memory, or to dummy if arkindex_env is set to dev.

    cache.url

    Hostname and optional port number for a memcached or Redis instance.

    Required only with memcached and redis caches; ignored for other cache types.
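
    As an illustration of how the cache type determines which other keys are needed, a filesystem cache could be declared as in the sketch below. The directory is a placeholder chosen for this example, not a default of Arkindex.

    cache:
      type: filesystem
      # Hypothetical directory; it must already exist inside the container
      path: /var/cache/arkindex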

    cleanup

    Since Arkindex 1.6.1.

    This section defines configuration items related to the arkindex cleanup administrator command.

    cleanup.model_delay

    Minimum number of days between the archival date of a Model and its deletion, if it does not have any associated ML results. Defaults to 30 days.

    cleanup.worker_delay

    Minimum number of days between the archival date of a Worker and its deletion, if it does not have any associated ML results. Defaults to 30 days.
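
    A minimal sketch of this section, assuming both delays are written as plain integers counting days; the values are arbitrary examples.

    cleanup:
      # Keep archived models for at least 60 days instead of the default 30
      model_delay: 60
      # Keep archived workers for at least 90 days instead of the default 30
      worker_delay: 90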

    cors

    This section defines configuration items specific to Cross-Origin Resource Sharing on the REST API.

    To learn more about CORS, browse the Mozilla docs.

    cors.origin_whitelist

    A list of CORS origins, as defined in RFC 6454: a URI scheme, hostname and port. It is possible to omit the port for https:// and http://; they will default to 443 and 80 respectively.

    This defaults to the following:

    • http://localhost:8080
    • http://127.0.0.1:8080

    Since Arkindex 0.14.0, URI schemes are required for all values of this parameter.

    cors.suffixes

    A list of regular expressions that can match hostname suffixes. They will be prepended with ^https://.+.

    This may be used in conjunction with cors.origin_whitelist: when an origin does not match any entry of the whitelist, the suffix regular expressions are tested instead.
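
    The two CORS settings might be combined as in the following sketch. The hostnames are placeholders, and the exact form expected for a suffix (for instance whether it should end with $) is an assumption to check against your instance.

    cors:
      origin_whitelist:
        # Exact origin: scheme and hostname, port 443 implied
        - https://ark.example.com
      suffixes:
        # Once prefixed, this is tested as ^https://.+\.example\.org
        - '\.example\.org'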

    csrf

    Configuration related to protection against cross-site request forgery (CSRF) attacks.

    csrf.cookie_domain

    This sets the Domain option on the CSRF cookie. When unset, this disables the Domain option.

    See Django's documentation on CSRF_COOKIE_DOMAIN.

    csrf.cookie_name

    Sets a name for the CSRF token cookie. This defaults to arkindex.csrf.

    Changing this name to a value other than the default may impact authentication on the frontend and API clients and prevent POST, PUT or DELETE requests until they are reconfigured accordingly, for example using ARKINDEX_API_CSRF_COOKIE.

    See Django's documentation on CSRF_COOKIE_NAME.

    csrf.cookie_samesite

    The value of the SameSite flag on the CSRF cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to lax.

    See Django's documentation on CSRF_COOKIE_SAMESITE.

    csrf.cookie_secure

    Since Arkindex 0.14.0

    Boolean; when enabled, adds the Secure flag on the CSRF cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.

    See Django's documentation on CSRF_COOKIE_SECURE.

    csrf.trusted_origins

    A list of hosts where unsafe requests (POST, PUT, DELETE) are allowed over HTTPS. This requires requests to include one of those hosts as the Referer.

    A subdomain catch-all is accepted, such as .teklia.com, to allow any subdomain of teklia.com.

    See Django's documentation on CSRF_TRUSTED_ORIGINS.
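
    A possible csrf section is sketched below. The hostnames are placeholders, and the trusted origin is written in the form used by the configuration sample at the top of this page; which form your version expects is an assumption to verify.

    csrf:
      cookie_domain: ark.example.com
      # Same as the default name, shown here only for illustration
      cookie_name: arkindex.csrf
      trusted_origins:
        - 'https://*.ark.example.com'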

    database

    database.host

    Hostname of the PostgreSQL database. Defaults to localhost.

    database.name

    Name of the PostgreSQL database. Defaults to arkindex_dev.

    database.password

    Password for the PostgreSQL user. Defaults to devdata. Please change this in production.

    database.port

    Port of the PostgreSQL database. Defaults to 9100.

    database.replica

    Optional settings for a read-only PostgreSQL replica, which allows scaling the database across multiple servers. This is needed to set up Patroni.

    If you specify this information, all write operations happen on the main database and all read operations happen on the replica.

    Settings are the same as for the main database described above this section, but under a replica header:

    database:
      replica:
        host: ...
        port: ...
        user: ...
        password: ...
        name: ...
    

    database.user

    Name of a PostgreSQL user to use. Defaults to devuser. Please change this in production.

    docker

    docker.tasks_image

    A Docker tag for the Arkindex tasks image. This defaults to registry.gitlab.teklia.com/arkindex/tasks. When the image is specified without a tag, :latest is assumed.

    This is used to specify which image to use to run file imports, S3 imports, or element listings when running processes.

    doorbell

    Configures the Doorbell integration to send feedback from the frontend. When both settings are set, the doorbell feature flag is automatically enabled. This feature flag cannot be overridden manually.

    doorbell.appkey

    The application key provided by Doorbell for authentication.

    doorbell.id

    The application ID to use when submitting feedback.

    email

    The e-mail configuration is used in three cases:

    When the whole configuration section is omitted, e-mail messages that would normally be sent will instead be printed to standard output.

    email.error_report_recipients

    A list of e-mail addresses to send HTTP 500 error reports to. When unset, this will not send any error reports.

    email.host

    Hostname of the SMTP server to use to send emails. The server must support TLS.

    email.password

    Password for the SMTP server.

    email.port

    Port of the SMTP server to use to send emails. Defaults to 25.

    email.user

    Email address to use both as the SMTP username and as the sender address.
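
    Put together, an e-mail section might look like the following sketch. The server, addresses, port and password are all placeholders.

    email:
      host: smtp.example.com
      # Hypothetical port; the default is 25
      port: 587
      # Used both as the SMTP username and as the sender address
      user: arkindex@example.com
      password: change-me
      error_report_recipients:
        - admin@example.com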

    export

    export.ttl

    This integer value configures the Time To Live of an SQLite export on a corpus in seconds. This prevents creating exports too frequently through the API and overloading the system. When an export has been successfully created, another export on the same corpus cannot be created within this specified time.

    Defaults to 21600 seconds (6 hours).
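
    For instance, to allow at most one export per corpus every 12 hours, the value could be doubled as in this sketch (12 × 3600 = 43200 seconds):

    export:
      # 12 hours, instead of the default 6 hours
      ttl: 43200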

    features

    This section uses feature flags to configure whether some optional features of Arkindex are available. This is available since Arkindex 0.12.3.

    Unlike in other parts of the configuration, unknown keys are not allowed here, to prevent unknown configuration items from being potentially shared over the API.

    features.search

    Since Arkindex 0.12.3. Defines whether or not the search feature is available. Boolean, defaults to false.

    When disabled, this disables all interactions with Solr, causes search APIs to return HTTP 400 errors, and causes the frontend to hide all search-related components. If you enable this feature with a non-empty database, you will also need to build the search index for each of the indexable projects to ensure they are ready to use.

    features.selection

    Since Arkindex 0.13.1. Defines whether or not the element selection feature is available. Boolean, defaults to true.

    features.signup

    Since Arkindex 0.12.3. Defines whether or not the sign-up feature is available. Boolean, defaults to true.

    When disabled, the Register button on the frontend is not shown when logged out and the Register API endpoint always returns HTTP 400 errors. The password reset and email verification features are still available, though the verification email is no longer sent automatically.

    imports_worker_version

    Worker version UUID that will be used for all file import operations (when importing PDF, images, or files from an S3-compatible bucket).

    ingest

    Since Arkindex 1.2.6.

    This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs for image ingestion.

    ingest.access_key_id

    The Access Key ID for read/write access to S3 buckets.

    ingest.endpoint

    An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.

    ingest.extra_buckets

    Optional array of strings. Adds bucket names to those returned by the ListBuckets API endpoint.

    This makes other buckets available to the S3 import form in the frontend, even when they are not visible when listing all buckets due to missing S3 permissions.

    ingest.imageserver_id

    ID of the ImageServer to use when importing images from S3. This is required for the S3 ingest feature to work, and should point to the IIIF server where the images on the S3 bucket are hosted.

    ingest.prefix_by_bucket_name

    Whether or not to prefix the bucket name to the image's path when building an IIIF image identifier. Boolean, defaults to True.

    For example, when ingesting an image on a bucket mybucket with the path a/b.jpg, enabling this option will cause the ingest process to use mybucket%2Fa%2Fb.jpg as the IIIF identifier.

    ingest.region

    The AWS region the S3 buckets are located in. This has no effect when endpoint is set.

    ingest.secret_access_key

    The Secret Access Key for read/write access to S3 buckets.
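
    The ingest settings could be combined as in the sketch below, here for an S3-compatible storage reached through a custom endpoint. Credentials, endpoint, bucket name and ImageServer ID are placeholders. The region is left out since it has no effect when an endpoint is set.

    ingest:
      access_key_id: my-access-key
      secret_access_key: my-secret-key
      endpoint: https://minio.example.com
      # Hypothetical ID of the ImageServer exposing the bucket's images
      imageserver_id: 2
      # Bucket that ListBuckets does not return, but that should be importable
      extra_buckets:
        - hidden-bucket
      prefix_by_bucket_name: true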

    job_timeouts

    Since Arkindex 1.1.0. Defines the asynchronous task timeouts for each of the asynchronous task types that Arkindex uses.

    job_timeouts.corpus_delete

    Since Arkindex 1.1.0. Timeout for the corpus deletion asynchronous task, in seconds. Defaults to 7200.

    job_timeouts.element_trash

    Since Arkindex 1.1.0. Timeout for the element list deletion asynchronous task, in seconds. Defaults to 3600.

    job_timeouts.export_corpus

    Since Arkindex 1.1.0. Timeout for the corpus export asynchronous task, in seconds. Defaults to 7200.

    job_timeouts.initialize_activity

    Since Arkindex 1.1.0. Timeout for the worker activity initialization asynchronous task, in seconds. Defaults to 3600.

    job_timeouts.move_element

    Since Arkindex 1.1.0. Timeout for the element move asynchronous task, in seconds. Defaults to 3600.

    job_timeouts.notify_process_completion

    Timeout for the email notification task upon finished process, in seconds. Defaults to 120.

    job_timeouts.process_delete

    Since Arkindex 1.1.0. Timeout for the process deletion asynchronous task, in seconds. Defaults to 3600.

    job_timeouts.reindex_corpus

    Timeout for the corpus search engine re-indexation task, in seconds. Defaults to 7200.

    job_timeouts.task

    Since Arkindex 1.6.0. Timeout for locally executed ponos tasks (only for Community Edition). Defaults to 36000.

    job_timeouts.worker_results_delete

    Since Arkindex 1.1.0. Timeout for the worker result deletion asynchronous task, in seconds. Defaults to 3600.
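
    Only the timeouts that differ from their defaults need to be listed. The sketch below, with arbitrary example values, lengthens two of them:

    job_timeouts:
      # 4 hours for corpus exports, instead of the default 7200 seconds
      export_corpus: 14400
      # 5 minutes for process completion e-mails, instead of the default 120 seconds
      notify_process_completion: 300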

    jwt_signing_key

    This is used as the HMAC key for the Ponos agents' JSON Web Tokens authentication.

    When unset, this defaults to the value of secret_key. This should be set to a random string of 32 characters or more, preferably different from the one used for secret_key.

    local_imageserver_id

    ID of the IIIF image server linked to the Arkindex instance. Defaults to 1.

    The ImageServer may be created from the admin panel or a Django shell, or via the arkindex bootstrap command.

    metrics_port

    Network port where a Prometheus /metrics endpoint will be exposed. This allows system administrators to integrate Arkindex into their monitoring stack.

    Defaults to 3000.

    ponos

    ponos.default_env

    Default environment variables sent along with every Ponos task the Arkindex backend starts. For default tasks and Arkindex client auto-configuration, the following variables should be defined:

    • ARKINDEX_API_URL
    • ARKINDEX_API_CSRF_COOKIE

    Defining ARKINDEX_API_TOKEN is strongly discouraged as this can lead to a security vulnerability. Defining ARKINDEX_TASK_TOKEN will have no effect, as this variable is overridden automatically when running any process.

    Any other custom variables defined here are not used or checked by the backend, and are passed on to the tasks directly.

    ponos.default_env.ARKINDEX_API_CSRF_COOKIE

    Allows autoconfiguration of the Arkindex client by setting the CSRF cookie's name.

    The CSRF cookie name is automatically deduced from the csrf.cookie_name setting and this variable is always set, but it is possible to override this value explicitly in the configuration file.

    ponos.default_env.ARKINDEX_API_URL

    Allows autoconfiguration of the Arkindex client by setting the base URL of the Arkindex API.

    This should be set to a public-facing URL for the API's root, such as https://myarkindex.com/api/v1/. This defaults to http://localhost:8000/api/v1/ when arkindex_env is set to dev.
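
    A default environment might be declared as in this sketch. The URL reuses the example hostname above, and MY_CUSTOM_VARIABLE is a hypothetical name that only illustrates how extra variables are forwarded to tasks.

    ponos:
      default_env:
        ARKINDEX_API_URL: https://myarkindex.com/api/v1/
        # Hypothetical custom variable, passed on to the tasks unchanged
        MY_CUSTOM_VARIABLE: some-value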

    ponos.default_farm

    Enterprise Edition only.

    Required. UUID of a farm to assign to all processes by default.

    ponos.private_key

    Enterprise Edition only.

    Path to an elliptic curve private key file to use as the server private key for secure registration of Ponos agents. This defaults to $BASE_DIR/ponos.key, where $BASE_DIR is the directory of the arkindex package.

    public_hostname

    Since Arkindex 1.0.3. Root URL of the Arkindex instance, including the scheme and hostname (http://example.com).

    redis

    Configuration related to Redis for asynchronous tasks using Channels. This is unrelated to the optional Redis cache configuration.

    redis.db

    Since Arkindex 0.14.4. The database to use on the Redis server. Defaults to 0.

    redis.host

    Hostname of the Redis server. Defaults to localhost.

    redis.password

    Since Arkindex 0.14.4. Optional password to use when connecting to the Redis server. Defaults to null (no password).

    redis.port

    Since Arkindex 0.14.4. The port to use to connect to the Redis server. Defaults to 6379.

    redis.timeout

    Since Arkindex 0.14.4. The default asynchronous task timeout to use. Defaults to 1800.

    Since Arkindex 1.1.0, this parameter only applies to tasks that are not defined in job_timeouts.
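
    A fuller redis section than the one in the configuration sample might look like this sketch. The hostname follows the sample; the database number and password are placeholders.

    redis:
      host: ark-redis
      port: 6379
      # Hypothetical choice of a database other than the default 0
      db: 1
      password: change-me
      # Applies to tasks that have no entry in job_timeouts
      timeout: 1800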

    robots_txt_disallow

    List of relative paths to disallow in the generated /robots.txt. Each path, relative to the root, is marked as Disallow: <path> so that robots do not scrape it.

    Example to forbid scraping on the whole instance: robots_txt_disallow: ["/"]

    s3

    This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs.

    s3.access_key_id

    The Access Key ID for read/write access to S3 buckets.

    s3.endpoint

    An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.

    s3.export_bucket

    Name of the S3 bucket to use to store SQLite exports. This defaults to export.

    s3.ponos_artifacts_bucket

    Name of the S3 bucket to use for task artifacts. This defaults to ponos-artifacts.

    s3.ponos_logs_bucket

    Name of the S3 bucket to use for task logs. This defaults to ponos-logs.

    s3.region

    The AWS region the S3 buckets are located in. This has no effect when endpoint is set.

    s3.secret_access_key

    The Secret Access Key for read/write access to S3 buckets.

    s3.staging_bucket

    Name of the S3 bucket to use for DataFile uploads. This defaults to staging.

    Note that for the frontend to be able to upload files to the staging bucket, you will need to configure CORS support on the bucket. MinIO has it enabled by default, but AWS S3 does not. Learn more about CORS support on S3 buckets here.

    s3.thumbnails_bucket

    Name of the S3 bucket to use for element thumbnails. This defaults to thumbnails.

    s3.training_bucket

    Since Arkindex 1.2.3.

    Name of the S3 bucket to use for Machine Learning model training. This defaults to training.
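
    When the default bucket names cannot be used, for example because bucket names must be unique on the storage provider, each one can be overridden. The following sketch assumes AWS S3 with hypothetical credentials, region and bucket names.

    s3:
      access_key_id: my-access-key
      secret_access_key: my-secret-key
      region: eu-west-1
      # Hypothetical bucket names replacing the defaults
      export_bucket: myorg-arkindex-export
      ponos_artifacts_bucket: myorg-arkindex-ponos-artifacts
      ponos_logs_bucket: myorg-arkindex-ponos-logs
      staging_bucket: myorg-arkindex-staging
      thumbnails_bucket: myorg-arkindex-thumbnails
      training_bucket: myorg-arkindex-training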

    secret_key

    A secret key with multiple uses:

    • Session management (to keep users logged-in)
    • E-mail verification and password reset tokens sent by e-mail
    • JSON Web Tokens authentication for Ponos agents, unless jwt_signing_key is set

    This must be set to a 32-character or more random string in production.

    See the Django documentation on SECRET_KEY.

    sentry

    Configuration related to the Sentry integration.

    sentry.dsn

    Set the Data Source Name for the Sentry project related to the backend. When unset, the backend's Sentry integration is disabled.

    sentry.frontend_dsn

    Set the Data Source Name for the Sentry project related to the frontend. This will be passed to the frontend when retrieving it from a CDN using static.cdn_assets_url. When unset, the frontend's Sentry integration will be disabled.
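
    Both integrations could be enabled as in the sketch below. The two DSNs are made-up placeholders; the real values are shown in the settings of each Sentry project.

    sentry:
      # Placeholder DSN of the backend project
      dsn: https://publickey@sentry.example.com/1
      # Placeholder DSN of the frontend project
      frontend_dsn: https://publickey@sentry.example.com/2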

    session

    session.cookie_domain

    This sets the Domain option on the session cookie. When unset, this disables the Domain option.

    See Django's documentation on SESSION_COOKIE_DOMAIN.

    session.cookie_name

    Sets a name for the session cookie. This defaults to arkindex.auth.

    See Django's documentation on SESSION_COOKIE_NAME.

    session.cookie_samesite

    The value of the SameSite flag on the session cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to lax.

    See Django's documentation on SESSION_COOKIE_SAMESITE.

    session.cookie_secure

    Since Arkindex 0.14.0

    Boolean; when enabled, adds the Secure flag on the session cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.

    See Django's documentation on SESSION_COOKIE_SECURE.

    signup_default_group

    Enterprise Edition only.

    Identifier (UUID) of the user group to which any new user registering on the instance will be assigned.

    This is useful to provide default rights for new users, especially on Ponos farms.

    solr

    Since Arkindex 1.0.1.

    solr.api_url

    Base URL of the Solr API. Defaults to http://localhost:8983/solr/.

    static

    static.cdn_assets_url

    URL to the root of an Arkindex frontend assets directory. When this variable is set, the backend will serve an index page to load the frontend, and frontend assets will be looked for in a subdirectory of this URL corresponding to the version number defined in frontend_version.

    static.frontend_version

    When cdn_assets_url is set, this version number will be used when looking up frontend assets. This defaults to the backend's version number.

    static.mirador_url

    URL to a Mirador instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Mirador may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.

    static.root_path

    Absolute path for collected static files during deployment.

    See the Django documentation on STATIC_ROOT.

    static.universal_viewer_url

    URL to a Universal Viewer instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Universal Viewer may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.
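
    As a sketch, a static section that loads the frontend from a CDN and links to a viewer could read as follows. The CDN URL and version are taken from the configuration sample, and the viewer URL is the example given above; adapt all three to your own deployment.

    static:
      cdn_assets_url: https://assets.teklia.com/arkindex
      # Must match the version of your backend
      frontend_version: 1.6.0
      # The frontend appends the manifest URL right after "manifest="
      universal_viewer_url: https://universalviewer.io/uv.html?manifest=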

    worker_activity_timeout

    Since Arkindex 1.3.4. Timeout for worker activities, in seconds. This timeout is the time without update after which an existing worker activity can be set to started again, allowing another worker to try to process it again. Defaults to 3600.

    workers_max_chunks

    Maximum number of chunks in a worker process, expressed as a positive integer. This setting allows administrators to control the number of parallel tasks in a single process, to avoid creating very large processes on smaller instances.

    Default: 10.
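
    On a small instance, the worker-related limits could be tightened as in this sketch. Both are top-level keys of the configuration file, and the values are arbitrary examples.

    # At most 4 parallel chunks per process, instead of the default 10
    workers_max_chunks: 4
    # Allow a stalled worker activity to be retried after 2 hours, instead of the default 3600 seconds
    worker_activity_timeout: 7200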