Settings

This page lists all the configuration settings available for the Arkindex backend. These settings must be stored in a YAML file and exposed to the backend and worker containers using a Docker volume. The path to the file inside the containers is set through the CONFIG_PATH environment variable.

Configuration sample

A minimal file is available here:

---
# This file must be exposed to the backend and worker container using a Docker volume
# You can set its path in the container by using the environment variable CONFIG_PATH

# Connection to the PostgreSQL database
# Here we use a PostgreSQL container on the same network
database:
  host: ark-database
  port: 5432
  name: arkindex_public
  user: public_user
  password: public_data


# Connection to the Redis server to share asynchronous local jobs
redis:
  host: ark-redis

# Connection to an S3-compatible storage API
# Here we use a MinIO container on the same network
s3:
  access_key_id: minio1234
  secret_access_key: minio1234
  endpoint: https://minio.ark.localhost
  region: local

# Random characters used to salt cryptographic hashes
secret_key: LkX7et2k5yh2muoCiTcpKCpZBXQ8fmJXXdSuR98lQn

# Connection to the search engine
# This is only needed if the search feature is enabled
solr:
  api_url: http://ark-solr:8983/solr/

# Cache system to use for performance
# In production we recommend using Redis
cache:
  type: memory

# Control the optional features on your instance
features:
  signup: yes
  search: yes
  ingest: yes

# Use remote frontend files, hosted by Teklia
# You need to synchronize the version mentioned here
# with the one from your backend
static:
  frontend_version: 1.6.0
  cdn_assets_url: https://assets.teklia.com/arkindex

# Configure the remote worker credentials
# to allow them to communicate with this Arkindex instance
ponos:
  private_key: /etc/ponos.key
  default_env:
    ARKINDEX_API_URL: https://ark.localhost/api/v1/
  task_expiry: 30

# Root URL of the Arkindex instance
# Used to build external links (in emails)
public_hostname: https://ark.localhost

# Configure the Django settings for session & CSRF cookies
# along with CORS allowed hosts
# These should match your public hostname
session:
  cookie_domain: ark.localhost
csrf:
  cookie_domain: ark.localhost
  trusted_origins:
    - 'https://*.ark.localhost'
cors:
  origin_whitelist:
    - https://ark.localhost

# HTTP hosts allowed to reach the server
# This should match your public hostname
# Note the leading .
allowed_hosts:
  - .ark.localhost


# IIIF Image Server used to expose the locally uploaded images
# Do not change this setting if you use the bootstrap script
local_imageserver_id: 12345

Reference

All the configuration options available in the YAML file are described here in alphabetical order.

allowed_hosts

A list of hosts that are allowed to access the server.

The following hostnames are always added to the list:

  • 127.0.0.1
  • localhost
  • backend
  • ark-backend

See the Django documentation on ALLOWED_HOSTS.

arkindex_env

Arkindex execution environment. Defaults to dev. Change this in production.

When set to dev:

  • Django’s DEBUG setting is set to True;
  • All errors will show a detailed error page with tracebacks, settings, etc., which is a security issue in production;
  • Default values for ARKINDEX_API_URL compatible with running the backend outside Docker are set on ponos.default_env;
  • Caching is disabled if it was not explicitly configured, instead of falling back to an in-memory cache.

Error report e-mails, when enabled, are prefixed by [Arkindex $ARKINDEX_ENV] (where $ARKINDEX_ENV is replaced by the value of ARKINDEX_ENV).

Since Arkindex 1.1.1

banner.message

The Markdown message that will be displayed in the frontend.

banner.style

The style used to display the message: one of info, success, warning or error. Defaults to info.
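
As an illustration, a banner announcing planned maintenance might be declared as follows. This is a minimal sketch: the message text is made up, and the nesting under a banner key is inferred from the dotted setting names used on this page.

```yaml
# Hypothetical example: the message content is illustrative only
banner:
  # Markdown is accepted in the message
  message: "Maintenance is planned on **Friday**; the instance may be unavailable."
  # One of: info, success, warning, error
  style: warning
```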

cache

cache.path

Path to an existing directory in which requests can be cached. Required only with the filesystem cache and otherwise ignored.

cache.type

Required if cache is set. Defines the type of cache, which may in turn require other properties in this configuration block. Possible values are:

  • dummy: Debugging-only cache that does not actually cache anything
  • memory: In-memory caching
  • redis: Cache using a Redis instance (requires url)
  • memcached: Cache using a Memcached instance (requires url)
  • filesystem: Cache using the file system (requires path)

When unset, this defaults to memory, except when arkindex_env is set to dev, in which case it defaults to dummy.

cache.url

Hostname and optional port number for a memcached or Redis instance.

Required only with memcached and redis caches; ignored for other cache types.
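
For example, a filesystem cache could be configured as sketched below. The directory path is a hypothetical value chosen for illustration; whichever path is used must already exist inside the container.

```yaml
cache:
  type: filesystem
  # Hypothetical path; the directory must already exist
  path: /var/cache/arkindex
```

With the redis or memcached types, path would be dropped in favour of url.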

cleanup

Since Arkindex 1.6.1.

This section defines configuration items related to the arkindex cleanup administrator command.

cleanup.model_delay

Minimum number of days between the archival date of a Model and its deletion, if it does not have any associated ML results. Defaults to 30 days.

cleanup.worker_delay

Minimum number of days between the archival date of a Worker and its deletion, if it does not have any associated ML results. Defaults to 30 days.
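
A possible cleanup block that keeps archived models and workers longer than the default is sketched here. The values are arbitrary examples, not recommendations.

```yaml
cleanup:
  # Keep archived models for 60 days before deletion (default: 30)
  model_delay: 60
  # Keep archived workers for 90 days before deletion (default: 30)
  worker_delay: 90
```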

cors

This section defines configuration items specific to Cross-Origin Resource Sharing on the REST API.

To learn more about CORS, browse the Mozilla docs.

cors.origin_whitelist

A list of CORS origins, as defined in RFC 6454: a URI scheme, hostname and port. It is possible to omit the port for https:// and http://; they will default to 443 and 80 respectively.

This defaults to the following:

  • http://localhost:8080
  • http://127.0.0.1:8080

Since Arkindex 0.14.0, URI schemes are required for all values of this parameter.

cors.suffixes

A list of regular expressions that can match hostname suffixes. They will be prepended with ^https://.+.

This may be used in conjunction with cors.origin_whitelist: when an origin does not match any entry of the whitelist, the suffix regular expressions will be tested instead.
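
The two CORS settings might be combined as in the sketch below. The example.com hostnames are placeholders. Given the prefix described above, the suffix shown would be expected to produce the pattern ^https://.+\.example\.com; whether the backend also anchors the end of the expression is not stated here, so treat that as an open assumption.

```yaml
cors:
  origin_whitelist:
    # Exact origins: scheme, hostname and optional port
    - https://ark.example.com
    - http://localhost:8080
  suffixes:
    # Single quotes keep the backslashes literal in YAML
    - '\.example\.com'
```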

csrf

Configuration related to protection against cross-site request forgery (CSRF) attacks.

csrf.cookie_domain

This sets the Domain option on the CSRF cookie. When unset, this disables the Domain option.

See Django’s documentation on CSRF_COOKIE_DOMAIN.

csrf.cookie_name

Sets a name for the CSRF token cookie. This defaults to arkindex.csrf.

Changing this name from the default may break authentication on the frontend and in API clients, and prevent POST, PUT or DELETE requests, unless they are reconfigured accordingly, for example using ARKINDEX_API_CSRF_COOKIE.

See Django’s documentation on CSRF_COOKIE_NAME.

csrf.cookie_samesite

The value of the SameSite flag on the CSRF cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to Lax.

See Django’s documentation on CSRF_COOKIE_SAMESITE.

csrf.cookie_secure

Since Arkindex 0.14.0

Boolean; when enabled, adds the Secure flag on the CSRF cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.

See Django’s documentation on CSRF_COOKIE_SECURE.
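
The CSRF cookie settings described so far could be combined as follows for an instance served over HTTPS. This is a sketch under assumptions: ark.example.com is a placeholder hostname, and the stricter SameSite value is only one possible choice.

```yaml
csrf:
  cookie_domain: ark.example.com
  # Default name, shown explicitly for clarity
  cookie_name: arkindex.csrf
  # One of: lax, strict, none, false
  cookie_samesite: strict
  # Only send the cookie over HTTPS
  cookie_secure: true
```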

csrf.trusted_origins

A list of hosts where unsafe requests (POST, PUT, DELETE) are allowed over HTTPS. This requires requests to include one of those hosts as the Referer.

A subdomain catch-all is accepted, such as .teklia.com, to allow any subdomain of teklia.com.

See Django’s documentation on CSRF_TRUSTED_ORIGINS.

database

database.export

Enterprise Edition only.

Since Arkindex 1.6.0, you can specify a database dedicated to running project exports. This allows you to run large exports on this database instead of the main database: your exports do not risk failing because a write operation has changed the database while the export was still running.

If you specify an export database, then when exporting a project you will be able to choose between this database and the main database.

Settings are the same as for the main database described below, but under an export header:

database:
  export:
    host: ...
    port: ...
    user: ...
    password: ...
    name: ...

database.host

Hostname of the PostgreSQL database. Defaults to localhost.

database.name

Name of the PostgreSQL database. Defaults to arkindex_dev.

database.password

Password for the PostgreSQL user. Defaults to devdata. Please change this in production.

database.port

Port of the PostgreSQL database. Defaults to 9100.

database.replica

Optional information for a read-only PostgreSQL replica, which allows scaling the database across multiple servers. This is needed to set up Patroni.

If you specify this information, all write operations will happen on the main database, and all read operations will happen on the replica.

Settings are the same as for the main database described in this section, but under a replica header:

database:
  replica:
    host: ...
    port: ...
    user: ...
    password: ...
    name: ...

database.user

Name of a PostgreSQL user to use. Defaults to devuser. Please change this in production.

email

The e-mail configuration is used in three cases:

When the whole configuration section is omitted, e-mail messages that would normally be sent will instead be printed to standard output.

email.error_report_recipients

A list of e-mail addresses to send HTTP 500 error reports to. When unset, this will not send any error reports.

email.host

Hostname of the SMTP server to use to send emails. The server must support TLS.

email.password

Password for the SMTP server.

email.port

Port of the SMTP server to use to send emails. Defaults to 25.

email.user

Email address to use both as the SMTP username and as the sender address.
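
Put together, an e-mail section might look like the sketch below. Every hostname, address and credential is a placeholder, and port 587 is only an example of a non-default value.

```yaml
email:
  # The SMTP server must support TLS
  host: smtp.example.com
  # Overrides the default port (25)
  port: 587
  # Used both as the SMTP username and as the sender address
  user: arkindex@example.com
  password: change-me
  # HTTP 500 error reports are sent to these addresses
  error_report_recipients:
    - admin@example.com
```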

export

export.ttl

This integer value sets the time to live, in seconds, of an SQLite export of a corpus. It prevents exports from being created too frequently through the API and overloading the system: once an export has been successfully created, another export of the same corpus cannot be created within this period.

Defaults to 21600 seconds (6 hours).
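
For instance, to allow a new export of the same corpus after one hour, the setting could be lowered as sketched here. The value is an arbitrary example.

```yaml
export:
  # 1 hour instead of the default 21600 seconds
  ttl: 3600
```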

features

This section configures whether some optional features of Arkindex are available, using feature flags. This is available since Arkindex 0.12.3.

Unlike in other parts of the configuration, unknown keys are not allowed here, to prevent unknown configuration items from being potentially shared over the API.

features.ingest

Since Arkindex 1.6.3. Defines whether or not the S3 ingest feature is available. Boolean, defaults to false.

When disabled, the Import files from S3 button is not visible in the Import / Export menu when browsing projects and elements. S3 ingest processes cannot be retried, and the S3 ingest-related API endpoints always return HTTP 400 errors.

features.search

Since Arkindex 0.12.3. Defines whether or not the search feature is available. Boolean, defaults to false.

When disabled, this disables all interactions with Solr, causes search APIs to return HTTP 400 errors, and causes the frontend to hide all search-related components. If you enable this feature with a non-empty database, you will also need to build the search index for each of the indexable projects to ensure they are ready to use.

features.selection

Since Arkindex 0.13.1. Defines whether or not the element selection feature is available. Boolean, defaults to true.

features.signup

Since Arkindex 0.12.3. Defines whether or not the sign-up feature is available. Boolean, defaults to true.

When disabled, the Register button on the frontend is not shown when logged out and the Register API endpoint always returns HTTP 400 errors. The password reset and email verification features are still available, though the verification email will no longer be sent automatically.

iiif_user_agent

Since Arkindex 1.6.2. Specify the User-Agent header used when checking images. Defaults to Arkindex/{VERSION} (+https://teklia.com/) where VERSION is the current backend version as defined in the VERSION file.

ingest

Since Arkindex 1.2.6.

This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs for image ingestion.

ingest.access_key_id

The Access Key ID for read/write access to S3 buckets.

ingest.endpoint

An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.

ingest.extra_buckets

Optional array of strings. Adds bucket names to those returned by the ListBuckets API endpoint.

This makes other buckets available to the S3 import form in the frontend, even though they might not be visible when listing all buckets due to missing S3 permissions.

ingest.imageserver_id

ID of the ImageServer to use when importing images from S3. This is required for the S3 ingest feature to work, and should point to the IIIF server where the images on the S3 bucket are hosted.

ingest.prefix_by_bucket_name

Whether or not to prefix the bucket name to the image’s path when building an IIIF image identifier. Boolean, defaults to True.

For example, when ingesting an image on a bucket mybucket with the path a/b.jpg, enabling this option will cause the ingest process to use mybucket%2Fa%2Fb.jpg as the IIIF identifier.

ingest.region

The AWS region the S3 buckets are located in. This has no effect when endpoint is set.

ingest.secret_access_key

The Secret Access Key for read/write access to S3 buckets.
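
The ingest settings can be summarized with the following sketch, which also enables the matching feature flag. The credentials, endpoint, bucket name and ImageServer ID are all hypothetical placeholders.

```yaml
features:
  ingest: true

ingest:
  access_key_id: my-access-key
  secret_access_key: my-secret-key
  # Custom endpoint for an S3-compatible storage such as MinIO
  endpoint: https://minio.example.com
  # Hypothetical ID of an existing ImageServer serving the bucket's images
  imageserver_id: 2
  # With this enabled, a/b.jpg on mybucket is identified as mybucket%2Fa%2Fb.jpg
  prefix_by_bucket_name: true
  # Buckets to offer even if they are not returned by ListBuckets
  extra_buckets:
    - mybucket
```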

job_timeouts

Since Arkindex 1.1.0. Defines the asynchronous task timeouts for each of the asynchronous task types that Arkindex uses.

job_timeouts.corpus_delete

Since Arkindex 1.1.0. Timeout for the corpus deletion asynchronous task, in seconds. Defaults to 7200.

job_timeouts.create_process_failures

Since Arkindex 1.6.2. Timeout for creating a process from failed WorkerActivities, in seconds. Defaults to 3600.

Once the new process has been created, an email is sent to the user containing a URL to configure and run it.

job_timeouts.element_trash

Since Arkindex 1.1.0. Timeout for the element list deletion asynchronous task, in seconds. Defaults to 3600.

job_timeouts.export_corpus

Since Arkindex 1.1.0. Timeout for the corpus export asynchronous task, in seconds. Defaults to 7200.

job_timeouts.initialize_activity

Since Arkindex 1.1.0. Timeout for the worker activity initialization asynchronous task, in seconds. Defaults to 3600.

job_timeouts.move_element

Since Arkindex 1.1.0. Timeout for the element move asynchronous task, in seconds. Defaults to 3600.

job_timeouts.notify_process_completion

Timeout for the email notification task upon finished process, in seconds. Defaults to 120.

job_timeouts.process_delete

Since Arkindex 1.1.0. Timeout for the process deletion asynchronous task, in seconds. Defaults to 3600.

job_timeouts.reindex_corpus

Timeout for the corpus search engine re-indexation task, in seconds. Defaults to 7200.

job_timeouts.send_verification_email

Since Arkindex 1.6.2. Timeout for sending the verification email, in seconds. Defaults to 120.

job_timeouts.task

Since Arkindex 1.6.0. Timeout for locally executed ponos tasks (only for Community Edition). Defaults to 36000.

job_timeouts.worker_results_delete

Since Arkindex 1.1.0. Timeout for the worker result deletion asynchronous task, in seconds. Defaults to 3600.
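
As an example, an instance hosting very large corpora might raise only the timeouts that matter to it, leaving the others at their defaults. The values below are arbitrary and only illustrate the layout.

```yaml
job_timeouts:
  # Allow 4 hours for corpus exports (default: 7200)
  export_corpus: 14400
  # Allow 4 hours for corpus deletion (default: 7200)
  corpus_delete: 14400
```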

jwt_signing_key

This is used as the HMAC key for the Ponos agents’ JSON Web Tokens authentication.

When unset, this defaults to the value of secret_key. This should be set to a random string of at least 32 characters, preferably different from the one used for secret_key.

local_imageserver_id

ID of the IIIF image server linked to the Arkindex instance. Defaults to 1.

The ImageServer may be created from the admin panel or a Django shell, or via the arkindex bootstrap command.

metrics_port

Network port where a Prometheus /metrics endpoint will be exposed. This allows system administrators to integrate Arkindex in their monitoring stack.

Defaults to 3000.

ponos

ponos.auto_remove_container

Since Arkindex 1.6.3. When enabled, tasks executed in Community Edition will have their containers automatically removed after they finish. Disabled by default.

This option has no effect on Ponos agents in Enterprise Edition.

ponos.default_env

Default environment variables sent along with every Ponos task the Arkindex backend starts. For default tasks and Arkindex client auto-configuration, the following variables should be defined:

  • ARKINDEX_API_URL
  • ARKINDEX_API_CSRF_COOKIE

Defining ARKINDEX_API_TOKEN is strongly discouraged as this can lead to a security vulnerability. Defining ARKINDEX_TASK_TOKEN will have no effect, as this variable is overridden automatically when running any process.

Any other custom variables defined here will not be used or checked by the backend, and are passed on to the tasks directly.

ponos.default_env.ARKINDEX_API_CSRF_COOKIE

Allows autoconfiguration of the Arkindex client by setting the CSRF cookie’s name.

The CSRF cookie name is automatically deduced from the csrf.cookie_name setting and this variable is always set, but it is possible to override this value explicitly in the configuration file.

ponos.default_env.ARKINDEX_API_URL

Allows autoconfiguration of the Arkindex client by setting the base URL of the Arkindex API.

This should be set to a public-facing URL for the API’s root, such as https://myarkindex.com/api/v1/. This defaults to http://localhost:8000/api/v1/ when arkindex_env is set to dev.

ponos.default_env.ARKINDEX_MAX_IMAGE_PIXELS

Since Arkindex 1.6.2.

Defines a size limit, in pixels, for processing images.

This value is intended to override Pillow’s Image.MAX_IMAGE_PIXELS, which helps to avoid potential decompression bombs.

Warning

Pillow simply logs a warning when the limit is exceeded, and only raises an error and stops processing if the number of pixels is greater than twice the value.

Pillow’s setting override is implemented for file imports, but this variable can also be used by any worker that needs to support very large images.

If set to 0, all checks are ignored (this allows an infinite size, so the image source must be trusted).

If set to None (or unset), default values are used (e.g. 89478485 pixels with Pillow).
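
A sketch of how this variable could be declared alongside the API URL is given below. The hostname and the 200 million pixel limit are hypothetical, and this page does not say whether the value must be quoted as a string, so the unquoted integer is an assumption.

```yaml
ponos:
  default_env:
    ARKINDEX_API_URL: https://ark.example.com/api/v1/
    # Hypothetical limit: Pillow would warn above 200 million pixels
    # and raise an error above twice that number
    ARKINDEX_MAX_IMAGE_PIXELS: 200000000
```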

ponos.default_farm

Enterprise Edition only.

Required. UUID of a farm to assign to all processes by default.

ponos.maximum_task_ttl

Since Arkindex 1.6.6.

Default maximum time-to-live for all WorkerRuns and tasks, in seconds. Zero means infinite. Defaults to 3600 seconds (1 hour).

This is applied when no maximum TTL has been set on a specific project through the administration interface.

ponos.private_key

Enterprise Edition only.

Path to an elliptic curve private key file to use as the server private key for secure registration of Ponos agents. This defaults to $BASE_DIR/ponos.key, where $BASE_DIR is the directory of the arkindex package.

ponos.task_expiry

Since Arkindex 1.6.1. Delay in days before a task is marked as expired.

Once a task has expired, it will be deleted when the arkindex cleanup command runs.
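
The task lifetime settings of this section might be combined as follows. The values are illustrative only, and auto_remove_container only has an effect in Community Edition, as noted above.

```yaml
ponos:
  # Remove task containers once they finish (Community Edition)
  auto_remove_container: true
  # Default maximum time-to-live of 2 hours, in seconds (0 means infinite)
  maximum_task_ttl: 7200
  # Tasks expire after 30 days and are deleted by the next arkindex cleanup run
  task_expiry: 30
```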

public_hostname

Since Arkindex 1.0.3. Root URL of the Arkindex instance, including the scheme and hostname (for example http://example.com).

redis

Configuration related to Redis for asynchronous tasks using Channels. This is unrelated to the optional Redis cache configuration.

redis.db

Since Arkindex 0.14.4. The database to use on the Redis server. Defaults to 0.

redis.host

Hostname of the Redis server. Defaults to localhost.

redis.password

Since Arkindex 0.14.4. Optional password to use when connecting to the Redis server. Defaults to null (no password).

redis.port

Since Arkindex 0.14.4. The port to use to connect to the Redis server. Defaults to 6379.

redis.timeout

Since Arkindex 0.14.4. The default asynchronous task timeout to use. Defaults to 1800.

Since Arkindex 1.1.0, this parameter only applies to tasks that are not defined in job_timeouts.
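
A fuller Redis block than the one in the configuration sample could look like this sketch. The hostname matches the sample; the database number, password and timeout are placeholder values.

```yaml
redis:
  host: ark-redis
  port: 6379
  # Use a database other than the default (0)
  db: 1
  password: change-me
  # Fallback timeout, in seconds, for tasks not listed in job_timeouts
  timeout: 3600
```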

robots_txt_disallow

List of relative paths to disallow in the generated /robots.txt. Any path relative to the root will be marked as Disallow: <path> so that it is not scraped by any robot.

Example to forbid scraping on the whole instance: robots_txt_disallow: ["/"]

s3

This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs.

s3.access_key_id

The Access Key ID for read/write access to S3 buckets.

s3.endpoint

An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.

s3.export_bucket

Name of the S3 bucket to use to store SQLite exports. This defaults to export.

s3.max_retries

How many times any S3 request should be retried when errors occur. The conditions under which a request is retried are determined by Boto3’s standard retry mode. This defaults to 5. Set this to 0 to disable all retries.

s3.ponos_artifacts_bucket

Name of the S3 bucket to use for task artifacts. This defaults to ponos-artifacts.

s3.ponos_logs_bucket

Name of the S3 bucket to use for task logs. This defaults to ponos-logs.

s3.region

The AWS region the S3 buckets are located in. This has no effect when endpoint is set.

s3.secret_access_key

The Secret Access Key for read/write access to S3 buckets.

s3.staging_bucket

Name of the S3 bucket to use for DataFile uploads. This defaults to staging.

Note that for the frontend to be able to upload files to the staging bucket, you will need to configure CORS support on the bucket. MinIO has it enabled by default, but AWS S3 does not. Learn more about CORS support on S3 buckets here.

s3.thumbnails_bucket

Name of the S3 bucket to use for element thumbnails. This defaults to thumbnails.

s3.training_bucket

Since Arkindex 1.2.3.

Name of the S3 bucket to use for Machine Learning model training. This defaults to training.
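
For an instance sharing an S3 account with other services, the bucket names might be overridden as sketched below. The credentials, endpoint and the myorg- prefix are hypothetical; any bucket left out keeps its default name.

```yaml
s3:
  access_key_id: my-access-key
  secret_access_key: my-secret-key
  endpoint: https://minio.example.com
  # Retry failed requests 3 times instead of 5
  max_retries: 3
  staging_bucket: myorg-staging
  thumbnails_bucket: myorg-thumbnails
  export_bucket: myorg-export
  training_bucket: myorg-training
  ponos_logs_bucket: myorg-ponos-logs
  ponos_artifacts_bucket: myorg-ponos-artifacts
```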

secret_key

A secret key with multiple uses:

  • Session management (to keep users logged-in)
  • E-mail verification and password reset tokens sent by e-mail
  • JSON Web Tokens authentication for Ponos agents, unless jwt_signing_key is set.

This must be set to a random string of at least 32 characters in production.

See the Django documentation on SECRET_KEY.

sentry

Configuration related to the Sentry integration.

sentry.dsn

Set the Data Source Name for the Sentry project related to the backend. When unset, the backend’s Sentry integration is disabled.

sentry.frontend_dsn

Set the Data Source Name for the Sentry project related to the frontend. This will be passed to the frontend when retrieving it from a CDN using static.cdn_assets_url. When unset, the frontend’s Sentry integration will be disabled.

session

session.cookie_domain

This sets the Domain option on the session cookie. When unset, this disables the Domain option.

See Django’s documentation on SESSION_COOKIE_DOMAIN.

session.cookie_name

Sets a name for the session cookie. This defaults to arkindex.auth.

See Django’s documentation on SESSION_COOKIE_NAME.

session.cookie_samesite

The value of the SameSite flag on the session cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to Lax.

See Django’s documentation on SESSION_COOKIE_SAMESITE.

session.cookie_secure

Since Arkindex 0.14.0

Boolean; when enabled, adds the Secure flag on the session cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.

See Django’s documentation on SESSION_COOKIE_SECURE.

signup_default_group

Enterprise Edition only.

Identifier (as a UUID) of the user group to which any new user registering on the instance will be assigned.

This is useful to provide default rights for new users, especially on Ponos farms.

solr

Since Arkindex 1.0.1.

solr.api_url

Base URL of the Solr API. Defaults to http://localhost:8983/solr/.

static

static.cdn_assets_url

URL to the root of an Arkindex frontend assets directory. When this variable is set, the backend will serve an index page to load the frontend, and frontend assets will be looked for in a subdirectory of this URL corresponding to the version number defined in frontend_version.

static.frontend_version

When cdn_assets_url is set, this version number will be used when looking up frontend assets. This defaults to the backend’s version number.

static.mirador_url

URL to a Mirador instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Mirador may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.

static.root_path

Absolute path for collected static files during deployment.

See the Django documentation on STATIC_ROOT.

static.universal_viewer_url

URL to a Universal Viewer instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Universal Viewer may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.

worker_activity_timeout

Since Arkindex 1.3.4. Timeout for worker activities, in seconds. This timeout is the time without update after which an existing worker activity can be set to started again, allowing another worker to try to process it again. Defaults to 3600.

workers_max_chunks

Maximum number of chunks in a worker process, expressed as a positive integer. This setting allows administrators to control the number of parallel tasks in a single process, to avoid creating very large processes on smaller instances.

Default: 10.