Settings
This page lists all the configuration settings available for the Arkindex backend. These settings must be stored in a YAML file and exposed to the backend and worker containers using a Docker volume. The configuration path is set through the CONFIG_PATH environment variable.
Configuration sample
A minimal file is available here:
---
# This file must be exposed to the backend and worker container using a Docker volume
# You can set its path in the container by using the environment variable CONFIG_PATH
# Connection to the PostgreSQL database
# Here we use a PostgreSQL container on the same network
database:
  host: ark-database
  port: 5432
  name: arkindex_public
  user: public_user
  password: public_data
# Connection to the Redis server to share asynchronous local jobs
redis:
  host: ark-redis
# Connection to an S3-compatible storage API
# Here we use a MinIO container on the same network
s3:
  access_key_id: minio1234
  secret_access_key: minio1234
  endpoint: https://minio.ark.localhost
  region: local
# Random characters used to salt cryptographic hashes
secret_key: LkX7et2k5yh2muoCiTcpKCpZBXQ8fmJXXdSuR98lQn
# Connection to the search engine
# This is only needed if the search feature is enabled
solr:
  api_url: http://ark-solr:8983/solr/
# Cache system to use for performance
# In production we recommend to use Redis
cache:
  type: memory
# Control the optional features on your instance
features:
  signup: yes
  search: yes
  ingest: yes
# Use remote frontend files, hosted by Teklia
# You need to synchronize the version mentioned here
# with the one from your backend
static:
  frontend_version: 1.6.0
  cdn_assets_url: https://assets.teklia.com/arkindex
# Configure the remote worker credentials
# to allow them to communicate with this Arkindex instance
ponos:
  private_key: /etc/ponos.key
  default_env:
    ARKINDEX_API_URL: https://ark.localhost/api/v1/
  task_expiry: 30
# Root URL of the Arkindex instance
# Used to build external links (in emails)
public_hostname: https://ark.localhost
# Configure the Django settings for session & CSRF cookies
# along with CORS allowed hosts
# These should match your public hostname
session:
  cookie_domain: ark.localhost
csrf:
  cookie_domain: ark.localhost
  trusted_origins:
    - 'https://*.ark.localhost'
cors:
  origin_whitelist:
    - https://ark.localhost
# HTTP hosts allowed to reach the server
# This should match your public hostname
# Note the leading .
allowed_hosts:
  - .ark.localhost
# IIIF Image Server used to expose the locally uploaded images
# Do not change this setting if you use the bootstrap script
local_imageserver_id: 12345
Reference
All the configuration options available in the YAML file are described here in alphabetical order.
allowed_hosts
A list of hosts that are allowed to access the server.
The following hostnames are always added to the list:
- 127.0.0.1
- localhost
- backend
- ark-backend
See the Django documentation on ALLOWED_HOSTS.
arkindex_env
Arkindex execution environment. Defaults to dev. Change this in production.
When set to dev:
- Django’s DEBUG setting is set to True;
- All errors will show a detailed error page with tracebacks, settings, etc., which is a security issue in production;
- Default values for ARKINDEX_API_URL compatible with running the backend outside Docker are set on ponos.default_env;
- Caching is disabled if it was not explicitly configured, instead of falling back to an in-memory cache.
Error report e-mails, when enabled, are prefixed by [Arkindex $ARKINDEX_ENV] (where $ARKINDEX_ENV is replaced by the value of ARKINDEX_ENV).
cache
cache.path
Path to an existing directory in which requests can be cached. Required only with the filesystem cache and otherwise ignored.
cache.type
Required if cache is set; defines the type of cache, which may in turn require other properties in this configuration block. Possible values are:
- dummy: Debugging-only cache that does not actually cache anything
- memory: In-memory caching
- redis: Cache using a Redis instance (requires url)
- memcached: Cache using a Memcached instance (requires url)
- filesystem: Cache using the file system (requires path)
When this is unset, this is set to memory. If arkindex_env is set to dev and this is unset, this is set to dummy.
cache.url
Hostname and optional port number for a memcached or Redis instance.
Required only with memcached and redis caches; ignored for other cache types.
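As an illustration of how cache.type determines which other keys are needed, a filesystem cache could be declared as in the sketch below. The directory /var/cache/arkindex is a hypothetical path chosen for this example; it must already exist inside the container.

```yaml
# Minimal sketch: filesystem cache (requires `path`, ignores `url`)
cache:
  type: filesystem
  # Hypothetical path: any existing directory in the container
  path: /var/cache/arkindex
```

With type set to redis or memcached, path would be ignored and url would be required instead.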
cleanup
Since Arkindex 1.6.1.
This section defines configuration items related to the arkindex cleanup administrator command.
cors
This section defines configuration items specific to Cross-Origin Resource Sharing on the REST API.
To learn more about CORS, browse the Mozilla docs.
cors.origin_whitelist
A list of CORS origins, as defined in RFC 6454: a URI scheme, hostname and port. It is possible to omit the port for https:// and http://; they will default to 443 and 80 respectively.
This defaults to the following:
- http://localhost:8080
- http://127.0.0.1:8080
Since Arkindex 0.14.0, URI schemes are required for all values of this parameter.
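The scheme and port rules can be sketched as follows. The hostnames are taken from the configuration sample on this page and are only placeholders for your own.

```yaml
cors:
  origin_whitelist:
    # Port omitted: defaults to 443 for https://
    - https://ark.localhost
    # Explicit non-default port
    - http://localhost:8080
```

An entry without a scheme, such as ark.localhost alone, would not be valid since Arkindex 0.14.0.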
csrf
Configuration related to protection against cross-site request forgery (CSRF) attacks.
csrf.cookie_domain
This sets the Domain option on the CSRF cookie. When unset, this disables the Domain option.
See Django’s documentation on CSRF_COOKIE_DOMAIN.
csrf.cookie_name
Sets a name for the CSRF token cookie. This defaults to arkindex.csrf.
Changing this name to a value other than the default may break authentication on the frontend and API clients, and prevent POST, PUT or DELETE requests, unless they are reconfigured accordingly, for example using ARKINDEX_API_CSRF_COOKIE.
See Django’s documentation on CSRF_COOKIE_NAME.
csrf.cookie_samesite
The value of the SameSite flag on the CSRF cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to Lax.
See Django’s documentation on CSRF_COOKIE_SAMESITE.
csrf.cookie_secure
Since Arkindex 0.14.0.
Boolean; when enabled, adds the Secure flag on the CSRF cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.
See Django’s documentation on CSRF_COOKIE_SECURE.
csrf.trusted_origins
A list of hosts where unsafe requests (POST, PUT, DELETE) are allowed over HTTPS. This requires requests to include one of those hosts as the Referer.
A subdomain catch-all is accepted, such as .teklia.com, to allow any subdomain of teklia.com.
See Django’s documentation on CSRF_TRUSTED_ORIGINS.
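Taken together, the csrf options could be combined as in the sketch below for an instance served over HTTPS. The domain ark.localhost and the trusted origin come from the configuration sample on this page; the choice of strict and true is an assumption made for illustration, not a recommendation from the Arkindex documentation.

```yaml
csrf:
  cookie_domain: ark.localhost
  # Default name; changing it requires reconfiguring API clients
  cookie_name: arkindex.csrf
  # One of lax, strict, none or false
  cookie_samesite: strict
  # Only send the cookie over HTTPS
  cookie_secure: true
  trusted_origins:
    - 'https://*.ark.localhost'
```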
database
database.export
Enterprise Edition only.
Since Arkindex 1.6.0, you can specify a database dedicated to running project exports. This allows you to run large exports on this database instead of the main database: your exports do not risk failing because a write operation has changed the database while the export was still running.
If you specify an export database, then when exporting a project you will be able to choose between this database and the main database.
Settings are the same as for the main database described below, but under an export header:
database:
  export:
    host: ...
    port: ...
    user: ...
    password: ...
    name: ...
database.password
Password for the PostgreSQL user. Defaults to devdata. Please change this in production.
database.replica
Optional information for a read-only PostgreSQL replica, allowing the database to scale across multiple servers. This is needed to set up Patroni.
If you specify this information, all write operations will happen on the main database, and all read operations will happen on the replica.
Settings are the same as for the main database, but under a replica header:
database:
  replica:
    host: ...
    port: ...
    user: ...
    password: ...
    name: ...
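For a more concrete picture, the sketch below places a replica next to the main database settings from the configuration sample. The host name ark-database-replica is hypothetical, and reusing the same credentials on both servers is an assumption made to keep the example short.

```yaml
database:
  # Main database: receives all write operations
  host: ark-database
  port: 5432
  name: arkindex_public
  user: public_user
  password: public_data
  # Read-only replica: receives all read operations
  replica:
    host: ark-database-replica  # hypothetical host name
    port: 5432
    name: arkindex_public
    user: public_user
    password: public_data
```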
email
The e-mail configuration is used in three cases:
- Sending error reports to administrators;
- E-mail address verification upon registration using the Register API endpoint;
- E-mail password reset links using the ResetPassword API endpoint.
When the whole configuration section is omitted, e-mail messages that would normally be sent will instead be printed to standard output.
email.error_report_recipients
A list of e-mail addresses to send HTTP 500 error reports to. When unset, this will not send any error reports.
email.from_address
Since Arkindex 1.7.1. E-mail address used in the From header of all e-mails sent by Arkindex. Optional, defaults to email.user.
email.password
Password for the SMTP server.
Since Arkindex 1.8.0, when this parameter is omitted, Arkindex will not attempt authentication to the server.
export
export.ttl
This integer value configures the Time To Live of an SQLite export on a corpus, in seconds. This prevents creating exports too frequently through the API and overloading the system. When an export has been successfully created, another export on the same corpus cannot be created within this specified time.
Defaults to 21600 seconds (6 hours).
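As a short illustration, the fragment below lowers the delay between two exports of the same corpus to one hour. The value 3600 is only an example.

```yaml
export:
  # 3600 seconds = 1 hour between two exports of the same corpus
  ttl: 3600
```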
features
This section uses feature flags to configure whether some optional features of Arkindex are available. This is available since Arkindex 0.12.3.
Unlike in other parts of the configuration, unknown keys are not allowed here, to prevent unknown configuration items from being potentially shared over the API.
features.ingest
Since Arkindex 1.6.3. Defines whether or not the S3 ingest feature is available. Boolean, defaults to false.
When disabled, the Import files from S3 button is not visible in the Import / Export menu when browsing projects and elements. S3 ingest processes cannot be retried, and the S3 ingest-related API endpoints always return HTTP 400 errors.
features.search
Since Arkindex 0.12.3. Defines whether or not the search feature is available. Boolean, defaults to false.
When disabled, this disables all interactions with Solr, causes search APIs to return HTTP 400 errors, and causes the frontend to hide all search-related components. If you enable this feature with a non-empty database, you will also need to build the search index for each of the indexable projects to ensure they are ready to use.
features.selection
Since Arkindex 0.13.1. Defines whether or not the element selection feature is available. Boolean, defaults to true.
features.signup
Since Arkindex 0.12.3. Defines whether or not the sign-up feature is available. Boolean, defaults to true.
When disabled, the Register button on the frontend is not shown when logged out and the Register API endpoint always returns HTTP 400 errors. The password reset and email verification features are still available, though the verification email will not be sent automatically anywhere.
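The four flags described in this section can be sketched together as below, here for a hypothetical private instance where sign-up and S3 ingest are turned off. Only these four keys are documented on this page, and unknown keys are rejected in this block.

```yaml
features:
  signup: no      # hide the Register button, Register API returns HTTP 400
  search: yes     # requires a reachable Solr server
  selection: yes  # element selection (enabled by default)
  ingest: no      # hide the S3 import button
```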
iiif_user_agent
Since Arkindex 1.6.2. Specify the User-Agent header used when checking images. Defaults to Arkindex/{VERSION} (+https://teklia.com/) where VERSION is the current backend version as defined in the VERSION file.
ingest
Since Arkindex 1.2.6.
This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs for image ingestion.
ingest.endpoint
An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.
ingest.extra_buckets
Optional array of strings. Adds additional bucket names returned by the ListBuckets API endpoint.
This makes other buckets available to the S3 import form in the frontend, even though they might not be visible when listing all buckets due to missing S3 permissions.
ingest.imageserver_id
ID of the ImageServer to use when importing images from S3. This is required for the S3 ingest feature to work, and should point to the IIIF server where the images on the S3 bucket are hosted.
ingest.prefix_by_bucket_name
Whether or not to prefix the bucket name to the image’s path when building an IIIF image identifier. Boolean, defaults to true.
For example, when ingesting an image on a bucket mybucket with the path a/b.jpg, enabling this option will cause the ingest process to use mybucket%2Fa%2Fb.jpg as the IIIF identifier.
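The ingest options documented above could be assembled as in the following sketch. The endpoint reuses the MinIO URL from the configuration sample; the ImageServer ID 2 and the bucket name scans-archive are hypothetical. Only keys described on this page are shown, so any credentials the section may also need are left out.

```yaml
features:
  ingest: yes
ingest:
  # Custom endpoint for an S3-compatible API such as MinIO
  endpoint: https://minio.ark.localhost
  # Hypothetical ID of the IIIF server exposing the bucket's images
  imageserver_id: 2
  # Hypothetical bucket hidden from ListBuckets by missing permissions
  extra_buckets:
    - scans-archive
  # a/b.jpg on mybucket becomes mybucket%2Fa%2Fb.jpg
  prefix_by_bucket_name: true
```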
job_timeouts
Since Arkindex 1.1.0. Defines the asynchronous task timeouts for each of the asynchronous task types that Arkindex uses.
job_timeouts.corpus_delete
Since Arkindex 1.1.0. Timeout for the corpus deletion asynchronous task, in seconds. Defaults to 7200.
job_timeouts.create_process_failures
Since Arkindex 1.6.2. Timeout for creating a process from failed WorkerActivities, in seconds. Defaults to 3600.
Once the new process has been created, an email is sent to the user containing a URL to configure and run it.
job_timeouts.element_trash
Since Arkindex 1.1.0. Timeout for the element list deletion asynchronous task, in seconds. Defaults to 3600.
job_timeouts.export_corpus
Since Arkindex 1.1.0. Timeout for the corpus export asynchronous task, in seconds. Defaults to 7200.
job_timeouts.initialize_activity
Since Arkindex 1.1.0. Timeout for the worker activity initialization asynchronous task, in seconds. Defaults to 3600.
job_timeouts.move_element
Since Arkindex 1.1.0. Timeout for the element move asynchronous task, in seconds. Defaults to 3600.
job_timeouts.notify_process_completion
Timeout for the email notification task upon finished process, in seconds. Defaults to 120.
job_timeouts.process_delete
Since Arkindex 1.1.0. Timeout for the process deletion asynchronous task, in seconds. Defaults to 3600.
job_timeouts.reindex_corpus
Timeout for the corpus search engine re-indexation task, in seconds. Defaults to 7200.
job_timeouts.send_verification_email
Since Arkindex 1.6.2. Timeout for sending the verification email, in seconds. Defaults to 120.
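As an example of overriding some of these timeouts, the sketch below doubles the two 7200-second defaults related to exports and re-indexation, which might suit an instance with very large corpora. The values are illustrative; timeouts that are not listed are assumed to keep their defaults.

```yaml
job_timeouts:
  # 14400 seconds = 4 hours (default: 7200)
  export_corpus: 14400
  reindex_corpus: 14400
```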
local_imageserver_id
ID of the IIIF image server linked to the Arkindex instance. Defaults to 1.
The ImageServer may be created from the admin panel or a Django shell, or via the arkindex bootstrap command.
metrics_port
Network port where a Prometheus /metrics endpoint will be exposed. This allows system administrators to integrate Arkindex in their monitoring stack.
Defaults to 3000.
ponos
ponos.auto_remove_container
Since Arkindex 1.6.3. When enabled, tasks executed in Community Edition will have their containers automatically removed after they finish. Disabled by default.
This option has no effect on Ponos agents in Enterprise Edition.
ponos.default_env
Default environment variables sent along with every Ponos task the Arkindex backend starts. For default tasks and Arkindex client auto-configuration, the following variables should be defined:
- ARKINDEX_API_URL
- ARKINDEX_API_CSRF_COOKIE
Defining ARKINDEX_API_TOKEN is strongly discouraged as this can lead to a security vulnerability. Defining ARKINDEX_TASK_TOKEN will have no effect, as this variable is overridden automatically when running any process.
Any other custom variables defined here will not be used or checked by the backend, and will be passed on to the tasks directly.
ponos.default_env.ARKINDEX_API_CSRF_COOKIE
Allows autoconfiguration of the Arkindex client by setting the CSRF cookie’s name.
The CSRF cookie name is automatically deduced from the csrf.cookie_name setting and this variable is always set, but it is possible to override this value explicitly in the configuration file.
ponos.default_env.ARKINDEX_API_URL
Allows autoconfiguration of the Arkindex client by setting the base URL of the Arkindex API.
This should be set to a public-facing URL for the API’s root, such as https://myarkindex.com/api/v1/. This defaults to http://localhost:8000/api/v1/ when arkindex_env is set to dev.
ponos.default_env.ARKINDEX_MAX_IMAGE_PIXELS
Since Arkindex 1.6.2.
Define a size limit in pixels, for processing images.
This value is intended to override Pillow’s Image.MAX_IMAGE_PIXELS, which helps to avoid potential decompression bombs.
Pillow only logs a warning when the limit is exceeded; it raises an error and stops processing only if the number of pixels is greater than twice the value.
Pillow’s setting override is implemented for file imports, but this variable can also be used by any worker that needs to support very large images.
If set to 0, all checks are ignored (allows an infinite size, image source must be trusted).
If set to None (or unset), default values are used (e.g. 89478485 pixels with Pillow).
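A minimal sketch of raising this limit is shown below. The value 200000000 is an arbitrary example, and writing it as a bare integer rather than a quoted string is an assumption, since this page does not say which form the backend expects.

```yaml
ponos:
  default_env:
    ARKINDEX_API_URL: https://ark.localhost/api/v1/
    # Example value: Pillow would warn above 200000000 pixels
    # and raise an error above twice that, i.e. 400000000 pixels
    ARKINDEX_MAX_IMAGE_PIXELS: 200000000
```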
ponos.default_farm
Enterprise Edition only.
Required. UUID of a farm to assign to all processes by default.
ponos.maximum_task_ttl
Since Arkindex 1.6.6.
Default maximum time-to-live for all WorkerRuns and tasks, in seconds. Zero means infinite. Defaults to 3600 seconds (1 hour).
This is applied when no maximum TTL has been set on a specific project through the administration interface.
process_enforce_budgets
Enterprise Edition only.
Since Arkindex 1.8.0. Boolean, defaults to false. When this is enabled, starting or retrying a process, or restarting a task, may return HTTP 402 Payment Required when:
- The process or task uses a worker that has costs associated with it, thus running that worker would incur a cost in a budget.
- One of the following conditions is true:
  - The corpus of the process does not have a budget;
  - The creator of the process, or the user requesting to restart a task, does not have the necessary access rights on the budget;
  - There are no funds left on the budget.
This option does not affect the creation of budget entries by the arkindex update_budgets command.
public_hostname
Since Arkindex 1.0.3. Root URL of the Arkindex instance, including the scheme and hostname (http://example.com).
redis
Configuration related to Redis for asynchronous tasks using Channels. This is unrelated to the optional Redis cache configuration.
robots_txt_disallow
List of relative paths to disallow in the generated /robots.txt. Any path relative to the root will be marked as Disallow: <path> so that it should not be scraped by any robot.
Example to forbid scraping on the whole instance: robots_txt_disallow: ["/"]
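To block only parts of the instance, several paths can be listed. In the sketch below, /api/ and /admin/ are hypothetical paths chosen for illustration; each entry is expected to produce one Disallow line in the generated /robots.txt.

```yaml
robots_txt_disallow:
  - /api/    # hypothetical path, rendered as "Disallow: /api/"
  - /admin/  # hypothetical path, rendered as "Disallow: /admin/"
```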
s3
This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs.
s3.endpoint
An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.
s3.max_retries
How many times any S3 request should be retried when errors occur. Boto3’s standard retry mode determines when a request is retried. This defaults to 5. Set this to 0 to disable all retries.
s3.ponos_artifacts_bucket
Name of the S3 bucket to use for task artifacts. This defaults to ponos-artifacts.
s3.staging_bucket
Name of the S3 bucket to use for DataFile uploads. This defaults to staging.
Note that for the frontend to be able to upload files to the staging bucket, you will need to configure CORS support on the bucket. MinIO has it enabled by default, but AWS S3 does not. Learn more about CORS support on S3 buckets here.
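Combining the connection keys from the configuration sample with the options described in this section gives a sketch like the one below. The credentials are the sample's MinIO placeholders, the bucket names are the documented defaults, and max_retries is lowered to 3 purely as an example.

```yaml
s3:
  access_key_id: minio1234
  secret_access_key: minio1234
  endpoint: https://minio.ark.localhost
  region: local
  # Retry failed requests up to 3 times (default: 5, 0 disables retries)
  max_retries: 3
  # Documented default bucket names, written out explicitly
  ponos_artifacts_bucket: ponos-artifacts
  staging_bucket: staging
```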
secret_key
A secret key with multiple uses:
- Session management (to keep users logged in);
- E-mail verification and password reset tokens sent by e-mail;
- JSON Web Tokens authentication for Ponos agents, unless SIGNING_KEY is set.
This must be set to a random string of at least 32 characters in production.
See the Django documentation on SECRET_KEY.
sentry
Configuration related to the Sentry integration.
session
session.cookie_domain
This sets the Domain option on the session cookie. When unset, this disables the Domain option.
See Django’s documentation on SESSION_COOKIE_DOMAIN.
session.cookie_name
Sets a name for the session cookie. This defaults to arkindex.auth.
See Django’s documentation on SESSION_COOKIE_NAME.
session.cookie_samesite
The value of the SameSite flag on the session cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to Lax.
See Django’s documentation on SESSION_COOKIE_SAMESITE.
session.cookie_secure
Since Arkindex 0.14.0.
Boolean; when enabled, adds the Secure flag on the session cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.
See Django’s documentation on SESSION_COOKIE_SECURE.
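The session cookie options mirror the CSRF ones and could be set together as in this sketch. The domain comes from the configuration sample; enabling cookie_secure assumes the instance is only reachable over HTTPS.

```yaml
session:
  cookie_domain: ark.localhost
  # Default cookie name, written out explicitly
  cookie_name: arkindex.auth
  # One of lax, strict, none or false
  cookie_samesite: lax
  # Only send the session cookie over HTTPS
  cookie_secure: true
```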
signup_default_group
Enterprise Edition only.
Identifier (UUID) of the user group to which any new user registering on the instance will be assigned.
This is useful to provide default rights for new users, especially on Ponos farms.
static
static.cdn_assets_url
URL to the root of an Arkindex frontend assets directory. When this variable is set, the backend will serve an index page to load the frontend, and frontend assets will be looked for in a subdirectory of this URL corresponding to the version number defined in frontend_version.
static.frontend_version
When cdn_assets_url is set, this version number will be used when looking up frontend assets. This defaults to the backend’s version number.
static.mirador_url
URL to a Mirador instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Mirador may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.
static.root_path
Absolute path for collected static files during deployment.
See the Django documentation on STATIC_ROOT.
static.universal_viewer_url
URL to a Universal Viewer instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Universal Viewer may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.