Settings
This page lists all the configuration settings available for the Arkindex backend. These settings must be stored in a YAML file and exposed to the backend and worker containers using a Docker volume. The configuration path is set through the CONFIG_PATH environment variable.
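As a rough sketch of the setup described above, a Docker Compose service could mount the YAML file as a volume and point CONFIG_PATH at it. The service name, image name and file paths below are placeholders chosen for illustration, not values taken from the Arkindex distribution:

```yaml
services:
  backend:
    # Hypothetical image name; use the image provided for your installation
    image: arkindex-backend:latest
    environment:
      # Path of the configuration file inside the container (assumed path)
      CONFIG_PATH: /arkindex/config.yml
    volumes:
      # Expose the local YAML file to the container, read-only
      - ./config.yml:/arkindex/config.yml:ro
```

The worker container would need the same volume and environment variable.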
Configuration sample¶
A minimal file is available here:
---
# This file must be exposed to the backend and worker container using a Docker volume
# You can set its path in the container by using the environment variable CONFIG_PATH

# Connection to the PostgreSQL database
# Here we use a PostgreSQL container on the same network
database:
  host: ark-database
  port: 5432
  name: arkindex_public
  user: public_user
  password: public_data

# Connection to the Redis server to share asynchronous local jobs
redis:
  host: ark-redis

# Connection to an S3-compatible storage API
# Here we use a MinIO container on the same network
s3:
  access_key_id: minio1234
  secret_access_key: minio1234
  endpoint: https://minio.ark.localhost
  region: local

# Random characters used to salt cryptographic hashes
secret_key: LkX7et2k5yh2muoCiTcpKCpZBXQ8fmJXXdSuR98lQn

# Connection to the search engine
# This is only needed if the search feature is enabled
solr:
  api_url: http://ark-solr:8983/solr/

# Cache system to use for performance
# In production we recommend to use Redis
cache:
  type: memory

# Control the optional features on your instance
features:
  signup: yes
  search: yes
  ingest: yes

# Use remote frontend files, hosted by Teklia
# You need to synchronize the version mentioned here
# with the one from your backend
static:
  frontend_version: 1.6.0
  cdn_assets_url: https://assets.teklia.com/arkindex

# Configure the remote worker credentials
# to allow them to communicate with this Arkindex instance
ponos:
  private_key: /etc/ponos.key
  default_env:
    ARKINDEX_API_URL: https://ark.localhost/api/v1/
  task_expiry: 30

# Root URL of the Arkindex instance
# Used to build external links (in emails)
public_hostname: https://ark.localhost

# Configure the Django settings for session & CSRF cookies
# along with CORS allowed hosts
# These should match your public hostname
session:
  cookie_domain: ark.localhost
csrf:
  cookie_domain: ark.localhost
  trusted_origins:
    - 'https://*.ark.localhost'
cors:
  origin_whitelist:
    - https://ark.localhost

# HTTP hosts allowed to reach the server
# This should match your public hostname
# Note the leading .
allowed_hosts:
  - .ark.localhost

# IIIF Image Server used to expose the locally uploaded images
# Do not change this setting if you use the bootstrap script
local_imageserver_id: 12345
Reference¶
All the configuration options available in the YAML file are described here in alphabetical order.
allowed_hosts
¶
A list of hosts that are allowed to access the server.
The following hostnames are always added to the list:
- 127.0.0.1
- localhost
- backend
- ark-backend
See the Django documentation on ALLOWED_HOSTS.
arkindex_env
¶
Arkindex execution environment. Defaults to dev. Change this in production.
When set to dev:
- Django’s DEBUG setting is set to True;
- All errors will show a detailed error page with tracebacks, settings, etc., which is a security issue in production;
- Default values for ARKINDEX_API_URL compatible with running the backend outside Docker are set on ponos.default_env;
- Caching is disabled if it was not explicitly configured, instead of falling back to an in-memory cache.
Error report e-mails, when enabled, are prefixed by [Arkindex $ARKINDEX_ENV] (where $ARKINDEX_ENV is replaced by the value of ARKINDEX_ENV).
banner
¶
Since Arkindex 1.1.1
banner.message
¶
The Markdown message that will be displayed in the frontend.
banner.style
¶
The style used to display the message, one of info, success, warning or error. Defaults to info.
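For illustration, a banner announcing a maintenance window might be configured as follows. The message text is invented; only the two keys and the warning style come from the reference above:

```yaml
banner:
  # Markdown is supported in the message
  message: "Maintenance planned on **Friday** from 18:00 to 20:00 UTC."
  style: warning
```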
cache
¶
cache.path
¶
Path to an existing directory in which requests can be cached. Required only with the filesystem cache and otherwise ignored.
cache.type
¶
Required if cache is set; defines the type of cache, which may in turn require other properties in this configuration block. Possible values are:
- dummy: Debugging-only cache that does not actually cache anything
- memory: In-memory caching
- redis: Cache using a Redis instance (requires url)
- memcached: Cache using a Memcached instance (requires url)
- filesystem: Cache using the file system (requires path)
When this is unset, it defaults to memory. If arkindex_env is set to dev and this is unset, it defaults to dummy.
cache.url
¶
Hostname and optional port number for a memcached or Redis instance.
Required only with memcached and redis caches; ignored for other cache types.
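A minimal sketch of a filesystem cache, assuming a directory /var/cache/arkindex already exists in the container (the path is hypothetical):

```yaml
cache:
  type: filesystem
  # Must be an existing directory; ignored by other cache types
  path: /var/cache/arkindex
```

For the redis and memcached types, path would be replaced by url.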
cleanup
¶
Since Arkindex 1.6.1.
This section defines configuration items related to the arkindex cleanup administrator command.
cleanup.model_delay
¶
Minimum number of days between the archival date of a Model and its deletion, if it does not have any associated ML results. Defaults to 30 days.
cleanup.worker_delay
¶
Minimum number of days between the archival date of a Worker and its deletion, if it does not have any associated ML results. Defaults to 30 days.
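As an example, the following sketch keeps archived models and workers for 90 days before the cleanup command may delete them. The value 90 is arbitrary:

```yaml
cleanup:
  # Delete archived models without ML results after 90 days instead of 30
  model_delay: 90
  # Same delay for archived workers
  worker_delay: 90
```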
cors
¶
This section defines configuration items specific to Cross-Origin Resource Sharing on the REST API.
To learn more about CORS, browse the Mozilla docs.
cors.origin_whitelist
¶
A list of CORS origins, as defined in RFC 6454: a URI scheme, hostname and port. It is possible to omit the port for https:// and http://; they will default to 443 and 80 respectively.
This defaults to the following:
- http://localhost:8080
- http://127.0.0.1:8080
Since Arkindex 0.14.0, URI schemes are required for all values of this parameter.
cors.suffixes
¶
A list of regular expressions that can match hostname suffixes. They will be prepended with ^https://.+.
This may be used in conjunction with cors.origin_whitelist: when an origin does not match the former, the suffixes regular expressions will be tested instead.
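As an illustration of how the two CORS settings combine, the sketch below allows one exact origin plus any HTTPS subdomain of a hypothetical example.com domain. The hostnames are placeholders, and anchoring the expression with $ is an assumption about how the pattern is matched:

```yaml
cors:
  origin_whitelist:
    # Exact origin: scheme and hostname, port defaults to 443
    - https://ark.example.com
  suffixes:
    # Becomes ^https://.+\.example\.com$ once the prefix is prepended
    - '\.example\.com$'
```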
csrf
¶
Configuration related to protection against cross-site request forgery (CSRF) attacks.
csrf.cookie_domain
¶
This sets the Domain option on the CSRF cookie. When unset, this disables the Domain option.
See Django’s documentation on CSRF_COOKIE_DOMAIN.
csrf.cookie_name
¶
Sets a name for the CSRF token cookie. This defaults to arkindex.csrf.
Changing this name to a value other than the default may impact authentication on the frontend and API clients and prevent POST, PUT or DELETE requests unless they are reconfigured accordingly, for example using ARKINDEX_API_CSRF_COOKIE.
See Django’s documentation on CSRF_COOKIE_NAME.
csrf.cookie_samesite
¶
The value of the SameSite flag on the CSRF cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to Lax.
See Django’s documentation on CSRF_COOKIE_SAMESITE.
csrf.cookie_secure
¶
Since Arkindex 0.14.0.
Boolean; when enabled, adds the Secure flag on the CSRF cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.
See Django’s documentation on CSRF_COOKIE_SECURE.
csrf.trusted_origins
¶
A list of hosts where unsafe requests (POST, PUT, DELETE) are allowed over HTTPS. This requires requests to include one of those hosts as the Referer.
A subdomain catch-all is accepted, such as .teklia.com, to allow any subdomain of teklia.com.
See Django’s documentation on CSRF_TRUSTED_ORIGINS.
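Putting the CSRF options together, a production-like configuration served over HTTPS might look like the sketch below. The domain is a placeholder, and the trusted origin uses the same scheme-and-wildcard form as the configuration sample at the top of this page:

```yaml
csrf:
  cookie_domain: ark.example.com
  cookie_name: arkindex.csrf
  cookie_samesite: lax
  # Only send the cookie over HTTPS
  cookie_secure: true
  trusted_origins:
    - 'https://*.ark.example.com'
```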
database
¶
database.export
¶
Enterprise Edition only.
Since Arkindex 1.6.0, you can specify a database dedicated to running project exports. This allows you to run large exports on this database instead of the main database: your exports do not risk failing because a write operation has changed the database while the export was still running.
If you specify an export database, then when exporting a project you will be able to choose between this database and the main database.
Settings are the same as for the main database described below, but under an export header:
database:
  export:
    host: ...
    port: ...
    user: ...
    password: ...
    name: ...
database.host
¶
Hostname of the PostgreSQL database. Defaults to localhost.
database.name
¶
Name of the PostgreSQL database. Defaults to arkindex_dev.
database.password
¶
Password for the PostgreSQL user. Defaults to devdata. Please change this in production.
database.port
¶
Port of the PostgreSQL database. Defaults to 9100.
database.replica
¶
Optional information for a read-only PostgreSQL replica, which allows scaling the database across multiple servers. This is needed to set up Patroni.
If you specify this information, all write operations will happen on the main database, and all read operations will happen on the replica.
Settings are the same as for the main database described above, but under a replica header:
database:
  replica:
    host: ...
    port: ...
    user: ...
    password: ...
    name: ...
database.user
¶
Name of a PostgreSQL user to use. Defaults to devuser. Please change this in production.
email
¶
The e-mail configuration is used in three cases:
- Sending error reports to administrators;
- E-mail address verification upon registration using the Register API endpoint;
- E-mail password reset links using the ResetPassword API endpoint.
When the whole configuration section is omitted, e-mail messages that would normally be sent will instead be printed to standard output.
email.error_report_recipients
¶
A list of e-mail addresses to send HTTP 500 error reports to. When unset, this will not send any error reports.
email.host
¶
Hostname of the SMTP server to use to send emails. The server must support TLS.
email.password
¶
Password for the SMTP server.
email.port
¶
Port of the SMTP server to use to send emails. Defaults to 25.
email.user
¶
Email address to use both as the SMTP username and as the sender address.
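A sketch of a complete e-mail section is shown below. The SMTP host, port, credentials and recipient are all placeholders to be replaced with the values of your own mail provider:

```yaml
email:
  host: smtp.example.com
  port: 587
  # Used as both the SMTP username and the sender address
  user: arkindex@example.com
  password: change-me
  # HTTP 500 error reports are sent to these addresses
  error_report_recipients:
    - admin@example.com
```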
export
¶
export.ttl
¶
This integer value configures the Time To Live of an SQLite export on a corpus in seconds. This prevents creating exports too frequently through the API and overloading the system. When an export has been successfully created, another export on the same corpus cannot be created within this specified time.
Defaults to 21600 seconds (6 hours).
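For instance, to allow a new export of the same corpus after one hour rather than six (the value is only an example):

```yaml
export:
  # Time To Live of an export, in seconds
  ttl: 3600
```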
features
¶
This section uses feature flags to configure whether some optional features of Arkindex are available. This is available since Arkindex 0.12.3.
Unlike in other parts of the configuration, unknown keys are not allowed here, to prevent unknown configuration items from being potentially shared over the API.
features.ingest
¶
Since Arkindex 1.6.3. Defines whether or not the S3 ingest feature is available. Boolean, defaults to false.
When disabled, the Import files from S3 button is not visible in the Import / Export menu when browsing projects and elements. S3 ingest processes cannot be retried, and the S3 ingest-related API endpoints always return HTTP 400 errors.
features.search
¶
Since Arkindex 0.12.3. Defines whether or not the search feature is available. Boolean, defaults to false.
When disabled, this disables all interactions with Solr, causes search APIs to return HTTP 400 errors, and causes the frontend to hide all search-related components. If you enable this feature with a non-empty database, you will also need to build the search index for each of the indexable projects to ensure they are ready to use.
features.selection
¶
Since Arkindex 0.13.1. Defines whether or not the element selection feature is available. Boolean, defaults to true.
features.signup
¶
Since Arkindex 0.12.3. Defines whether or not the sign-up feature is available. Boolean, defaults to true.
When disabled, the Register button on the frontend is not shown when logged out and the Register API endpoint always returns HTTP 400 errors. The password reset and email verification features are still available, though the verification email will not be sent automatically.
iiif_user_agent
¶
Since Arkindex 1.6.2. Specify the User-Agent header used when checking images. Defaults to Arkindex/{VERSION} (+https://teklia.com/) where VERSION is the current backend version as defined in the VERSION file.
ingest
¶
Since Arkindex 1.2.6.
This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs for image ingestion.
ingest.access_key_id
¶
The Access Key ID for read/write access to S3 buckets.
ingest.endpoint
¶
An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.
ingest.extra_buckets
¶
Optional array of strings. Adds bucket names to those returned by the ListBuckets API endpoint.
This makes other buckets available to the S3 import form in the frontend, even though they might not be visible when listing all buckets due to missing S3 permissions.
ingest.imageserver_id
¶
ID of the ImageServer to use when importing images from S3. This is required for the S3 ingest feature to work, and should point to the IIIF server where the images on the S3 bucket are hosted.
ingest.prefix_by_bucket_name
¶
Whether or not to prefix the bucket name to the image’s path when building an IIIF image identifier. Boolean, defaults to true.
For example, when ingesting an image on a bucket mybucket with the path a/b.jpg, enabling this option will cause the ingest process to use mybucket%2Fa%2Fb.jpg as the IIIF identifier.
ingest.region
¶
The AWS region the S3 buckets are located in. This has no effect when endpoint is set.
ingest.secret_access_key
¶
The Secret Access Key for read/write access to S3 buckets.
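A sketch of an ingest configuration targeting an S3-compatible server such as MinIO. The credentials, endpoint, bucket name and ImageServer ID are placeholders; features.ingest is included because the feature flag defaults to false:

```yaml
features:
  ingest: true

ingest:
  access_key_id: my-access-key
  secret_access_key: my-secret-key
  # Custom endpoint; region has no effect when this is set
  endpoint: https://minio.example.com
  # ID of the IIIF ImageServer that serves the images stored on the bucket
  imageserver_id: 2
  # Buckets that ListBuckets may not return because of missing permissions
  extra_buckets:
    - scans-archive
  prefix_by_bucket_name: true
```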
job_timeouts
¶
Since Arkindex 1.1.0. Defines the asynchronous task timeouts for each of the asynchronous task types that Arkindex uses.
job_timeouts.corpus_delete
¶
Since Arkindex 1.1.0. Timeout for the corpus deletion asynchronous task, in seconds. Defaults to 7200.
job_timeouts.create_process_failures
¶
Since Arkindex 1.6.2. Timeout for creating a process from failed WorkerActivities, in seconds. Defaults to 3600.
Once the new process has been created, an email is sent to the user containing a URL to configure and run it.
job_timeouts.element_trash
¶
Since Arkindex 1.1.0. Timeout for the element list deletion asynchronous task, in seconds. Defaults to 3600.
job_timeouts.export_corpus
¶
Since Arkindex 1.1.0. Timeout for the corpus export asynchronous task, in seconds. Defaults to 7200.
job_timeouts.initialize_activity
¶
Since Arkindex 1.1.0. Timeout for the worker activity initialization asynchronous task, in seconds. Defaults to 3600.
job_timeouts.move_element
¶
Since Arkindex 1.1.0. Timeout for the element move asynchronous task, in seconds. Defaults to 3600.
job_timeouts.notify_process_completion
¶
Timeout for the email notification task upon finished process, in seconds. Defaults to 120.
job_timeouts.process_delete
¶
Since Arkindex 1.1.0. Timeout for the process deletion asynchronous task, in seconds. Defaults to 3600.
job_timeouts.reindex_corpus
¶
Timeout for the corpus search engine re-indexation task, in seconds. Defaults to 7200.
job_timeouts.send_verification_email
¶
Since Arkindex 1.6.2. Timeout for sending the verification email, in seconds. Defaults to 120.
job_timeouts.task
¶
Since Arkindex 1.6.0. Timeout for locally executed ponos tasks (only for Community Edition). Defaults to 36000.
job_timeouts.worker_results_delete
¶
Since Arkindex 1.1.0. Timeout for the worker result deletion asynchronous task, in seconds. Defaults to 3600.
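Only the timeouts that differ from their defaults need to be listed. As a sketch, an instance with very large corpora might raise the export and re-indexation timeouts; the values below are illustrative, not recommendations:

```yaml
job_timeouts:
  # 4 hours instead of the default 7200 seconds
  export_corpus: 14400
  reindex_corpus: 14400
  # Shorter than the default 120 seconds
  send_verification_email: 60
```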
jwt_signing_key
¶
This is used as the HMAC key for the Ponos agents’ JSON Web Tokens authentication.
When unset, this defaults to the value of secret_key. This should be set to a random string of 32 characters or more, preferably different from the one used for secret_key.
local_imageserver_id
¶
ID of the IIIF image server linked to the Arkindex instance. Defaults to 1.
The ImageServer may be created from the admin panel or a Django shell, or via the arkindex bootstrap command.
metrics_port
¶
Network port where a Prometheus /metrics endpoint will be exposed. This allows system administrators to integrate Arkindex in their monitoring stack.
Defaults to 3000.
ponos
¶
ponos.auto_remove_container
¶
Since Arkindex 1.6.3. When enabled, tasks executed in Community Edition will have their containers automatically removed after they finish. Disabled by default.
This option has no effect on Ponos agents in Enterprise Edition.
ponos.default_env
¶
Default environment variables sent along with every Ponos task the Arkindex backend starts. For default tasks and Arkindex client auto-configuration, the following variables should be defined:
- ARKINDEX_API_URL
- ARKINDEX_API_CSRF_COOKIE
Defining ARKINDEX_API_TOKEN is strongly discouraged as this can lead to a security vulnerability. Defining ARKINDEX_TASK_TOKEN will have no effect, as this variable is overridden automatically when running any process.
Any other custom variables defined here will not be used or checked by the backend, and are passed on to the tasks directly.
ponos.default_env.ARKINDEX_API_CSRF_COOKIE
¶
Allows autoconfiguration of the Arkindex client by setting the CSRF cookie’s name.
The CSRF cookie name is automatically deduced from the csrf.cookie_name setting and this variable is always set, but it is possible to override this value explicitly in the configuration file.
ponos.default_env.ARKINDEX_API_URL
¶
Allows autoconfiguration of the Arkindex client by setting the base URL of the Arkindex API.
This should be set to a public-facing URL for the API’s root, such as https://myarkindex.com/api/v1/. This defaults to http://localhost:8000/api/v1/ when arkindex_env is set to dev.
ponos.default_env.ARKINDEX_MAX_IMAGE_PIXELS
¶
Since Arkindex 1.6.2.
Defines a size limit, in pixels, for processing images.
This value is intended to override Pillow’s Image.MAX_IMAGE_PIXELS, which helps to avoid potential decompression bombs.
Warning
Pillow only logs a warning when the limit is exceeded; it raises an error and stops processing only if the number of pixels is greater than twice the limit.
Pillow’s setting override is implemented for file imports, but this variable can also be used by any worker that needs to support very large images.
If set to 0, all checks are ignored (allows an infinite size, image source must be trusted).
If set to None (or unset), default values are used (e.g. 89478485 pixels with Pillow).
ponos.default_farm
¶
Enterprise Edition only.
Required. UUID of a farm to assign to all processes by default.
ponos.maximum_task_ttl
¶
Since Arkindex 1.6.6.
Default maximum time-to-live for all WorkerRuns and tasks, in seconds. Zero means infinite. Defaults to 3600 seconds (1 hour).
This is applied when no maximum TTL has been set on a specific project through the administration interface.
ponos.private_key
¶
Enterprise Edition only.
Path to an elliptic curve private key file to use as the server private key for secure registration of Ponos agents. This defaults to $BASE_DIR/ponos.key, where $BASE_DIR is the directory of the arkindex package.
ponos.task_expiry
¶
Since Arkindex 1.6.1. Delay in days before a task is marked as expired.
Once a task has expired, it will be deleted when the arkindex cleanup command runs.
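The sketch below combines several of the ponos options for a Community Edition instance. The hostname and numeric values are placeholders, and quoting the pixel limit as a string is an assumption made because environment variables are textual:

```yaml
ponos:
  # Community Edition only: remove task containers once they finish
  auto_remove_container: true
  # Maximum TTL for WorkerRuns and tasks, in seconds (0 would mean infinite)
  maximum_task_ttl: 7200
  # Days before a task is marked as expired
  task_expiry: 30
  default_env:
    ARKINDEX_API_URL: https://ark.example.com/api/v1/
    # Raise the image size limit above Pillow's default of 89478485 pixels
    ARKINDEX_MAX_IMAGE_PIXELS: "200000000"
```

ARKINDEX_API_CSRF_COOKIE is left out here since it is deduced from csrf.cookie_name when not set explicitly.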
public_hostname
¶
Since Arkindex 1.0.3. Root URL of the Arkindex instance, including the scheme and hostname (http://example.com).
redis
¶
Configuration related to Redis for asynchronous tasks using Channels. This is unrelated to the optional Redis cache configuration.
redis.db
¶
Since Arkindex 0.14.4. The database to use on the Redis server. Defaults to 0.
redis.host
¶
Hostname of the Redis server. Defaults to localhost.
redis.password
¶
Since Arkindex 0.14.4. Optional password to use when connecting to the Redis server. Defaults to null (no password).
redis.port
¶
Since Arkindex 0.14.4. The port to use to connect to the Redis server. Defaults to 6379.
redis.timeout
¶
Since Arkindex 0.14.4. The default asynchronous task timeout to use. Defaults to 1800.
Since Arkindex 1.1.0, this parameter only applies to tasks that are not defined in job_timeouts.
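A fuller Redis section than the one in the configuration sample could look like this sketch. The host name reuses the sample's container name; the password, database number and timeout are placeholders:

```yaml
redis:
  host: ark-redis
  port: 6379
  # Use database 1 instead of the default 0
  db: 1
  password: change-me
  # Default timeout for tasks not listed in job_timeouts
  timeout: 3600
```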
robots_txt_disallow
¶
List of relative paths to disallow in the generated /robots.txt. Any path relative to the root will be marked as Disallow: <path> so that it is not scraped by any robot.
Example to forbid scraping on the whole instance: robots_txt_disallow: ["/"]
s3
¶
This section defines configuration items specific to features using Amazon S3 and other S3-compatible APIs.
s3.access_key_id
¶
The Access Key ID for read/write access to S3 buckets.
s3.endpoint
¶
An optional custom endpoint to use to access S3 buckets, to use another file storage with an S3-compatible API such as MinIO.
s3.export_bucket
¶
Name of the S3 bucket to use to store SQLite exports. This defaults to export.
s3.max_retries
¶
How many times any S3 request should be retried when errors occur. Which requests are retried is determined by Boto3’s standard retry mode. This defaults to 5. Set this to 0 to disable all retries.
s3.ponos_artifacts_bucket
¶
Name of the S3 bucket to use for task artifacts. This defaults to ponos-artifacts.
s3.ponos_logs_bucket
¶
Name of the S3 bucket to use for task logs. This defaults to ponos-logs.
s3.region
¶
The AWS region the S3 buckets are located in. This has no effect when endpoint is set.
s3.secret_access_key
¶
The Secret Access Key for read/write access to S3 buckets.
s3.staging_bucket
¶
Name of the S3 bucket to use for DataFile uploads. This defaults to staging.
Note that for the frontend to be able to upload files to the staging bucket, you will need to configure CORS support on the bucket. MinIO has it enabled by default, but AWS S3 does not. Learn more about CORS support on S3 buckets here.
s3.thumbnails_bucket
¶
Name of the S3 bucket to use for element thumbnails. This defaults to thumbnails.
s3.training_bucket
¶
Since Arkindex 1.2.3.
Name of the S3 bucket to use for Machine Learning model training. This defaults to training.
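As a sketch, the bucket settings can be overridden together, for example to prefix every bucket when the storage account is shared with other applications. The credentials, endpoint and bucket names below are hypothetical:

```yaml
s3:
  access_key_id: my-access-key
  secret_access_key: my-secret-key
  endpoint: https://minio.example.com
  # Retry failed requests up to 3 times (0 disables retries)
  max_retries: 3
  export_bucket: ark-export
  ponos_artifacts_bucket: ark-ponos-artifacts
  ponos_logs_bucket: ark-ponos-logs
  staging_bucket: ark-staging
  thumbnails_bucket: ark-thumbnails
  training_bucket: ark-training
```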
secret_key
¶
A secret key with multiple uses:
- Session management (to keep users logged-in)
- E-mail verification and password reset tokens sent by e-mail
- JSON Web Tokens authentication for Ponos agents, unless jwt_signing_key is set.
This must be set to a random string of 32 characters or more in production.
See the Django documentation on SECRET_KEY.
sentry
¶
Configuration related to the Sentry integration.
sentry.dsn
¶
Set the Data Source Name for the Sentry project related to the backend. When unset, the backend’s Sentry integration is disabled.
sentry.frontend_dsn
¶
Set the Data Source Name for the Sentry project related to the frontend. This will be passed to the frontend when retrieving it from a CDN using static.cdn_assets_url. When unset, the frontend’s Sentry integration will be disabled.
session
¶
session.cookie_domain
¶
This sets the Domain option on the session cookie. When unset, this disables the Domain option.
See Django’s documentation on SESSION_COOKIE_DOMAIN.
session.cookie_name
¶
Sets a name for the session cookie. This defaults to arkindex.auth.
See Django’s documentation on SESSION_COOKIE_NAME.
session.cookie_samesite
¶
The value of the SameSite flag on the session cookie. This flag prevents the cookie from being sent in cross-site requests. Either lax, strict, none or false. false removes the flag. Defaults to Lax.
See Django’s documentation on SESSION_COOKIE_SAMESITE.
session.cookie_secure
¶
Since Arkindex 0.14.0.
Boolean; when enabled, adds the Secure flag on the session cookie. This flag prevents the cookie from being sent without HTTPS. Defaults to false.
See Django’s documentation on SESSION_COOKIE_SECURE.
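The session cookie options mirror the CSRF ones. A sketch for an instance served over HTTPS, with a placeholder domain:

```yaml
session:
  cookie_domain: ark.example.com
  cookie_name: arkindex.auth
  cookie_samesite: lax
  # Only send the cookie over HTTPS
  cookie_secure: true
```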
signup_default_group
¶
Enterprise Edition only.
Identifier (UUID) of the user group to which any new user registering on the instance will be assigned.
This is useful to provide default rights for new users, especially on Ponos farms.
solr
¶
Since Arkindex 1.0.1.
solr.api_url
¶
Base URL of the Solr API. Defaults to http://localhost:8983/solr/.
static
¶
static.cdn_assets_url
¶
URL to the root of an Arkindex frontend assets directory. When this variable is set, the backend will serve an index page to load the frontend, and frontend assets will be looked for in a subdirectory of this URL corresponding to the version number defined in frontend_version.
static.frontend_version
¶
When cdn_assets_url is set, this version number will be used when looking up frontend assets. This defaults to the backend’s version number.
static.mirador_url
¶
URL to a Mirador instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Mirador may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.
static.root_path
¶
Absolute path for collected static files during deployment.
See the Django documentation on STATIC_ROOT.
static.universal_viewer_url
¶
URL to a Universal Viewer instance. An absolute URL to an IIIF manifest will be directly appended by the frontend to this URL; you may want to add a query string argument such as https://universalviewer.io/uv.html?manifest=. Note that Universal Viewer may need to be modified to use XMLHttpRequest.withCredentials, as IIIF manifests on private corpora will require authentication.
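As a sketch, the static section below reuses the CDN URL and version from the configuration sample and adds the Universal Viewer URL quoted above. The frontend version should be replaced by the one matching your backend:

```yaml
static:
  cdn_assets_url: https://assets.teklia.com/arkindex
  frontend_version: 1.6.0
  # Manifest URLs are appended directly, hence the trailing query parameter
  universal_viewer_url: https://universalviewer.io/uv.html?manifest=
```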
worker_activity_timeout
¶
Since Arkindex 1.3.4. Timeout for worker activities, in seconds. This timeout is the time without update after which an existing worker activity can be set to started again, allowing another worker to try to process it again. Defaults to 3600.
workers_max_chunks
¶
Maximum number of chunks in a worker process, expressed as a positive integer. This setting allows administrators to control the number of parallel tasks in a single process, to avoid creating very large processes on smaller instances.
Defaults to 10.