Setup with Docker
This page explains how to set up Arkindex Community Edition with Docker on a single server.
If you need more information about a cluster or cloud setup, or about Enterprise Edition, please contact us.
Requirements¶
- A bare metal server running Linux Ubuntu LTS (22.04) for the platform
- If you plan to run Machine Learning processes, you’ll need another server with a GPU
- Docker installed on that server
- A domain name for that server
Third-party services¶
You’ll need to set up multiple companion services that support Arkindex. All these services are open source and freely available.
Required services:
- a load balancer that controls traffic from your users to the different services: Traefik
- a message broker for asynchronous jobs: Redis
- a relational database for all data stored in Arkindex: PostgreSQL
  - these PostgreSQL extensions are required: PostGIS, hstore, btree_gist.
Optional services:
- an S3-compatible remote storage server: MinIO
  - You can use AWS S3 or any other API-compatible provider instead
- an IIIF server for your images: Cantaloupe
- a search engine to look up your transcriptions: Apache Solr
You can find a detailed docker-compose file in the Arkindex backend repository.
Of course, you’ll need to tweak the file so that it matches your own settings and domain name:
- replace all ark.localhost references with your own domain name
- remove the minio references if you are using Amazon S3 buckets (or any other compatible solution)
- remove solr references if you do not need search capabilities
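The domain substitution can be scripted. The following is a minimal sketch, assuming a sed that supports in-place editing with a backup suffix (as the GNU sed shipped with Ubuntu does). The function name replace_domain and the domain arkindex.example.com are illustrative and are not part of Arkindex.

```shell
#!/bin/sh
# Hypothetical helper: replace every ark.localhost reference in a file
# with your own domain name. The original file is kept as <file>.bak.
replace_domain() {
    file="$1"
    domain="$2"
    # The dot is escaped so that it only matches a literal dot
    sed -i.bak "s/ark\.localhost/${domain}/g" "$file"
}

# Example usage (arkindex.example.com is a placeholder domain):
# replace_domain docker-compose.services.yml arkindex.example.com
# replace_domain docker-compose.yml arkindex.example.com
```

Review the result before starting the services: sub-domains such as minio.ark.localhost are rewritten too, so they must also exist under your own domain.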
Arkindex platform¶
You’ll need several Docker images from Teklia to run the Arkindex platform:
- the backend image, tagged registry.gitlab.teklia.com/arkindex/backend:X.Y.Z, must be present on your application server,
- the tasks image, registry.gitlab.teklia.com/arkindex/tasks:X.Y.Z, will be used by the remote workers (file imports, thumbnail generation, …).
We do not mention explicit versions in this documentation: you should replace every X.Y.Z placeholder with the latest available Arkindex release.
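One way to keep the version in a single place is a shell variable. This is only a sketch: the variable ARKINDEX_VERSION and the function image_ref are hypothetical names, and X.Y.Z remains a placeholder that you must replace yourself.

```shell
#!/bin/sh
# Placeholder: replace X.Y.Z with the latest Arkindex release
ARKINDEX_VERSION="X.Y.Z"

# Build the full reference of an Arkindex image (backend or tasks)
image_ref() {
    printf 'registry.gitlab.teklia.com/arkindex/%s:%s\n' "$1" "$ARKINDEX_VERSION"
}

# Example usage on the application server:
# docker pull "$(image_ref backend)"
```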
The backend image mentioned above will run in two containers on your application server:
- one for the API, which is the heart of Arkindex,
- one for the local asynchronous tasks that can directly reach the database (SQLite export, element deletion, …)
Running through Docker Compose¶
Here is a sample docker-compose.yml that you can use to run the Arkindex platform on top of the previously mentioned services:
---
include:
  - docker-compose.services.yml
services:
  backend:
    container_name: ark-backend
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    depends_on:
      - db
      - redis
      - lb
    labels:
      traefik.enable: true
      traefik.http.routers.backend.rule: Host(`ark.localhost`) && (PathPrefix(`/api/`) || PathPrefix(`/api-docs/`) || PathPrefix(`/admin/`) || PathPrefix(`/rq/`) || PathPrefix(`/static/`))
      traefik.http.routers.backend.tls: true
    environment:
      CONFIG_PATH: /arkindex.yml
    volumes:
      - ./config.yml:/arkindex.yml:ro
    healthcheck:
      # start_interval is not fully implemented in Docker, until then we will use a short interval all the time
      # https://github.com/moby/moby/issues/45897
      interval: 5s
  worker:
    container_name: ark-worker
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    command: arkindex rqworker-pool --num-workers 2 -v 1 default high tasks export
    depends_on:
      - db
      - redis
      - backend
    environment:
      CONFIG_PATH: /arkindex.yml
    volumes:
      - ./config.yml:/arkindex.yml:ro
      # Required to host temporary ponos data
      # and share common paths between host and containers
      - /tmp:/tmp
      # Required to run process tasks
      - /var/run/docker.sock:/var/run/docker.sock
To make it work, you’ll need:
- the docker-compose.services.yml described in the previous section
- a configuration file config.yml described in the next section
You can then start all the services using docker-compose up.
First run¶
On your first run you’ll need to set up the database:
docker exec ark-backend arkindex migrate
You may encounter errors about PostgreSQL extensions if you created a non-superuser PostgreSQL account for Arkindex (which is a good security measure).
In this case, you’ll need to connect once as a PostgreSQL superuser (often postgres) and manually create these three extensions:
create extension postgis;
create extension hstore;
create extension btree_gist;
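If PostgreSQL runs in a container, the statements above can be piped into psql from the host. The sketch below makes several assumptions that you should check against your own setup: the container is named ark-database, the superuser is postgres, and the database is arkindex_public (the name used in the sample configuration). The function name extensions_sql is illustrative. PostgreSQL installs extensions per database, so the statements must run against the Arkindex database itself.

```shell
#!/bin/sh
# Print the SQL that creates the three required extensions.
# "if not exists" makes the statements safe to run more than once.
extensions_sql() {
    for ext in postgis hstore btree_gist; do
        printf 'create extension if not exists %s;\n' "$ext"
    done
}

# Example usage (container, user and database names are assumptions):
# extensions_sql | docker exec -i ark-database psql -U postgres -d arkindex_public
```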
Once migrations are fully done, you can create your own administrator account:
docker exec -it ark-backend arkindex createsuperuser
You should then be able to log in to the administrative interface, on the /admin page of your instance URL.
Finally, you can bootstrap a few needed objects in the database:
- Image server for local upload
- Worker versions for imports and inference processes
- Default Ponos farm
If you do not plan on uploading documents through the frontend interface, this is optional.
docker exec ark-backend arkindex bootstrap
docker exec ark-backend arkindex update_system_workers
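Since every first-run step goes through docker exec on the same container, a small wrapper can shorten the commands. This is a convenience sketch, not something Arkindex provides: the function name ark and the variable ARK_CONTAINER are hypothetical, and the default container name comes from the sample docker-compose.yml above.

```shell
#!/bin/sh
# Hypothetical wrapper: run an arkindex management command
# inside the backend container (non-interactive commands only).
ark() {
    docker exec "${ARK_CONTAINER:-ark-backend}" arkindex "$@"
}

# Example usage:
# ark migrate
# ark bootstrap
# ark update_system_workers
```

The createsuperuser command prompts for input, so keep running it with docker exec -it as shown earlier.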
File storage¶
With the default configuration we provide, your local files will be stored in a local MinIO instance available at https://minio.ark.localhost.
You’ll be able to log in to the management web interface using the default (non-secure) login/password: minio1234/minio1234.
All the Arkindex required buckets are automatically created, provided you do not change their name in the configuration below. Otherwise you can create new buckets easily through the MinIO web interface.
Configuration¶
All the configuration options for the backend are detailed on this page.
A minimal configuration file is available here, with the following assumptions:
- domain name is ark.localhost
- Arkindex release is X.Y.Z
- third-party services are available through a Docker network (so you can use names like ark-database instead of IPs)
- specific values (image server IDs, worker versions, …) are provided by running arkindex bootstrap
---
# This file must be exposed to the backend and worker container
# using a read-only Docker volume
# You can set its path in the container by using the environment variable CONFIG_PATH
# Connection to the postgresql database
# Here we use a postgresql container on the same network
database:
  host: ark-database
  port: 5432
  name: arkindex_public
  user: public_user
  password: public_data
# Connection to the redis server to share asynchronous local jobs
redis:
  host: ark-redis
# Connection to an S3-compatible storage API
# Here we use a minio container on the same network
s3:
  access_key_id: minio1234
  secret_access_key: minio1234
  endpoint: https://minio.ark.localhost
  region: local
# Connection to the search engine
# This is only needed if the search feature is enabled
solr:
  api_url: http://ark-solr:8983/solr/
# Cache system to use for performance
# In production we recommend to use redis
cache:
  type: memory
# Control the optional features on your instance
features:
  signup: yes
  search: yes
# Use remote frontend files, hosted by Teklia
# You need to synchronize the version mentioned here
# with the one from your backend
static:
  frontend_version: X.Y.Z
  cdn_assets_url: https://assets.teklia.com/arkindex
# Configure the remote workers credentials
# to allow them to communicate with this Arkindex instance
ponos:
  private_key: /etc/ponos.key
  default_env:
    ARKINDEX_API_URL: https://ark.localhost/api/v1/
    # Do not change this setting if you use the bootstrap script
    ARKINDEX_API_TOKEN: deadbeefTestToken
# Root URL of the Arkindex instance
# Used to build external links (in emails)
public_hostname: https://ark.localhost
# Configure the django settings for session & CSRF cookies
# along with CORS allowed hosts
# These should match your public hostname
session:
  cookie_domain: ark.localhost
csrf:
  cookie_domain: ark.localhost
  trusted_origins:
    - 'https://*.ark.localhost'
cors:
  origin_whitelist:
    - https://ark.localhost
# HTTP hosts allowed to reach the server
# This should match your public hostname
# Note the leading .
allowed_hosts:
  - .ark.localhost
# IIIF Image Server used to expose the locally uploaded images
# Do not change this setting if you use the bootstrap script
local_imageserver_id: 12345
Frontend static assets¶
We recommend using our CDN for the frontend files. Simply set assets.teklia.com as the source for static files in the backend configuration.
Here is part of the relevant configuration for your backend:
# Use remote frontend files, hosted by Teklia
# You need to synchronize the version mentioned here
# with the one from your backend
static:
  frontend_version: X.Y.Z
  cdn_assets_url: https://assets.teklia.com/arkindex
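Because the frontend version has to stay synchronized with the backend image, a quick check before each upgrade can catch a mismatch. The sketch below is a plain text comparison built on sed, under the assumption that the files follow the layout of the samples on this page; it is not a YAML parser, and the function names are illustrative.

```shell
#!/bin/sh
# Extract the backend image tag from a Compose file (first matching line).
backend_image_version() {
    sed -n 's|^[[:space:]]*image:[[:space:]]*registry\.gitlab\.teklia\.com/arkindex/backend:\(.*\)$|\1|p' "$1" | head -n 1
}

# Extract static.frontend_version from the backend configuration file.
frontend_version() {
    sed -n 's|^[[:space:]]*frontend_version:[[:space:]]*\(.*\)$|\1|p' "$1" | head -n 1
}

# Succeed only when both versions were found and are identical.
versions_match() {
    backend="$(backend_image_version "$1")"
    frontend="$(frontend_version "$2")"
    [ -n "$backend" ] && [ "$backend" = "$frontend" ]
}

# Example usage:
# versions_match docker-compose.yml config.yml || echo "Version mismatch" >&2
```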