Setup with Docker

This documentation explains how to set up Arkindex Community Edition through Docker, on a single server.

If you need more information about a cluster or cloud setup, or about using Enterprise Edition, please contact us.

Requirements

  • A bare metal server running Linux Ubuntu LTS (22.04) for the platform
    • If you plan to run Machine Learning processes, you’ll need another server with a GPU
  • Docker installed on that server
  • A domain name for that server

Third-party services

You’ll need to set up multiple companion services that support Arkindex. All these services are open source and freely available.

Required services:

  • a load balancer, Traefik, that controls traffic from your users to the different services
  • a message broker for asynchronous tasks: Redis
  • a relational database for all data stored in Arkindex: PostgreSQL
    • these PostgreSQL extensions are required: PostGIS, hstore, btree_gist.

Optional services:

  • an S3-compatible remote storage server, MinIO
    • You can use AWS S3 or any other API-compatible provider instead
  • a IIIF server for your images, Cantaloupe
  • a search engine to look up your transcriptions: Apache Solr

You can find a detailed docker-compose file in the Arkindex backend repository.

Of course, you’ll need to tweak the file so that it matches your own settings and domain name:

  • replace all ark.localhost references with your own domain name
  • remove the minio references if you are using Amazon S3 buckets (or any other compatible solution)
  • remove solr references if you do not need search capabilities
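The domain substitution can be scripted. The following is a minimal sketch, assuming the file still contains the literal string ark.localhost; replace_domain is a hypothetical helper name and arkindex.example.org is a placeholder domain, neither of which comes from Arkindex.

```shell
# replace_domain FILE DOMAIN
# Rewrites every literal "ark.localhost" in FILE to DOMAIN, in place.
# Hypothetical helper for illustration; it is not part of Arkindex.
replace_domain() {
  file="$1"
  domain="$2"
  tmp="$(mktemp)" || return 1
  # The dot is escaped so that only the literal string ark.localhost matches.
  # Subdomains such as minio.ark.localhost become minio.DOMAIN.
  if sed "s/ark\.localhost/${domain}/g" "$file" > "$tmp"; then
    # Copy back instead of moving, to keep the original file permissions.
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}
```

For example, running replace_domain docker-compose.services.yml arkindex.example.org would also turn minio.ark.localhost into minio.arkindex.example.org. Reviewing the result with a diff before starting the stack is a sensible precaution.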

Arkindex platform

You’ll need two Docker images from Teklia to run the Arkindex platform:

  • the backend image, tagged registry.gitlab.teklia.com/arkindex/backend:X.Y.Z, must be present on your application server,
  • the tasks image, registry.gitlab.teklia.com/arkindex/tasks:X.Y.Z, will be used by the remote workers (file imports, thumbnails generation, …).

We do not mention explicit versions in this documentation: you should update the X.Y.Z mentions to the latest Arkindex release available.
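Before starting anything, it can help to confirm that no X.Y.Z placeholder is left in your files. The sketch below is one possible check; find_placeholder_versions is a hypothetical helper name, not an Arkindex command.

```shell
# find_placeholder_versions FILE...
# Prints every line that still contains the X.Y.Z placeholder.
# Returns 0 if at least one placeholder was found, 1 if none (grep's exit status).
# Hypothetical helper for illustration; it is not part of Arkindex.
find_placeholder_versions() {
  # /dev/null is appended so that grep always prefixes matches with the file name,
  # even when a single file is given.
  grep -n 'X\.Y\.Z' "$@" /dev/null
}
```

Called as find_placeholder_versions docker-compose.yml config.yml, it lists each remaining placeholder as file:line:content, so an empty output means every version has been pinned.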

Arkindex Platform and a single Worker

The backend image mentioned above will run in two containers on your application server:

  1. one for the API, which is the heart of Arkindex,
  2. one for the local asynchronous tasks that can directly reach the database (SQLite export, element deletion, …).

Running through Docker Compose

Here is a sample docker-compose.yml that you can use to run the Arkindex platform on top of the previously mentioned services:

---
include:
  - docker-compose.services.yml

services:

  backend:
    container_name: ark-backend
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z

    depends_on:
      - db
      - redis
      - lb

    labels:
      traefik.enable: true
      traefik.http.routers.backend.rule: Host(`ark.localhost`) && (PathPrefix(`/api/`) || PathPrefix(`/api-docs/`) || PathPrefix(`/admin/`) || PathPrefix(`/rq/`) || PathPrefix(`/static/`))
      traefik.http.routers.backend.tls: true

    environment:
      CONFIG_PATH: /arkindex.yml

    volumes:
      - ./config.yml:/arkindex.yml:ro

    healthcheck:
      # start_interval is not fully implemented in Docker, until then we will use a short interval all the time
      # https://github.com/moby/moby/issues/45897
      interval: 5s

  worker:
    container_name: ark-worker
    image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    command: arkindex rqworker-pool --num-workers 2 -v 1 default high tasks export

    depends_on:
      - db
      - redis
      - backend

    environment:
      CONFIG_PATH: /arkindex.yml

    volumes:
      - ./config.yml:/arkindex.yml:ro

      # Required to host temporary ponos data
      # and share common paths between host and containers
      - /tmp:/tmp

      # Required to run process tasks
      - /var/run/docker.sock:/var/run/docker.sock

To make it work, you’ll need:

  • the docker-compose.services.yml described in the previous section
  • a configuration file config.yml described in the next section
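As a quick sanity check, you can verify that these files are in place before starting the stack. This is a minimal sketch, assuming all three files sit in the same directory (which matches the relative paths used in the sample above); check_compose_files is a hypothetical helper name.

```shell
# check_compose_files [DIR]
# Returns 0 if DIR (default: current directory) contains the three files
# the sample setup relies on, and reports each missing file on stderr.
# Hypothetical helper for illustration; it is not part of Arkindex.
check_compose_files() {
  dir="${1:-.}"
  missing=0
  for f in docker-compose.yml docker-compose.services.yml config.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The function only checks that the files exist; it does not validate their content.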

You can then start all the services by running docker-compose up.

First run

On your first run you’ll need to set up the database:

docker exec ark-backend arkindex migrate

You may encounter issues about PostgreSQL extensions if you created a non-superuser PostgreSQL role for Arkindex (which is a good security measure). In this case, connect once as a PostgreSQL superuser (often postgres) and manually create these three extensions:

create extension postgis;
create extension hstore;
create extension btree_gist;
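If you prefer to script this step, the sketch below generates the same three statements with PostgreSQL's IF NOT EXISTS clause, so that running them a second time is harmless. extension_sql is a hypothetical helper name; how you feed the output to PostgreSQL depends on your own deployment.

```shell
# extension_sql
# Prints one idempotent CREATE EXTENSION statement per required extension.
# Hypothetical helper for illustration; it is not part of Arkindex.
extension_sql() {
  for ext in postgis hstore btree_gist; do
    printf 'CREATE EXTENSION IF NOT EXISTS %s;\n' "$ext"
  done
}
```

Assuming a PostgreSQL container named ark-database, a superuser named postgres and a database named arkindex_public (all assumptions to adapt to your setup), the output could be piped with extension_sql | docker exec -i ark-database psql -U postgres -d arkindex_public.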

Once migrations are fully done, you can create your own administrator account:

docker exec -it ark-backend arkindex createsuperuser

You should then be able to log in to the administrative interface at the /admin page of your instance URL.

Finally, you can bootstrap a few needed objects in the database:

  • Image server for local upload
  • Worker versions for imports and inference processes
  • Default Ponos farm

This step is optional if you do not plan on uploading documents through the frontend interface.

docker exec ark-backend arkindex bootstrap
docker exec ark-backend arkindex update_system_workers

File storage

With the default configuration we provide, your local files will be stored in a local MinIO instance available at https://minio.ark.localhost.

You’ll be able to log in to the management web interface using the default (non-secure) login/password: minio1234/minio1234.
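Since these default credentials are not secure, you may want to check whether your backend configuration still relies on them. The following is a minimal sketch, assuming the s3 keys are written as in the sample config.yml from the Configuration section; uses_default_minio_credentials is a hypothetical helper name.

```shell
# uses_default_minio_credentials CONFIG
# Prints the lines of CONFIG where an S3 key is still set to minio1234.
# Returns 0 if at least one such line exists, 1 otherwise (grep's exit status).
# Hypothetical helper for illustration; it is not part of Arkindex.
uses_default_minio_credentials() {
  grep -n -E '^ *(access_key_id|secret_access_key): *minio1234 *$' "$1"
}
```

This is a plain text match rather than a YAML parser, so quoted or otherwise reformatted values would not be detected.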

All the buckets required by Arkindex are created automatically, provided you do not change their names in the configuration below. Otherwise, you can easily create new buckets through the MinIO web interface.

Configuration

All the configuration options for the backend are detailed on this page.

A minimal configuration file is available here, with the following assumptions:

  • domain name is ark.localhost
  • Arkindex release is X.Y.Z
  • third-party services are available through a docker network (so you can use names like ark-database instead of IPs)
  • specific values (image server IDs, worker versions, …) are provided by running arkindex bootstrap

---
# This file must be exposed to the backend and worker containers
# using a read-only Docker volume
# You can set its path in the container by using the environment variable CONFIG_PATH

# Connection to the postgresql database
# Here we use a postgresql container on the same network
database:
  host: ark-database
  port: 5432
  name: arkindex_public
  user: public_user
  password: public_data


# Connection to the redis server to share asynchronous local jobs
redis:
  host: ark-redis

# Connection to an S3-compatible storage API
# Here we use a minio container on the same network
s3:
  access_key_id: minio1234
  secret_access_key: minio1234
  endpoint: https://minio.ark.localhost
  region: local

# Connection to the search engine
# This is only needed if the search feature is enabled
solr:
  api_url: http://ark-solr:8983/solr/

# Cache system to use for performance
# In production we recommend using redis
cache:
  type: memory

# Control the optional features on your instance
features:
  signup: yes
  search: yes

# Use remote frontend files, hosted by Teklia
# You need to synchronize the version mentioned here
# with the one from your backend
static:
  frontend_version: X.Y.Z
  cdn_assets_url: https://assets.teklia.com/arkindex

# Configure the remote workers credentials
# to allow them to communicate with this Arkindex instance
ponos:
  private_key: /etc/ponos.key
  default_env:
    ARKINDEX_API_URL: https://ark.localhost/api/v1/

    # Do not change this setting if you use the bootstrap script
    ARKINDEX_API_TOKEN: deadbeefTestToken

# Root URL of the Arkindex instance
# Used to build external links (in emails)
public_hostname: https://ark.localhost

# Configure the django settings for session & CSRF cookies
# along with CORS allowed hosts
# These should match your public hostname
session:
  cookie_domain: ark.localhost
csrf:
  cookie_domain: ark.localhost
  trusted_origins:
    - 'https://*.ark.localhost'
cors:
  origin_whitelist:
    - https://ark.localhost

# HTTP hosts allowed to reach the server
# This should match your public hostname
# Note the leading .
allowed_hosts:
  - .ark.localhost


# IIIF Image Server used to expose the locally uploaded images
# Do not change this setting if you use the bootstrap script
local_imageserver_id: 12345

Frontend static assets

We recommend using our CDN for the frontend files: set assets.teklia.com as the source for static files in the backend configuration.

Here is the relevant part of the configuration for your backend:

# Use remote frontend files, hosted by Teklia
# You need to synchronize the version mentioned here
# with the one from your backend
static:
  frontend_version: X.Y.Z
  cdn_assets_url: https://assets.teklia.com/arkindex
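Because the frontend version has to stay in sync with the backend, a small check can catch a mismatch after an upgrade. The sketch below is one way to do it, assuming the file layouts shown in this documentation (an image: line per service and a single frontend_version: key); backend_tag, frontend_version and versions_match are hypothetical helper names, and the parsing is line-based rather than a real YAML parser.

```shell
# backend_tag COMPOSE_FILE
# Prints the distinct tags of the arkindex/backend image references, one per line.
backend_tag() {
  sed -n 's|^ *image: *registry\.gitlab\.teklia\.com/arkindex/backend:\(.*\)$|\1|p' "$1" | sort -u
}

# frontend_version CONFIG_FILE
# Prints the value of the frontend_version key.
frontend_version() {
  sed -n 's/^ *frontend_version: *\(.*\)$/\1/p' "$1"
}

# versions_match COMPOSE_FILE CONFIG_FILE
# Returns 0 only if a backend tag was found and it equals the frontend version.
# If the backend and worker services use different tags, the comparison fails.
versions_match() {
  tag="$(backend_tag "$1")"
  fv="$(frontend_version "$2")"
  [ -n "$tag" ] && [ "$tag" = "$fv" ]
}
```

A typical call would be versions_match docker-compose.yml config.yml || echo "frontend and backend versions differ". Quoted values or trailing comments on those lines would make the comparison fail, which errs on the side of reporting a mismatch.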