Setup with Docker

    This documentation explains how to set up Arkindex Community Edition through Docker, on a single server.

    If you need more information about a cluster or cloud setup, or about the Enterprise Edition, please contact us.

    Requirements

    • A bare metal server running Linux Ubuntu LTS (22.04) for the platform
      • If you plan to run Machine Learning processes, you'll need another server with a GPU
    • Docker installed on that server
    • A domain name for that server

    Third-party services

    You'll need to set up multiple companion services that support Arkindex. All these services are open source and freely available.

    Required services:

    • a load balancer, Traefik, that controls traffic from your users to the different services
    • a message broker for asynchronous tasks: redis
    • a relational database for all data stored in Arkindex: postgres
      • the postgis extension is also required

    Optional services:

    • an S3-compatible remote storage server, MinIO
      • You can use AWS S3 or any other API-compatible provider instead
    • a IIIF server for your images, cantaloupe
    • a search engine to look up your transcriptions: Apache Solr

    You can find a detailed docker-compose file in the Arkindex backend repository.

    Of course, you'll need to tweak the file so that it matches your own settings and domain name:

    • replace all ark.localhost references with your own domain name
    • remove the minio references if you are using Amazon S3 buckets (or any other compatible solution)
    • remove solr references if you do not need search capabilities
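
    The domain replacement described above can be scripted. The following is a minimal sketch, not an official tool: the replace_domain helper and the arkindex.example.org domain are hypothetical names chosen for illustration. It keeps a .bak copy of the file so you can review the changes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arkindex): rewrite every
# ark.localhost reference in a file with your own domain name.
# Usage: replace_domain FILE NEW_DOMAIN
replace_domain() {
  file="$1"
  domain="$2"
  # The dot is escaped so that only the literal "ark.localhost" matches.
  # -i.bak edits the file in place and keeps the original as FILE.bak.
  sed -i.bak "s/ark\.localhost/${domain}/g" "$file"
}

# Example (arkindex.example.org is a placeholder domain):
#   replace_domain docker-compose.services.yml arkindex.example.org
```

    Subdomains such as minio.ark.localhost are rewritten as well, since the pattern matches anywhere in a line. Removing the minio or solr services still has to be done by hand.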

    Arkindex platform

    You'll need several docker images from Teklia to run the Arkindex platform:

    • the backend image, tagged registry.gitlab.teklia.com/arkindex/backend:X.Y.Z, must be present on your application server,
    • the tasks image, registry.gitlab.teklia.com/arkindex/tasks:X.Y.Z, is used by the remote workers (file imports, thumbnail generation, ...).

    We do not mention explicit versions in this documentation: you should replace each X.Y.Z placeholder with the latest available Arkindex release.
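
    As an illustration, the two image references can be built from a single version number. This is only a sketch: the arkindex_images helper is a hypothetical name, 1.2.3 is a placeholder rather than a real release, and the pull command in the comment assumes your server is already able to pull from that registry.

```shell
#!/bin/sh
# Hypothetical helper: print the backend and tasks image references
# for a given Arkindex release, one per line.
# Usage: arkindex_images VERSION
arkindex_images() {
  version="$1"
  for name in backend tasks; do
    printf 'registry.gitlab.teklia.com/arkindex/%s:%s\n' "$name" "$version"
  done
}

# Example (1.2.3 is a placeholder, not a real release number):
#   arkindex_images 1.2.3 | xargs -n 1 docker pull
```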

    Arkindex Platform and a single Worker

    The backend image mentioned above runs in two containers on your application server:

    1. one for the API, which is the heart of Arkindex,
    2. one for the local asynchronous tasks that can directly reach the database (SQLite export, element deletion, ...).

    Running through Docker Compose

    Here is a sample docker-compose.yml that you can use to run the Arkindex platform on top of the previously mentioned services:

    ---
    include:
      - docker-compose.services.yml
    
    services:
    
      backend:
        container_name: ark-backend
        image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
    
        depends_on:
          - db
          - redis
          - lb
    
        labels:
          traefik.enable: true
          traefik.http.routers.backend.rule: Host(`ark.localhost`) && (PathPrefix(`/api/`) || PathPrefix(`/api-docs/`) || PathPrefix(`/admin/`) || PathPrefix(`/rq/`) || PathPrefix(`/static/`))
          traefik.http.routers.backend.tls: true
    
        environment:
          CONFIG_PATH: /arkindex.yml
    
        volumes:
          - ./config.yml:/arkindex.yml:ro
    
        healthcheck:
          # start_interval is not fully implemented in Docker yet; until then, we use a short interval all the time
          # https://github.com/moby/moby/issues/45897
          interval: 5s
    
      worker:
        container_name: ark-worker
        image: registry.gitlab.teklia.com/arkindex/backend:X.Y.Z
        command: arkindex rqworker-pool --num-workers 2 -v 1 default high tasks
    
        depends_on:
          - db
          - redis
          - backend
        environment:
          CONFIG_PATH: /arkindex.yml
    
        volumes:
          - ./config.yml:/arkindex.yml:ro
    
          # Required to host temporary ponos data
          # and share common paths between host and containers
          - /tmp:/tmp
    
          # Required to run process tasks
          - /var/run/docker.sock:/var/run/docker.sock
    

    To make it work, you'll need:

    • the docker-compose.services.yml described in the previous section
    • a configuration file config.yml described in the next section

    You can then start all the services with docker-compose up.
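
    Before starting the stack, it can help to confirm that the files listed above are in place. The sketch below assumes all three files sit in the same directory and use the file names from this page. The check_arkindex_files helper is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report which of the files named in
# this section are missing from a directory (default: current one).
# Returns 0 when all files are present, 1 otherwise.
check_arkindex_files() {
  dir="${1:-.}"
  missing=0
  for f in docker-compose.yml docker-compose.services.yml config.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example: only start the services when nothing is missing
# (-d runs the containers in the background):
#   check_arkindex_files . && docker-compose up -d
```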

    First run

    On your first run you'll need to set up the database:

    docker exec ark-backend arkindex migrate
    

    And create your own administrator account:

    docker exec -it ark-backend arkindex createsuperuser
    

    You should then be able to log in to the administration interface, at the /admin page of your instance URL.

    Finally, you can bootstrap a few needed objects in the database:

    • Image server for local upload
    • Worker version for local upload
    • Default ponos farm

    If you do not plan on uploading documents through the frontend interface, this is optional.

    docker exec ark-backend arkindex bootstrap
    

    Configuration

    All the configuration options for the backend are detailed on this page.

    A minimal configuration file is available here, with the following assumptions:

    • domain name is ark.localhost
    • arkindex release is X.Y.Z
    • third-party services are available through a docker network (so you can use names like ark-database instead of IPs)
    • specific values (image server IDs, worker versions, ...) are provided by running arkindex bootstrap

    ---
    # This file must be exposed to the backend and worker container
    # using a read-only Docker volume
    # You can set its path in the container by using the environment variable CONFIG_PATH
    
    # Connection to the postgresql database
    # Here we use a postgresql container on the same network
    database:
      host: ark-database
      port: 5432
      name: arkindex_public
      user: public_user
      password: public_data
    
    
    # Connection to the redis server to share asynchronous local jobs
    redis:
      host: ark-redis
    
    # Connection to an S3-compatible storage API
    # Here we use a minio container on the same network
    s3:
      access_key_id: minio1234
      secret_access_key: minio1234
      endpoint: https://minio.ark.localhost
      region: local
    
    # Connection to the search engine
    # This is only needed if the search feature is enabled
    solr:
      api_url: http://ark-solr:8983/solr/
    
    # Cache system to use for performance
    # In production we recommend using redis
    cache:
      type: memory
    
    # Control the optional features on your instance
    features:
      signup: yes
      search: yes
    
    # Use remote frontend files, hosted by Teklia
    # You need to synchronize the version mentioned here
    # with the one from your backend
    static:
      frontend_version: X.Y.Z
      cdn_assets_url: https://assets.teklia.com/arkindex
    
    # Configure the remote workers credentials
    # to allow them to communicate with this Arkindex instance
    ponos:
      private_key: /etc/ponos.key
      default_env:
        ARKINDEX_API_URL: https://ark.localhost/api/v1/
    
        # Do not change this setting if you use the bootstrap script
        ARKINDEX_API_TOKEN: deadbeefTestToken
    
    # Root URL of the Arkindex instance
    # Used to build external links (in emails)
    public_hostname: https://ark.localhost
    
    # Configure the django settings for session & CSRF cookies
    # along with CORS allowed hosts
    # These should match your public hostname
    session:
      cookie_domain: ark.localhost
    csrf:
      cookie_domain: ark.localhost
      trusted_origins:
        - 'https://*.ark.localhost'
    cors:
      origin_whitelist:
        - https://ark.localhost
    
    # HTTP hosts allowed to reach the server
    # This should match your public hostname
    # Note the leading .
    allowed_hosts:
      - .ark.localhost
    
    
    # IIIF Image Server used to expose the locally uploaded images
    # Do not change this setting if you use the bootstrap script
    local_imageserver_id: 12345
    
    # Worker version used by the file imports tasks
    # Do not change this setting if you use the bootstrap script
    imports_worker_version: f2bb8dd7-55e9-49ae-9bd9-b1d2e5d491b9
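
    Since the sample above relies on the ark.localhost domain and the X.Y.Z version placeholder, a quick search can show which lines still need editing before deployment. This is a minimal sketch and the find_placeholders helper is a hypothetical name. It only looks for these two sample values and does not validate the configuration in any other way.

```shell
#!/bin/sh
# Hypothetical helper: list the lines of a file (with line numbers)
# that still contain the sample values used in this documentation.
# Exit status is 0 when at least one such line is found, 1 when none remain.
# Usage: find_placeholders FILE
find_placeholders() {
  grep -n -e 'X\.Y\.Z' -e 'ark\.localhost' "$1"
}

# Example:
#   find_placeholders config.yml || echo "no sample values left"
```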
    

    Frontend static assets

    We recommend using our CDN for the frontend files: simply set assets.teklia.com as the source for static files in the backend configuration.

    Here is the relevant part of the configuration for your backend:

    # Use remote frontend files, hosted by Teklia
    # You need to synchronize the version mentioned here
    # with the one from your backend
    static:
      frontend_version: X.Y.Z
      cdn_assets_url: https://assets.teklia.com/arkindex
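
    Because the frontend version has to stay synchronized with the backend release, a small comparison can catch a mismatch after an upgrade. The sketch below is an assumption-laden illustration: the three helper names are hypothetical, and the text matching relies on the layout of the sample files on this page rather than on a real YAML parser.

```shell
#!/bin/sh
# Hypothetical helpers, relying on the file layout shown on this page.

# Print the tag of the first arkindex/backend image in a compose file.
backend_tag() {
  sed -n 's|.*registry\.gitlab\.teklia\.com/arkindex/backend:\([^[:space:]]*\).*|\1|p' "$1" | head -n 1
}

# Print the frontend_version value from a configuration file.
frontend_version() {
  sed -n 's/^[[:space:]]*frontend_version:[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only when both versions are identical.
# Usage: versions_match COMPOSE_FILE CONFIG_FILE
versions_match() {
  [ "$(backend_tag "$1")" = "$(frontend_version "$2")" ]
}

# Example:
#   versions_match docker-compose.yml config.yml || echo "version mismatch"
```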