Architectures

We’ll use different terms for the components of our product:

  • The Platform server runs the backend code responsible for the REST API.

  • Some asynchronous tasks need direct access to the database: the local worker executes these tasks.

  • In the Enterprise Edition, intensive Machine Learning tasks are executed by Remote workers, using proprietary software called Ponos. One instance of Ponos is called an Agent.

Overview

The core of the architecture combines a set of open-source components with our own software.

    graph TD
        lb[Load balancer] --> frontend[Arkindex frontend]
        frontend --> backend
        lb --> backend[Arkindex backend]
        lb --> cantaloupe[IIIF server]
        lb --> minio[S3-compatible storage]
        cantaloupe --> minio
        backend --> worker[Arkindex internal worker]
        worker --> backend
        backend --> redis[Cache]
        backend --> db[Database]
        backend --> solr[Search engine]
Figure 1. Arkindex platform architecture

The open source components here are:

  • Traefik as load balancer

  • Cantaloupe as IIIF server

  • MinIO as S3-compatible storage server

  • Redis as cache

  • PostgreSQL as database

  • Solr as search engine

Machine Learning

In the Enterprise Edition, you’ll also need to run a set of workers on dedicated servers: this is where the Machine Learning processes will run.

    graph TD
        subgraph arkindex
            lb[Load Balancer] --> backend[Arkindex backend]
            backend --> database[Database]
        end
        subgraph worker 1
            agent_1[Agent] -..-> lb
            agent_1 --> task_1
            agent_1 --> task_2
        end
        subgraph worker 2
            agent_2[Agent] -..-> lb
            agent_2 --> task_3
            agent_2 --> task_4
        end
Figure 2. Arkindex workers for Machine Learning

Each worker in the diagram represents a dedicated server, running our in-house job scheduling agents and dedicated Machine Learning tasks.

Common cases

We only cover the most common cases here. If you have questions about your own architecture, please contact us.

Single Server

This is the simplest option: a standalone server hosting all the services as Docker containers.

A single docker-compose.yml can efficiently deploy the whole stack.

    graph TD
        subgraph server
            lb[Load balancer] --> frontend[Arkindex frontend]
            frontend --> backend
            lb --> backend[Arkindex backend]
            lb --> cantaloupe[IIIF server]
            lb --> minio[S3-compatible storage]
            cantaloupe --> minio
            backend --> worker[Arkindex internal worker]
            worker --> backend
            backend --> redis[Cache]
            backend --> db[Database]
            backend --> solr[Search engine]
        end
Figure 3. Arkindex stack on a single server
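As an illustration, the single-server docker-compose.yml could be laid out as follows. This is a minimal sketch: the image names, versions and commands shown here are placeholders, and all real-world configuration (environment variables, volumes, networks, TLS) is omitted.

```yaml
version: "3.8"

services:
  traefik:                        # load balancer, the only service exposed publicly
    image: traefik:v2.10          # placeholder version
    ports:
      - "80:80"
      - "443:443"

  backend:                        # Arkindex backend (REST API); placeholder image name
    image: arkindex/backend
    depends_on: [db, redis, solr, minio]

  worker:                         # Arkindex internal worker, needs direct database access
    image: arkindex/backend       # assumption: shares the backend image
    command: worker
    depends_on: [db, redis]

  cantaloupe:                     # IIIF server; placeholder image name
    image: cantaloupe

  minio:                          # S3-compatible storage
    image: minio/minio
    command: server /data

  redis:                          # cache
    image: redis:7

  db:                             # database
    image: postgres:15

  solr:                           # search engine
    image: solr:9
```

The Arkindex frontend is typically served as static files behind the load balancer and is left out of this sketch.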

Pros

  • Simple to deploy and maintain

  • Cheap

Cons

  • Limited disk space

  • Limited performance

  • Single point of failure

Cluster

With a larger budget, you can deploy Arkindex across several servers, still using Docker Compose, together with placement constraints on a Docker Swarm cluster.

A Docker Swarm cluster enables you to run Docker services instead of containers, with multiple containers per service so you can benefit from higher throughput and eliminate single points of failure.

    graph TD
        subgraph server_web1[WebServices n°1]
            lb[Load balancer]
            lb --> backend[Arkindex backend]
            lb --> cantaloupe[IIIF server]
        end
        subgraph server_web2[WebServices n°2]
            lb --> backend2[Arkindex backend]
            lb --> cantaloupe2[IIIF server]
        end
        subgraph storage[File storage]
            lb --> minio
        end
        subgraph server_db1[Databases]
            minio[S3-compatible storage]
            redis[Cache]
            db[Database]
            solr[Search engine]
        end
        subgraph server_worker1[Worker n°1]
            direction LR
            worker[Arkindex internal worker]
            worker2[Arkindex internal worker n°2]
        end
        subgraph server_worker2[Worker n°2]
            ponos --> ml_task[Machine Learning Task]
        end
        server_web1 -..-> server_db1
        server_web2 -..-> server_db1
        server_worker1 -..-> server_db1
        server_worker2 -..-> server_web1
        server_worker2 -..-> server_web2
Figure 4. Arkindex stack on a Docker Swarm cluster
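With Docker Swarm, the same Compose file can pin each service to a class of nodes using the standard deploy.placement.constraints keys and node labels. A minimal sketch, where the image names and the role label values are assumptions:

```yaml
services:
  backend:
    image: arkindex/backend       # placeholder image name
    deploy:
      replicas: 2                 # several containers behind the load balancer
      placement:
        constraints:
          - node.labels.role == web

  db:
    image: postgres:15
    deploy:
      placement:
        constraints:
          - node.labels.role == database
```

Labels are applied once per node from a manager, e.g. `docker node update --label-add role=web web1`.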

Pros

  • High performance

  • Service replicas for high availability

  • Network segregation for better security

Cons

  • Limited disk space

  • Harder to maintain and monitor

Cloud provider

You can also deploy Arkindex on a cloud provider (such as Amazon AWS, Google GCP or Microsoft Azure), using their managed services instead of self-hosting the databases and the shared S3-compatible storage.

Most cloud providers offer managed versions of the services required by Arkindex (load balancer, PostgreSQL, S3-compatible storage, search engine and Redis cache). You’ll then need to run the Arkindex containers, either:

  • through their managed Docker container services; or

  • by building your own Docker Swarm cluster on their VPS offering.

    graph TD
        lb[Load balancer] --> docker
        subgraph docker[Containers]
            backend[Arkindex backend]
            cantaloupe[IIIF server]
            worker[Arkindex internal worker]
        end
        subgraph services[Managed service]
            minio[S3-compatible storage]
            redis[Cache]
            db[Database]
            solr[Search engine]
        end
        subgraph gpu[GPU-enabled services]
            ponos --> ml_task[Machine Learning Task]
        end
        docker --> services
        gpu --> docker
Figure 5. Arkindex stack on a cloud provider
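If you go the VPS route, bootstrapping your own Swarm cluster amounts to a few Docker CLI commands. A sketch, where the IP address, join token and stack file name are placeholders for your own values:

```shell
# On the first VPS: create the cluster; this prints a join token
docker swarm init --advertise-addr 10.0.0.1

# On each additional VPS: join the cluster using the token printed above
docker swarm join --token <token> 10.0.0.1:2377

# Back on a manager node: deploy the whole stack from the Compose file
docker stack deploy -c docker-compose.yml arkindex
```

The managed databases, storage and cache are then reached over the network, so only the Arkindex containers and the IIIF server run on the Swarm nodes.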

Pros

  • High performance

  • Low maintenance for non-hosted services

  • Virtually unlimited disk space

Cons

  • Expensive

  • Vendor lock-in