If you want to use Arkindex on your own documents but cannot publish them on our instances (because of privacy or regulatory concerns), you can deploy the full Arkindex platform on your own infrastructure.
In the following sections, we describe the requirements for running an efficient and scalable Arkindex infrastructure using Docker containers on your own hardware. This setup can process millions of documents with multiple Machine Learning processes.
The main part of the architecture uses a set of open-source software along with our own proprietary software.
The open-source components are:
You'll also need to run a set of workers on dedicated servers: this is where the Machine Learning processes will run.
Each worker in the diagram represents a dedicated server, running our in-house job scheduling agents and dedicated Machine Learning tasks.
We recommend using Docker Swarm to aggregate several web servers along with at least one server for databases.
Run at least two web nodes in production for efficient results.
These servers can be virtual machines (VPS) or dedicated servers on bare metal, with recommended specifications:
Should host these services:
This server must be a dedicated server on bare metal, using SSD for database storage, with recommended specifications:
Should host these services:
Each worker can be an independent server, and does not need to be connected directly to the platform (it only needs to communicate with the platform through its REST API; no database access is needed).
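To illustrate what "REST API only" means for a worker, here is a minimal sketch of how a worker-side script might prepare an authenticated HTTP request using only the Python standard library. The base URL, the `jobs/next/` path, the token value and the `Token` authorization scheme are all placeholders chosen for illustration, not documented Arkindex endpoints; the request is built but never sent.

```python
import urllib.request


def build_api_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for a REST API.

    The worker only needs outbound HTTPS access to the platform;
    no database connection is involved.
    """
    # Join the base URL and the path with exactly one slash between them.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            # Assumed header scheme, for illustration only.
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical values: replace with your instance URL, a real endpoint and a real token.
request = build_api_request(
    "https://arkindex.example.com/api/", "jobs/next/", "secret-token"
)
print(request.get_method(), request.full_url)
```

In practice this means a worker can sit on a separate network from the platform, as long as it can reach the platform's HTTPS endpoint.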
The requirements for each server depend on the type of your processes and datasets. We recommend using bare-metal servers with at least 8 cores at 2 GHz and 16 GB of RAM. You may also need GPUs for specific use cases. When you contact us, please describe your datasets and include samples so that we can reply with specific requirements.
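As a quick sanity check before provisioning a worker, the recommended baseline of 8 cores and 16 GB of RAM can be expressed as a small script. This is only a sketch: the function names are hypothetical, the resource detection relies on `os.sysconf` and therefore assumes a Linux or similar POSIX host, and clock speed and GPUs are not checked.

```python
import os
from typing import Tuple

# Recommended baseline for a worker, taken from the text above.
MIN_CORES = 8
MIN_RAM_GB = 16


def meets_worker_minimum(cores: int, ram_gb: float) -> bool:
    """Return True when both the core count and the RAM reach the baseline."""
    return cores >= MIN_CORES and ram_gb >= MIN_RAM_GB


def detect_local_resources() -> Tuple[int, float]:
    """Best-effort detection of logical CPU count and physical RAM in GiB.

    os.cpu_count() reports logical CPUs, which may exceed physical cores.
    The RAM figure is usually slightly below the installed amount because
    the kernel reserves some memory, so treat the result as approximate.
    """
    cores = os.cpu_count() or 0
    try:
        ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # sysconf or these names are unavailable on this platform.
        ram_bytes = 0
    return cores, ram_bytes / 1024**3


if __name__ == "__main__":
    cores, ram_gb = detect_local_resources()
    verdict = "meets" if meets_worker_minimum(cores, ram_gb) else "is below"
    print(f"{cores} logical CPUs, {ram_gb:.1f} GiB RAM: {verdict} the baseline")
```

A host that passes this check may still be undersized for a given dataset, since actual requirements depend on the processes you run.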
Please contact us if you are interested in this solution for your company or institution.
We can also provide a private instance that we manage on our servers (hosted in Europe or North America).
More information on running Arkindex using docker-compose