Sustainability

Deployment in the Cloud

This section describes several aspects of how the Bioregistry web application is deployed.

Domain Name

The bioregistry.io domain is registered with Namecheap and costs about $33 per year. It is managed and supported by the INDRA Lab, a part of the Laboratory of Systems Pharmacology and Harvard Program in Therapeutic Science (HiTS) at Harvard Medical School.

Hardware

The Bioregistry is hosted on an Amazon Elastic Compute Cloud (EC2) instance behind a load balancing service, which keeps it secure and highly available. Like the domain, it is managed and supported by the INDRA Lab.

Software

These are the software and operating system specifications for the currently running instance of the Bioregistry:

Bioregistry Version: 0.3.30
Python: 3.9.7
Platform: Linux-5.4.0-1015-aws-x86_64-with
Platform Version: #15-Ubuntu SMP Thu Jun 4 22:47:00 UTC 2020
Deployed: 2021-10-14 10:28:11.098959
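The values above can be reproduced on a Linux host with standard tools. The following is a minimal sketch, not necessarily how the Bioregistry generates its own listing; it assumes python3 is on the PATH and uses the hypothetical function name print_platform_info.

```shell
#!/bin/bash
# Print platform details in the same shape as the listing above.
# Assumption: this mirrors, but is not, the Bioregistry's own reporting code.
print_platform_info() {
    # Installed package version, or "unknown" if bioregistry is not installed
    echo "Bioregistry Version: $(python3 -c 'from importlib.metadata import version; print(version("bioregistry"))' 2>/dev/null || echo unknown)"
    # Interpreter version, e.g. "3.9.7"
    echo "Python: $(python3 -c 'import platform; print(platform.python_version())')"
    # Kernel name, release and machine, e.g. "Linux-5.4.0-1015-aws-x86_64"
    echo "Platform: $(uname -s)-$(uname -r)-$(uname -m)"
    # Kernel build string, e.g. "#15-Ubuntu SMP Thu Jun 4 22:47:00 UTC 2020"
    echo "Platform Version: $(uname -v)"
    # Current local time, used here as the deployment timestamp
    echo "Deployed: $(date '+%Y-%m-%d %H:%M:%S')"
}

print_platform_info
```

On Linux, uname -v prints the same kernel build string that Python reports as platform.version(), which is why the "Platform Version" line has that form.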

Containerization

A Docker image is automatically built on a cron job in the biopragmatics/bioregistry-docker GitHub repository and pushed to the biopragmatics/bioregistry DockerHub repository. The image is built from the Python 3.9 Alpine base image, which omits most non-essential components. The final compressed image takes up less than 40 MB of disk space, and the running container uses about 65 MB of memory at baseline. This would easily fit on a dedicated t4g.nano instance on AWS, which costs about $37/year on-demand or around $20/year as a reserved instance.
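The yearly figure for an always-on instance is just its hourly price times 8760 hours. The sketch below shows that arithmetic; the hourly price of 0.0042 USD is an assumption (the on-demand rate for a t4g.nano in us-east-1), and actual prices vary by region and change over time. The function name annual_cost is hypothetical.

```shell
#!/bin/bash
# Estimate the yearly cost of an always-on instance from its hourly price.
# Assumption: 0.0042 USD/hour as the t4g.nano on-demand rate (us-east-1).
annual_cost() {
    local hourly_usd="$1"
    # 24 hours/day * 365 days/year = 8760 hours/year; awk handles the decimals
    awk -v rate="$hourly_usd" 'BEGIN { printf "%.2f\n", rate * 24 * 365 }'
}

annual_cost 0.0042
```

This prints 36.79, which matches the roughly $37/year on-demand estimate above.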

Deployment

The Bioregistry's EC2 instance runs the following script on a cron job. It stops and removes the currently running container, pulls the latest image from the biopragmatics/bioregistry DockerHub repository, and starts a new container from it. The whole process takes only a few seconds.

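#!/bin/bash

# Store the ID of the old container, running or stopped, taking
#   advantage of the fact that it's named specifically
BIOREGISTRY_CONTAINER_ID=$(docker ps -a --filter "name=bioregistry" -q)

# Stop and remove the old container, if there is one. The variable is
#   left unquoted on purpose so that several matching IDs are all passed
if [ -n "$BIOREGISTRY_CONTAINER_ID" ]; then
    docker stop $BIOREGISTRY_CONTAINER_ID
    docker rm $BIOREGISTRY_CONTAINER_ID
fi

# Pull the latest image
docker pull biopragmatics/bioregistry:latest

# Start a new detached container that serves on port 8766
docker run -id --name bioregistry -p 8766:8766 biopragmatics/bioregistry:latest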

If this script is saved on the EC2 instance as /data/services/restart_bioregistry.sh, it can also be triggered remotely over SSH with:

#!/bin/bash

ssh -i ~/.ssh/<credentials>.pem <user>@<address> 'sh /data/services/restart_bioregistry.sh'
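Since the restart script is meant to run on a cron job, here is a minimal sketch of how such a crontab entry could be generated and installed. The function names, the hourly schedule, and the log file path are hypothetical; the schedule actually used on the Bioregistry's instance is not stated here.

```shell
#!/bin/bash
# Build a crontab line that runs the restart script and logs its output.
# Hypothetical helper; schedule and log path are illustrative only.
make_cron_entry() {
    local schedule="$1"  # five-field cron schedule, e.g. "0 * * * *"
    local script="$2"    # absolute path to the restart script
    local log="$3"       # file that collects stdout and stderr
    echo "$schedule sh $script >> $log 2>&1"
}

# Append an entry to the current user's crontab, keeping existing entries.
# crontab -l fails harmlessly (silenced) when no crontab exists yet.
install_cron_entry() {
    local entry="$1"
    { crontab -l 2>/dev/null; echo "$entry"; } | crontab -
}
```

For example, install_cron_entry "$(make_cron_entry '0 * * * *' /data/services/restart_bioregistry.sh /var/log/restart_bioregistry.log)" would schedule an hourly restart. Reading crontab -l first preserves any existing entries instead of overwriting them.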

SSL/TLS

The SSL/TLS certificate that allows bioregistry.io to be served over HTTPS is managed through AWS Certificate Manager.

Data Maintenance, Interoperability with other Projects, and Project Governance

So far, the Bioregistry's data and web application have been curated and developed by a small team and used by a limited audience. This has allowed us to address issues very quickly. However, this approach does not scale to larger audiences, where changes are more likely to disrupt users. There are many things we would like to discuss with potential stakeholders, including (but not limited to):

  • Will the service still be running in 10 years?
  • Who makes technical decisions about the web application?
  • Who makes curation decisions about the underlying registry?
  • Should there be acceptance criteria for new entries, such as a minimum metadata standard or assessment of impact?
  • How should inevitable prefix/namespace collisions be handled?
  • ... and many more

These questions do not have easy answers and apply to most databases, software, and web applications in the life sciences. If you would like to be part of this discussion, please join us on the OBO Community Slack workspace's #ontotools channel.