Sustainability

Deployment in the Cloud

This section describes several aspects of how the Bioregistry web application is deployed.

Domain Name

The bioregistry.io domain is registered with Namecheap and costs about $33 per year. It is managed and supported by the Gyori Lab for Computational Biomedicine at Northeastern University.

Software

The Bioregistry source code is licensed under the MIT License and is hosted openly on GitHub at https://github.com/biopragmatics/bioregistry. These are the software specifications for the currently running instance of the Bioregistry:

  • Bioregistry Version: 0.10.125

Hardware

The Bioregistry is hosted on an Amazon Elastic Compute Cloud (EC2) instance behind a load balancer, which improves its security and availability. It is managed and supported by the Gyori Lab for Computational Biomedicine at Northeastern University.

These are the hardware and operating system specifications for the currently running instance of the Bioregistry:

  • Python: 3.11.7
  • Platform: Linux-4.4.0-1061-aws-x86_64-with
  • Platform Version: #70-Ubuntu SMP Fri May 25 21:47:34 UTC 2018
  • Deployed: 2024-01-25 08:38:29.439958

Containerization

A Docker image is automatically built nightly after the update workflow runs on GitHub Actions, and it is pushed to the biopragmatics/bioregistry repository on DockerHub. The image is built on the Python 3.9 Alpine base image, which omits most non-essential components. The final compressed image takes up less than 40 MB of disk space and uses about 65 MB of memory at baseline when running in Docker. It could therefore easily fit on a dedicated t4g.nano instance on AWS, which costs about $37/year on-demand or around $20/year reserved.
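The on-demand figure is simply the hourly price multiplied by the number of hours in a year. The sketch below assumes a t4g.nano rate of $0.0042/hour (the AWS us-east-1 on-demand list price at the time of writing; prices differ by region and change over time) and an instance that runs around the clock. The `yearly_cost` helper is hypothetical and only illustrates the arithmetic.

```shell
#!/bin/bash
# Hypothetical helper: convert an hourly instance price (USD) into an
# approximate yearly cost, assuming 24 hours/day for 365 days.
yearly_cost() {
  local hourly_rate="$1"
  # awk handles the floating-point multiplication that bash cannot do natively
  awk -v rate="$hourly_rate" 'BEGIN { printf "%.2f\n", rate * 24 * 365 }'
}

# Assumed t4g.nano on-demand rate (us-east-1); check current AWS pricing.
yearly_cost 0.0042   # prints 36.79
```

Substituting a reserved-instance hourly rate into the same formula gives the corresponding reserved estimate.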

Deployment

The Bioregistry's EC2 instance runs the following script as a cron job. It stops the currently running container, pulls the latest image from the DockerHub repository, and starts a new container. The whole process only takes a few seconds.

#!/bin/bash
# /data/services/restart_bioregistry.sh

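# Look up the ID of the container named "bioregistry" (empty if there is none)
BIOREGISTRY_CONTAINER_ID=$(docker ps --filter "name=bioregistry" -aq)

# Stop and remove the old container, taking advantage of the fact that it's named specifically.
# The variable is quoted in the test so an empty result is detected correctly, and left
# unquoted below so that each ID is passed separately if the filter matches more than one.
if [ -n "$BIOREGISTRY_CONTAINER_ID" ]; then
  docker stop $BIOREGISTRY_CONTAINER_ID
  docker rm $BIOREGISTRY_CONTAINER_ID
fi

# Pull the latest image
docker pull biopragmatics/bioregistry:latest

# Start a new container in the background (-d); remove -d to run it in the foreground
docker run -id --name bioregistry -p 8766:8766 biopragmatics/bioregistry:latest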

This script can be put on the EC2 instance and run via SSH with:

#!/bin/bash

ssh -i ~/.ssh/<credentials>.pem <user>@<address> 'sh /data/services/restart_bioregistry.sh'
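The restart script returns as soon as `docker run` has started the container, not when the web application is ready to serve requests. As a minimal sketch that is not part of the Bioregistry's actual deployment, a hypothetical `wait_for_http` helper could poll the port published by `-p 8766:8766` until it responds. It assumes `curl` is installed on the host and that the application's root path returns a successful (2xx/3xx) status once it is up.

```shell
#!/bin/bash
# Hypothetical helper: poll an HTTP endpoint until it answers successfully
# or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  local url="$1"
  local attempts="${2:-10}"
  local delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP 4xx/5xx as failure; -s: no progress bar; -S: still report errors;
    # -o /dev/null: discard the response body
    if curl -fsS -o /dev/null "$url"; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "no successful response from $url after $attempts attempt(s)" >&2
  return 1
}
```

For example, calling `wait_for_http http://localhost:8766/ 15 2` on the EC2 host right after the restart script would allow up to 15 attempts, two seconds apart; its non-zero exit status on failure could then be used to alert a maintainer.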

SSL/TLS

The SSL/TLS certificate that allows bioregistry.io to be served over HTTPS is managed through AWS Certificate Manager (ACM).

Project Longevity

The Bioregistry is funded by the Chan Zuckerberg Initiative (CZI) Open Science Grant 2023-329850, which stipulates unlimited no-cost extensions. We have allocated part of this grant to ensure that the domain registration, hosting, and hardware will be funded in the medium and long term, under a conservative cost estimate of around $100-200/year.

The Bioregistry implements the Open Code, Open Data, Open Infrastructure (O3) Guidelines as a means to enable and encourage community contribution and maintenance in the medium and long term. All code is permissively licensed under the MIT License and all data is released under the Creative Commons Zero (CC0) license, meaning anyone can reuse the data as they see fit.

Mirroring

The Bioregistry can be mirrored following these instructions.

Deploying with Custom Content

The Bioregistry can be deployed using custom content by following these instructions.

Project Governance

Stakeholders in the Bioregistry have been interested in questions including:

  • Will the service still be running in 10 years?
  • Who makes technical decisions about the web application?
  • Who makes curation decisions about the underlying registry?
  • Should there be acceptance criteria for new entries, such as a minimum metadata standard or assessment of impact?
  • How will the Bioregistry avoid the pitfalls of a closed curation process like the one implemented by Identifiers.org?
  • How should inevitable prefix/namespace collisions be handled?

These questions do not have easy answers and apply to most databases, software, and web applications in the life sciences. As first steps towards addressing them, we have written explicit, public, and well-defined contribution guidelines, a code of conduct, and a project governance policy.

If you would like to be part of this discussion and/or development of these policies, you can try the following:

Evaluation of FAIR Data Principles

Content negotiation was implemented in PR #682 to better comply with FAIRness evaluations such as the FAIR Enough evaluation.
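With content negotiation, a client asks for a machine-readable representation of the same URL by setting the HTTP `Accept` header instead of using a different endpoint. The sketch below wraps `curl` in a hypothetical `fetch_as` helper. Which media types the Bioregistry actually honors on its resource pages (for example `application/json` or `text/turtle`) is an assumption here and should be checked against PR #682.

```shell
#!/bin/bash
# Hypothetical helper: request a URL while advertising a preferred media type.
# Usage: fetch_as URL MEDIA_TYPE
fetch_as() {
  local url="$1"
  local media_type="$2"
  # -f: fail on HTTP errors; -sS: quiet but show errors; -L: follow redirects
  curl -fsSL -H "Accept: ${media_type}" "$url"
}

# Example calls (require network access; supported media types are assumed):
#   fetch_as https://bioregistry.io/registry/chebi application/json
#   fetch_as https://bioregistry.io/registry/chebi text/turtle
```

A FAIR evaluator performs essentially the same kind of request, so checking the response body for each media type is a quick way to confirm that negotiation behaves as expected.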