Comparison of Registries

An overview of registries covering biomedical ontologies, controlled vocabularies, and databases.

Data Models

A 🟢 means the field is required. A 🟡 means it is part of the schema but is optional or incomplete on some entries. A 🔴 means it is not part of the metadata schema. For lookup services like the OLS, some fields (i.e., Example ID, Default Provider, and Alternate Providers) are omitted because including them would be redundant.

Data Model Score
The weighted sum of green dots, less heavily weighted yellow dots, and negatively weighted red dots. Higher is better.
Name
This field denotes if a name is required, optional, or never captured for each record in the registry.
Homepage
This field denotes if a homepage is required, optional, or never captured for each record in the registry.
Description
This field denotes if a description is required, optional, or never captured for each record in the registry.
Example
This field denotes if an example local unique identifier is required, optional, or never captured for each record in the registry.
Pattern
This field denotes if a regular expression pattern for matching local unique identifiers is required, optional, or never captured for each record in the registry.
Provider
This field denotes if a URI format string for converting local unique identifiers into URIs is required, optional, or never captured for each record in the registry.
Alternate Providers
This field denotes if additional/secondary URI format strings for converting local unique identifiers into URIs are required, optional, or never captured for each record in the registry.
Alternate Prefixes
This field denotes if alternative prefixes, i.e., synonyms (e.g., taxonomy for NCBITaxon), are required, optional, or never captured for each record in the registry.
License
This field denotes if the data license is required, optional, or never captured for each record in the registry.
Version
This field denotes if the current data version is required, optional, or never captured for each record in the registry.
Contact
This field denotes if the primary responsible person's contact information (e.g., name, ORCID, email) is required, optional, or never captured for each record in the registry.
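To make the Example, Pattern, Provider, and Alternate Prefixes fields concrete, here is a minimal sketch of a single registry record and the two checks those fields enable: validating the example identifier against the pattern, and expanding it into a URI with the provider's URI format string. The `RegistryRecord` class and its field names are hypothetical rather than any registry's actual schema, the record's values are illustrative, and `$1` is an assumed placeholder for where the local unique identifier is substituted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RegistryRecord:
    """Hypothetical registry record covering a few of the fields compared above."""

    prefix: str
    name: str
    example: str  # example local unique identifier
    pattern: str  # regular expression for local unique identifiers
    uri_format: str  # provider URI format string; "$1" marks the local ID (assumed)
    synonyms: list[str] = field(default_factory=list)  # alternate prefixes

    def is_valid(self, local_id: str) -> bool:
        # fullmatch: a partial match of the pattern does not count as valid
        return re.fullmatch(self.pattern, local_id) is not None

    def expand(self, local_id: str) -> str:
        # Validate first so malformed identifiers never become URIs
        if not self.is_valid(local_id):
            raise ValueError(f"{local_id!r} does not match {self.pattern!r}")
        return self.uri_format.replace("$1", local_id)


# Illustrative values, not copied from any particular registry
record = RegistryRecord(
    prefix="go",
    name="Gene Ontology",
    example="0032571",
    pattern=r"^\d{7}$",
    uri_format="http://purl.obolibrary.org/obo/GO_$1",
    synonyms=["GO"],
)
print(record.is_valid(record.example))  # True
print(record.expand(record.example))  # http://purl.obolibrary.org/obo/GO_0032571
```

A registry that requires both the example and the pattern can run this kind of check on every record, which is one reason those two fields are more useful together than apart.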
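Registry | Data Model Score | Name | Homepage | Description | Example | Pattern | Provider | Alternate Providers | Alternate Prefixes | License | Version | Contact
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
AberOWL | 7 | 🟢 | 🟢 | 🟢 | 🟡 | 🔴 | 🟡 | 🔴 | 🔴 | 🔴 | 🟡 | 🔴
AgroPortal | 13 | 🟢 | 🟢 | 🟢 | 🟡 | 🔴 | 🟢 | 🔴 | 🔴 | 🟡 | 🟡 | 🟡
BARTOC | 5 | 🟢 | 🟡 | 🟡 | 🔴 | 🟡 | 🔴 | 🟡 | 🔴 | 🟡 | 🔴 | 🟡
BioContext | -9 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟡 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
BioPortal | 13 | 🟢 | 🟢 | 🟢 | 🟡 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 🟡
Biolink | -1 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🟡 | 🔴 | 🔴
Bioregistry | 19 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡
Cellosaurus | 1 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
CHEMINF | -7 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
CropOCT | 3 | 🟢 | 🟢 | 🔴 | 🟡 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
EDAM | -7 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
EcoPortal | 9 | 🟢 | 🟡 | 🟢 | 🟡 | 🔴 | 🟡 | 🔴 | 🔴 | 🟡 | 🟡 | 🟡
FAIRsharing | 5 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢
GO | 5 | 🟢 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🔴 | 🔴 | 🔴 | 🔴
HL7 | -1 | 🟢 | 🟡 | 🟡 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟡
Identifiers.org | 15 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🔴 | 🔴 | 🔴 | 🔴
N2T | 17 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 | 🟡 | 🔴 | 🔴 | 🔴
NCBI | 1 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
OBO Foundry | 9 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢
OntoBee | 9 | 🟢 | 🟢 | 🟢 | 🟡 | 🔴 | 🟡 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢
OLS | 11 | 🟢 | 🟡 | 🟢 | 🟡 | 🔴 | 🟡 | 🔴 | 🔴 | 🟢 | 🟡 | 🟡
Prefix Commons | 7 | 🟢 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🔴 | 🔴 | 🔴
Prefix.cc | -5 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟡 | 🔴 | 🔴 | 🔴 | 🔴
re3data | 5 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟡 | 🔴 | 🟡
UniProt | 1 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
Wikidata | 13 | 🟢 | 🟡 | 🟢 | 🟡 | 🟡 | 🟡 | 🟡 | 🔴 | 🟡 | 🟡 | 🟡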

Notes: Several of Wikidata's fields can be accessed indirectly with alternative SPARQL queries. Non-English-language registries in the OntoPortal Alliance were not considered.
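The exact weights behind the Data Model Score are not stated, but weights of +3 per green dot, +1 per yellow dot, and -1 per red dot reproduce every score in the table above. Here is a minimal sketch under that assumption; the weights are inferred from the table, not taken from any published definition or code.

```python
# Weights inferred from the data model table (assumption, not an official definition)
WEIGHTS = {"🟢": 3, "🟡": 1, "🔴": -1}


def data_model_score(dots: str) -> int:
    """Score one row of the data model table, given its dots separated by spaces."""
    return sum(WEIGHTS[dot] for dot in dots.split())


# Rows copied from the table: Bioregistry, BioContext, and CHEMINF
print(data_model_score("🟢 🟢 🟢 🟢 🟡 🟡 🟡 🟡 🟡 🟡 🟡"))  # 19
print(data_model_score("🔴 🔴 🔴 🔴 🔴 🟡 🔴 🔴 🔴 🔴 🔴"))  # -9
print(data_model_score("🟢 🔴 🔴 🔴 🔴 🔴 🔴 🔴 🔴 🔴 🔴"))  # -7
```

Under these weights, making an optional field required is worth two points, while merely adding an optional field to the schema is also worth two points (from -1 to +1).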

Capabilities and Qualities

This section provides a systematic evaluation and comparison of the capabilities of each registry.

Quality Score
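The number of green dots in each row. As the scores in the table below show, only the columns from Structured Data through Prefix Provider are counted; the CURIE Resolver and CURIE Lookup columns do not contribute.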
Structured Data
This field denotes if the registry provides structured access to its data, for example, through an API (e.g., FAIRsharing, OLS) or a bulk download in a structured file format (e.g., OBO Foundry). A counterexample is a site that must be scraped to acquire its content (e.g., NCBI GenBank).
Bulk Data
This field denotes if the registry provides a bulk dump of its data. For example, the OBO Foundry provides its bulk data as a file and Identifiers.org provides its bulk data through an API endpoint. A counterexample is FAIRsharing, which requires slow, expensive pagination through its data. Another counterexample is HL7, which requires manually navigating a form to download its content. While GenBank is not structured, it is still bulk downloadable.
No Authentication
This field denotes if the registry provides access to its data without an API key, as Identifiers.org does. As a counterexample, BioPortal requires an API key for access to its structured data.
Automatable Download
This field denotes if the registry's data can be downloaded in an automated way. This includes websites that have bulk downloads, paginated API downloads, or even require scraping. A counterexample is HL7, whose download cannot be automated due to the need to interact with a web form.
Permissive License
This field denotes if the registry uses a license that permits reuse and/or remixing. Based on the OBO Foundry's FP-001 "openness" principle, this includes Creative Commons CC BY 3.0, CC BY 4.0, and CC Zero. This explicitly excludes resources licensed with share-alike or no-derivatives clauses, as well as resources that are missing license statements entirely.
Prefix Search
This field denotes if the registry provides either a dedicated page for searching for prefixes (e.g., AberOWL has a dedicated search page) or a contextual search (e.g., AgroPortal has a prefix search built into its homepage).
Prefix Provider
This field denotes if the registry provides information about its own prefixes either in the form of a web page or an API endpoint. These can be accessed through a stable URL into which a prefix from the registry can be formatted.
CURIE Resolver
This field denotes if the registry can act as a resolver, i.e., it redirects to an external page about a given biomedical concept or entity based on its CURIE and the registry's internal metadata about the prefix's associated URI format string.
CURIE Lookup
This field denotes if the registry acts as a lookup service, i.e., it gives information about a given biomedical concept or entity based on its CURIE.
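The Prefix Provider and CURIE Resolver capabilities both reduce to string formatting. Here is a minimal sketch: a prefix is formatted into a stable page URL, and a CURIE is split at its first colon, its prefix looked up, and its local unique identifier substituted into that prefix's URI format string. The `registry.example.org` URL template, the prefix metadata, the `$1` placeholder, and the case-insensitive prefix lookup are all assumptions for illustration; a real resolver would answer with an HTTP redirect to the computed URL.

```python
# Hypothetical registry metadata: prefix -> URI format string ("$1" = local ID)
URI_FORMATS = {
    "chebi": "https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:$1",
    "ncbitaxon": "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=$1",
}

# Hypothetical stable URL template for the registry's own prefix pages
PREFIX_PAGE = "https://registry.example.org/registry/{prefix}"


def prefix_page(prefix: str) -> str:
    """Prefix Provider: format a prefix into the registry's stable URL."""
    return PREFIX_PAGE.format(prefix=prefix.lower())


def resolve(curie: str) -> str:
    """CURIE Resolver: compute the external URL a resolver would redirect to."""
    # Split on the first colon only; local IDs may themselves contain colons
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    try:
        # Case-insensitive prefix lookup (an assumption of this sketch)
        uri_format = URI_FORMATS[prefix.lower()]
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None
    return uri_format.replace("$1", local_id)


print(prefix_page("ChEBI"))  # https://registry.example.org/registry/chebi
print(resolve("ncbitaxon:9606"))  # https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606
```

A lookup service differs only in what it returns: instead of redirecting to the computed URL, it serves its own page or data about the entity.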
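Registry | Mappings | Quality Score | Structured Data | Bulk Data | No Authentication | Automatable Download | Permissive License | Prefix Search | Prefix Provider | CURIE Resolver | CURIE Lookup
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
AberOWL | 327 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🟢
AgroPortal | 68 | 4 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🟢
BARTOC | 36 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴
BioContext | 842 | 5 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴
BioPortal | 322 | 4 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🟢
Biolink | 98 | 4 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
Bioregistry | 1597 | 7 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴
Cellosaurus | 100 | 5 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴
CHEMINF | 19 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢
CropOCT | 34 | 7 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢
EDAM | 116 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴
EcoPortal | 4 | 4 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🟢
FAIRsharing | 599 | 4 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴
GO | 142 | 5 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴
HL7 | 46 | 3 | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴
Identifiers.org | 786 | 7 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴
N2T | 677 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴
NCBI | 78 | 3 | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴
OBO Foundry | 253 | 7 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴
OntoBee | 219 | 4 | 🔴 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢
OLS | 273 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🟢
Prefix Commons | 451 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 🔴 | 🔴
Prefix.cc | 0 | 6 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🔴 | 🔴
re3data | 184 | 5 | 🔴 | 🔴 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴
UniProt | 100 | 7 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴
Wikidata | 167 | 7 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🔴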
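The Quality Score can likewise be recomputed from a row's dots. The sketch below assumes, as the scores in the table suggest, that only the first seven capability columns (Structured Data through Prefix Provider) are counted; that rule is inferred from the table rather than documented, and the `COLUMNS` list simply restates the table's column order.

```python
# Capability columns in table order; only the first seven are assumed to count
COLUMNS = [
    "Structured Data", "Bulk Data", "No Authentication", "Automatable Download",
    "Permissive License", "Prefix Search", "Prefix Provider",
    "CURIE Resolver", "CURIE Lookup",
]
SCORED = set(COLUMNS[:7])


def quality_score(dots: str) -> int:
    """Count green dots in the scored capability columns of one table row."""
    cells = dots.split()
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    # Booleans sum as 0/1, so this counts the green cells in scored columns
    return sum(
        cell == "🟢" for column, cell in zip(COLUMNS, cells) if column in SCORED
    )


print(quality_score("🟢 🟢 🟢 🟢 🟢 🟢 🟢 🟢 🔴"))  # Bioregistry: 7
print(quality_score("🟢 🟢 🟢 🟢 🔴 🟢 🟢 🔴 🟢"))  # AberOWL: 6
```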

Community and Governance

This section provides a systematic evaluation and comparison of the governance and standard operating procedures for each registry. We generated the following list of objective, measurable metrics:

  • Are there clear, public policies on what content can be added to the registry?
  • Are there clear, public policies on who is allowed to add content to the registry?
  • Are there clear, public policies on why/how content is edited, deprecated, or removed from the registry?
  • Are community members able to petition for updates to resources that they do not "own", for example, if there is a typo in the metadata?
  • Does the community have clear, public policies for handling records that have been abandoned by the submitter/responsible person?
  • Are there clear, public guidelines on how to contribute to the registry? We argue that open contribution, e.g., via a request in an issue tracker or directly by creating a pull request, is better because it makes it easier to engage other community members and stakeholders.
  • Does the registry make its data available under a data-appropriate, permissive, well-understood license (e.g., CC Zero or CC BY 4.0)?
  • Does the registry make its underlying code open source under version control?
  • Are there similar appropriate policies for the code with respect to contribution and moderation as previously described for the content of the registry?
  • Does the community have a public issue tracker related to both curation and technical issues with the registry? A counter-example is that some communities require petitioning the moderator(s) privately by email.
  • Are there clear, public, up-to-date resources listing who has the technical ability to make updates to the registry (i.e., the community moderator(s))?
  • Are the community moderators responsive on the issue tracker? This can be compared between communities using measurements like how many total issues are open on the tracker, how many have been unanswered by a moderator for more than a certain amount of time, how quickly issues are closed on average, etc.
  • Is there a clear, public governance structure for inducting/removing community moderators?
  • Are the moderators from heterogeneous institutions/scientific domains?
  • Are contributions from the community attributed (both on a technical level, e.g., by associating ORCID identifiers to records, and also during scientific publication, e.g., as acknowledgments or including contributors as co-authors)?
  • Does the community have a clear, public code of conduct?
  • Do the moderators (or wider community) organize discussions, such as community meetings or workshops?
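The responsiveness question above suggests concrete measurements. Here is a minimal sketch that computes three of them from a list of issue records: the number of open issues, the number of open issues left without a moderator reply for longer than a threshold, and the mean time to close. The `Issue` type and its fields are hypothetical; in practice they would be filled from whatever export or API the registry's issue tracker provides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Issue:
    """Hypothetical issue record exported from a registry's issue tracker."""

    opened: datetime
    closed: Optional[datetime] = None
    first_moderator_reply: Optional[datetime] = None


def responsiveness(issues: list[Issue], now: datetime, threshold: timedelta) -> dict:
    open_issues = [i for i in issues if i.closed is None]
    # Open issues that no moderator has answered within the threshold
    unanswered = [
        i for i in open_issues
        if i.first_moderator_reply is None and now - i.opened > threshold
    ]
    closed = [i for i in issues if i.closed is not None]
    mean_days_to_close = (
        sum((i.closed - i.opened).total_seconds() for i in closed) / len(closed) / 86400
        if closed
        else None
    )
    return {
        "open": len(open_issues),
        "unanswered": len(unanswered),
        "mean_days_to_close": mean_days_to_close,
    }


issues = [
    Issue(opened=datetime(2024, 1, 1), closed=datetime(2024, 1, 3)),  # closed in 2 days
    Issue(opened=datetime(2024, 1, 10), closed=datetime(2024, 1, 14)),  # closed in 4 days
    Issue(opened=datetime(2024, 1, 2)),  # open, never answered
    Issue(opened=datetime(2024, 1, 29), first_moderator_reply=datetime(2024, 1, 30)),
]
print(responsiveness(issues, datetime(2024, 1, 31), timedelta(days=14)))
# {'open': 2, 'unanswered': 1, 'mean_days_to_close': 3.0}
```

The threshold is a free parameter; comparing registries fairly would require using the same threshold and time window for each.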

We surveyed a subset of these questions. The results are presented in the table below, after an explanation of each field.

Governance Score
The sum of the following boolean fields, adjusted by some additional logic. One point is deducted from registries with an internal-focused scope.
Accepts External Contributions
This field denotes if the registry (in theory) accepts external contributions, either via suggestion or proactive improvement. This field does not pass judgement on the difficulty of this process from the submitter's perspective, nor on the responsiveness of the registry. It also does not consider the ability of insiders (i.e., people with private relationships to the maintainers) to effect change.
Public Version-Controlled Data
This field denotes if the registry stores its data in a public, version-controlled repository (e.g., on GitHub or GitLab).
Issue Tracker
This field denotes the public issue tracker for issues related to the code and data of the registry.
Review Team
This field denotes whether the registry's reviewers/moderators for external contributions are known. If there is a well-defined, maintained listing, the review team is marked as public. If it can be inferred, e.g., from reading the commit history in a version control system, it is marked as inferrable. A closed review team, as for Identifiers.org, is marked as private. Resources that do not accept external contributions are marked n/a. An unmoderated registry like Prefix.cc is marked as democratic.
Scope
This field denotes the scope of prefixes which the registry covers. For example, some registries are limited to ontologies, some have a full scope over the life sciences, and some are general purpose.
Status
This field denotes the maintenance status of the registry. An active registry is still being maintained and is responsive to external requests for improvement. An unresponsive registry is still being maintained in some capacity but does not respond to external requests for improvement. An inactive registry is no longer being proactively maintained (though it may receive occasional patches).
Registry | Governance Score | Accepts External Contributions | Public Version-Controlled Data | Issue Tracker | Review Team | Scope | Status
--- | --- | --- | --- | --- | --- | --- | ---
AberOWL | 1 | 🔴 | 🔴 |  | n/a | bio-ontologies | active
AgroPortal | 3 | 🟢 | 🔴 |  | private | agro-ontologies | active
BARTOC | 6 | 🟢 | 🔴 |  | public | general | active
BioContext | 6 | 🟢 | 🟢 |  | inferrable | internal | active
BioPortal | 3 | 🟢 | 🔴 |  | private | bio-ontologies | active
Biolink | 6 | 🟢 | 🟢 |  | inferrable | internal | active
Bioregistry | 8 | 🟢 | 🟢 |  | public | life sciences | active
Cellosaurus | 0 | 🔴 | 🔴 |  | n/a | internal | active
CHEMINF | 6 | 🟢 | 🟢 |  | inferrable | chemistry | unresponsive
CropOCT | 5 | 🟢 | 🔴 |  | public | agro-ontologies | active
EDAM | 7 | 🟢 | 🟢 |  | inferrable | life sciences | active
EcoPortal | 2 | 🟢 | 🔴 |  | private | eco-ontologies | active
FAIRsharing | 2 | 🟢 | 🔴 |  | private | life sciences | active
GO | 5 | 🟢 | 🟢 |  | inferrable | internal | inactive
HL7 | 2 | 🟢 | 🔴 |  | n/a | general | active
Identifiers.org | 3 | 🟢 | 🔴 |  | private | life sciences | unresponsive
N2T | 1 | 🔴 | 🔴 |  | n/a | life sciences | active
NCBI | -1 | 🔴 | 🔴 |  | n/a | internal | inactive
OBO Foundry | 8 | 🟢 | 🟢 |  | public | bio-ontologies | active
OntoBee | 3 | 🟢 | 🔴 |  | n/a | bio-ontologies | active
OLS | 7 | 🟢 | 🟢 |  | inferrable | bio-ontologies | active
Prefix Commons | 4 | 🟢 | 🔴 |  | public | life sciences | active
Prefix.cc | 6 | 🟢 | 🔴 |  | democratic | general | active
re3data | 2 | 🟢 | 🔴 |  | private | general | active
UniProt | 0 | 🔴 | 🔴 |  | n/a | internal | active
Wikidata | 3 | 🟢 | 🔴 |  | inferrable | general | active

Related Software

Conversion between CURIEs and IRIs

The semantic web and ontology communities rely on IRIs as identifiers and are therefore very interested in interconversion between compact identifiers (i.e., CURIEs) and IRIs. While the Bioregistry provides many tools for one-way conversion from CURIEs to IRIs, several related packages help convert between the two:

  • The @geneontology/dbxrefs Node.js package translates CURIEs into URLs using the Gene Ontology Registry.
  • The curie-util-py Python package more generally loads JSON-LD files to convert between IRIs and CURIEs.
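Both directions of the conversion can be sketched with a plain prefix map from CURIE prefixes to URI prefixes. Expansion concatenates the URI prefix and the local unique identifier; compression picks the longest matching URI prefix so that overlapping namespaces (here, the generic `obo` prefix and the more specific `GO` and `NCBITaxon` prefixes) resolve to the most specific one. The prefix map is illustrative, and this is a standard-library sketch, not the API of either package listed above.

```python
# Illustrative prefix map: CURIE prefix -> URI prefix
PREFIX_MAP = {
    "obo": "http://purl.obolibrary.org/obo/",
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
}


def expand(curie: str) -> str:
    """Convert a CURIE into an IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in PREFIX_MAP:
        raise ValueError(f"cannot expand {curie!r}")
    return PREFIX_MAP[prefix] + local_id


def compress(iri: str) -> str:
    """Convert an IRI into a CURIE using the longest matching URI prefix."""
    matches = [(p, u) for p, u in PREFIX_MAP.items() if iri.startswith(u)]
    if not matches:
        raise ValueError(f"cannot compress {iri!r}")
    prefix, uri_prefix = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{iri[len(uri_prefix):]}"


print(expand("GO:0032571"))  # http://purl.obolibrary.org/obo/GO_0032571
print(compress("http://purl.obolibrary.org/obo/NCBITaxon_9606"))  # NCBITaxon:9606
```

Without the longest-match rule, the NCBITaxon IRI could just as well compress to `obo:NCBITaxon_9606`, which is why the choice among overlapping URI prefixes matters when converting IRIs back into CURIEs.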