An overview of registries covering biomedical ontologies, controlled vocabularies, and databases.
A 🟢 means the field is required. A 🟡 means it is
part of the schema, but is either not required or incomplete on some entries. A 🔴 means that
it is not part of the metadata schema. For lookup services like the OLS, some fields (i.e., Example ID,
Default Provider, Alternate Providers) are omitted because inclusion would be redundant.
Data Model Score
The weighted sum of green dots, less valuable yellow dots, and some negatively weighted red dots. Higher is better.
This field denotes if a name is required, optional, or never captured for each record in the registry.
This field denotes if a homepage is required, optional, or never captured for each record in the registry.
This field denotes if a description is required, optional, or never captured for each record in the registry.
This field denotes if an example local unique identifier is required, optional, or never captured for each record in the registry.
This field denotes if a regular expression pattern for matching local unique identifiers is required, optional, or never captured for each record in the registry.
This field denotes if a URI format string for converting local unique identifiers into URIs is required, optional, or never captured for each record in the registry.
This field denotes if additional/secondary URI format strings for converting local unique identifiers into URIs are required, optional, or never captured for each record in the registry.
This field denotes if alternative prefixes (e.g., taxonomy for NCBITaxon) are required, optional, or never captured for each record in the registry.
This field denotes if capturing the data license is required, optional, or never captured for each record in the registry.
This field denotes if capturing the current data version is required, optional, or never captured for each record in the registry.
This field denotes if capturing the primary responsible person's contact information (e.g., name, ORCID, email) is required, optional, or never captured for each record in the registry.
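To make the pattern, URI format string, and alternative prefix fields concrete, here is a minimal Python sketch of how a single record might use them. The record layout, the `$1` placeholder convention, and the `example.org` URL are assumptions made for illustration, not the schema of any particular registry.

```python
import re

# Hypothetical registry record; the field names and values are illustrative only.
RECORD = {
    "prefix": "ncbitaxon",
    "synonyms": ["taxonomy"],  # alternative prefixes
    "example": "9606",  # example local unique identifier
    "pattern": r"^\d+$",  # regular expression for local unique identifiers
    "uri_format": "https://example.org/taxonomy/$1",  # assumed "$1" placeholder
}


def is_valid(record: dict, local_id: str) -> bool:
    """Check a local unique identifier against the record's pattern."""
    return re.fullmatch(record["pattern"], local_id) is not None


def to_uri(record: dict, local_id: str) -> str:
    """Substitute a valid local unique identifier into the URI format string."""
    if not is_valid(record, local_id):
        raise ValueError(f"{local_id!r} does not match {record['pattern']!r}")
    return record["uri_format"].replace("$1", local_id)


print(to_uri(RECORD, RECORD["example"]))
```

A registry that captures an example identifier alongside the pattern can check its own records for consistency in this way.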
Notes: Several of Wikidata's fields can be accessed indirectly with alternative SPARQL queries.
Non-English language registries in the OntoPortal Alliance were not considered.
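The Data Model Score described above is a weighted sum over the per-field statuses. The sketch below shows one way such a score could be computed. The weights are hypothetical, since the text only states that green counts most, yellow counts less, and some red dots count negatively. For simplicity, the sketch penalizes every red dot equally.

```python
# Hypothetical weights; the actual values are not given in the text.
WEIGHTS = {
    "required": 2,  # green dot
    "optional": 1,  # yellow dot
    "absent": -1,  # red dot (assumed here to always be penalized)
}


def data_model_score(fields: dict) -> int:
    """Sum the weight of each field's status for one registry."""
    return sum(WEIGHTS[status] for status in fields.values())


# Illustrative statuses for an imaginary registry.
example_registry = {
    "name": "required",
    "homepage": "optional",
    "license": "absent",
}
print(data_model_score(example_registry))
```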
Capabilities and Qualities
This section provides a systematic evaluation and comparison of
the capabilities of each registry.
The sum of the number of green dots across each row.
This field denotes whether the registry provides structured access to its data. For example, this can be through an API (e.g., FAIRsharing, OLS) or a bulk download (e.g., OBO Foundry) in a structured file format. A counterexample is a site that must be scraped to acquire its content (e.g., NCBI GenBank).
This field denotes whether the registry provides a bulk dump of its data. For example, the OBO Foundry provides its bulk data in a file and Identifiers.org provides its bulk data through an API endpoint. A counterexample is FAIRsharing, which requires slow, expensive pagination through its data. Another counterexample is HL7, which requires manually navigating a form to download its content. While GenBank is not structured, it is still bulk downloadable.
This field denotes whether the registry provides access to its data without an API key. An example is Identifiers.org. As a counterexample, BioPortal requires an API key for access to its structured data.
This field denotes whether the registry's data can be downloaded in an automated way. This includes websites that have bulk downloads, paginated API downloads, or even require scraping. A counterexample is HL7, whose download cannot be automated due to the need to interact with a web form.
This field denotes whether the registry uses a license that permits reuse and/or remixing. Based on the OBO
Foundry's FP-001 "openness" principle, this
includes Creative Commons CC BY 3.0, CC BY 4.0, and CC Zero. This explicitly does not include resources
licensed with share-alike clauses, no derivatives clauses, or ones that are missing license statements.
This field denotes if the registry provides either a dedicated page for searching for prefixes (e.g., AberOWL has a dedicated search page) or a contextual search (e.g., AgroPortal has a prefix search built into its homepage).
This field denotes if the registry provides information about its own prefixes either
in the form of a web page or an API endpoint. These can be accessed
through a stable URL into which a prefix from the registry can be formatted.
This field denotes if the registry can act as a resolver, i.e., it redirects to an external
page about a given biomedical concept or entity based on its CURIE and
the registry's internal metadata about the prefix's associated
URI format string.
This field denotes if the registry acts as a lookup service, i.e., it gives information
about a given biomedical concept or entity based on its CURIE.
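The difference between a resolver and a lookup service, as described above, can be illustrated with a small Python sketch. Both start from a CURIE. The resolver only needs prefix-level metadata (a URI format string) to build a redirect target, while the lookup service needs entity-level metadata. The dictionaries, the `$1` placeholder, and the `example.org` URL are assumptions for illustration.

```python
# Hypothetical prefix-level metadata: all a resolver needs.
URI_FORMATS = {"ncbitaxon": "https://example.org/taxonomy/$1"}

# Hypothetical entity-level metadata: what a lookup service must additionally store.
ENTITIES = {"ncbitaxon:9606": {"name": "Homo sapiens"}}


def parse_curie(curie: str) -> tuple:
    """Split a CURIE into a normalized prefix and a local unique identifier."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix.lower(), identifier


def resolve(curie: str) -> str:
    """Resolver: return the external URL that the client would be redirected to."""
    prefix, identifier = parse_curie(curie)
    return URI_FORMATS[prefix].replace("$1", identifier)


def lookup(curie: str) -> dict:
    """Lookup service: return stored information about the entity itself."""
    prefix, identifier = parse_curie(curie)
    return ENTITIES[f"{prefix}:{identifier}"]


print(resolve("NCBITaxon:9606"))
print(lookup("NCBITaxon:9606"))
```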
This section provides a systematic evaluation and comparison of
the governance and standard operating procedures for each registry. We generated the following list of
Are there clear, public policies on what content can be added to the registry?
Are there clear, public policies on who is allowed to add content to the registry?
Are there clear, public policies on why/how content is edited, deprecated, or removed from the registry?
Are community members able to petition for updates to resources that they do not "own", for example, if
there is a typo in the metadata?
Does the community have clear, public policies for handling records that have been abandoned by the
Are there clear, public guidelines on how to contribute to the registry? We argue that open contribution,
e.g., via a request in an issue tracker or directly by creating a pull
request is better due to the ability to better engage other community members and stakeholders
Does the registry make its data available under a data-appropriate, permissive, well-understood license
(e.g., CC Zero or CC BY 4.0)?
Does the registry make its underlying code open source under version control?
Are there similar appropriate policies for the code with respect to contribution and moderation as
previously described for the content of the registry?
Does the community have a public issue tracker related to both curation and technical issues with the
registry? A counter-example is that some communities require petitioning the moderator(s) privately by
Are there clear, public, up-to-date resources listing who has the technical ability to make updates to the
registry (i.e., the community moderator(s))?
Are the community moderators responsive on the issue tracker? This can be compared between communities using
measurements like how many total issues are open on the tracker, how many have been unanswered by a
moderator for more than a certain amount of time, how quickly issues are closed on average, etc.
Is there a clear, public governance structure for inducting/removing community moderators?
Are the moderators from heterogeneous institutions/scientific domains?
Are contributions from the community attributed (both on a technical level, e.g., by associating ORCID
identifiers to records, and also during scientific publication, e.g., as acknowledgments or including
contributors as co-authors)?
Does the community have a clear, public code of conduct?
Do the moderators (or wider community) organize discussions, such as community meetings or workshops?
We surveyed a subset of these questions. The results are presented in the table below, preceded by an
explanation of each field.
The sum of the following boolean fields and some additional logic. One point is deducted from registries
with internal-focused scope.
Accepts External Contributions
This field denotes if the registry (in theory) accepts external contributions, either via suggestion or proactive improvement. This field does not pass judgement on the difficulty of this process from the perspective of the submitter nor the responsiveness of the registry. This field does not consider the ability for insiders (i.e., people with private relationships to the maintainers) to effect change.
Public Version-Controlled Data
This field denotes if the registry stores its data in a publicly available version control system, such as GitHub or GitLab.
This field denotes the public issue tracker for issues related to the code and data of the repository.
This field denotes whether the registry's reviewers/moderators for external contributions are known. If there is a well-defined, maintained listing, then it can be marked as public. If it can be inferred, e.g., from reading the commit history on a version control system, then it can be marked as inferrable. A closed review team, e.g., like the one for Identifiers.org, can be marked as private. Resources that do not accept external contributions can be marked with N/A. An unmoderated registry like Prefix.cc is marked with 'democratic'.
This field denotes the scope of prefixes which the registry covers. For example, some registries are limited to ontologies, some have a full scope over the life sciences, and some are general purpose.
This field denotes the maintenance status of the repository. An active repository is still being maintained and is also responsive to external requests for improvement. An unresponsive repository is still being maintained in some capacity but is not responsive to external requests for improvement. An inactive repository is no longer being proactively maintained (though it may receive occasional patches).
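The governance score described earlier (a sum of boolean fields with one point deducted for an internal-focused scope) can be sketched as follows. The choice of which fields count as boolean is an assumption, the class and attribute names are hypothetical, and the unspecified "additional logic" is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Governance:
    """Hypothetical container for one registry's governance fields."""

    accepts_external_contributions: bool
    public_version_controlled_data: bool
    has_public_issue_tracker: bool  # assumed: True if a tracker is listed
    scope: str  # e.g., "internal", "ontologies", "life sciences", "general"


def governance_score(g: Governance) -> int:
    """Sum the boolean fields, then deduct one point for internal-focused scope."""
    score = sum(
        [
            g.accepts_external_contributions,
            g.public_version_controlled_data,
            g.has_public_issue_tracker,
        ]
    )
    if g.scope == "internal":
        score -= 1
    return score


print(governance_score(Governance(True, True, False, "internal")))
```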
The semantic web and ontology communities are bound to the use of IRIs as identifiers and therefore are very
interested in the interconversion between compact identifiers (i.e., CURIEs) and IRIs. While the Bioregistry
provides many tools for one-way conversion from CURIEs to IRIs, there are several related packages that help
parse CURIEs from IRIs:
The @geneontology/dbxrefs Node.js package
translates CURIEs into URLs using the Gene Ontology Registry.
The curie-util-py Python package more generally
loads JSON-LD files to convert between IRIs and CURIEs.
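As a rough illustration of what interconversion between CURIEs and IRIs involves, the following Python sketch expands and compresses identifiers using a small prefix map. It does not reproduce the API of any of the packages mentioned above. The function names are hypothetical and the two-entry prefix map is only a sample.

```python
# A small sample prefix map from CURIE prefixes to URI prefixes.
PREFIX_MAP = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}


def expand(curie: str) -> str:
    """Convert a CURIE into an IRI by prepending the URI prefix."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or prefix not in PREFIX_MAP:
        raise ValueError(f"cannot expand {curie!r}")
    return PREFIX_MAP[prefix] + identifier


def compress(iri: str) -> str:
    """Convert an IRI into a CURIE by matching the longest known URI prefix."""
    # Longer URI prefixes are tried first so that nested prefixes match correctly.
    candidates = sorted(PREFIX_MAP.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, uri_prefix in candidates:
        if iri.startswith(uri_prefix):
            return f"{prefix}:{iri[len(uri_prefix):]}"
    raise ValueError(f"cannot compress {iri!r}")


print(expand("GO:0008150"))
print(compress("http://purl.obolibrary.org/obo/CHEBI_15377"))
```

Parsing a CURIE out of an IRI is the harder direction because it requires knowing every URI prefix in advance, which is why it depends on a comprehensive prefix map.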