Ontology paper
FaBiO and CiTO: Ontologies for describing bibliographic resources and citations
Introduction
Scholarly authoring and publishing are in the throes of a revolution, as the full potential of on-line publishing is explored. Yet, to date, publishers have not adopted Web standards for their work, but rather employ a variety of proprietary XML-based informational models and document type definitions (DTDs). While such independence was reasonable in the pre-web world of paper publishing, it now appears anachronistic, since publications and their metadata from different sources are incompatible, requiring hand-crafted mappings to convert from one to another. For a large community such as publishers, this lack of standard definitions that could be adopted and reused across the entire industry represents losses in terms of money, time and effort.
In contrast, modern web information management techniques employ standards such as RDF [1] and OWL 2 [2] to encode information in ways that permit computers to query metadata and integrate web-based information from multiple resources in an automated manner. Since the processes of scholarly communication are central to the practice of science, it is essential that publishers now adopt such standards to permit inference over the entire corpus of scholarly communication represented in journals, books and conference proceedings. This requires the availability of appropriate ontologies that are specially tailored to the requirements of authors, publishers and their readers. The purpose of this paper is to present two such ontologies that form key components of the semantic publishing revolution.
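The automated querying and integration described above can be illustrated with a minimal sketch. It is not an RDF implementation: statements are modelled as plain (subject, predicate, object) tuples, and every `example.org` URI and literal value is a hypothetical placeholder. The only real vocabulary terms assumed are `dcterms:title`, `dcterms:creator`, `dcterms:issued` and `foaf:name`.

```python
# Minimal sketch: RDF-style statements modelled as (subject, predicate, object)
# tuples. All example.org URIs and all literal values are hypothetical.

DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

ARTICLE = "http://example.org/article/1"   # hypothetical identifier
AUTHOR = "http://example.org/person/alice"  # hypothetical identifier

# Metadata about the same article as it might arrive from two sources.
source_a = {
    (ARTICLE, DCTERMS + "title", "An example article"),
    (ARTICLE, DCTERMS + "creator", AUTHOR),
}
source_b = {
    (ARTICLE, DCTERMS + "issued", "2012"),
    (AUTHOR, FOAF + "name", "Alice Example"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Because both sources share identifiers and vocabulary terms,
# integration reduces to a set union; no hand-crafted mapping is needed.
merged = source_a | source_b

# Everything known about the article, regardless of which source said it.
about_article = match(merged, subject=ARTICLE)
```

With proprietary DTDs, the equivalent of the union step would require a mapping written by hand for each pair of sources; here the shared identifiers do that work.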
Semantic publishing is the use of web and semantic web technologies to enhance a published document such as a journal article so as to enrich its meaning, to facilitate its automatic discovery, to enable its linking to semantically related articles, to provide access to data within the article in actionable form, and to allow integration of data between papers [3], [4]. Semantic publishing and scholarly citation using web standards are presently two of the most interesting topics within the scientific publishing domain. Research areas in this domain include the development of:
- semantic models (vocabularies, ontologies) that meet the requirements of scholarly authoring and publishing;
- visualization and documentation tools that permit such ontologies to be easily understood;
- annotation tools that allow these models to be used for enhancing documents with relevant semantic assertions;
- new algorithms to take advantage of these semantic annotations when searching over large sets of on-line documents.
The rest of this article is organised as follows: in Section 2 we introduce principles that have guided our work, and in Section 3 we briefly describe the existing ontologies and vocabularies we took into account when developing FaBiO and CiTO. These new ontologies are then presented in Section 4 (Representing bibliographic information using FaBiO) and Section 5 (Characterising citations with CiTO), respectively. In Section 6 we describe the contexts in which FaBiO and CiTO are now being employed, and finally in Section 7 we sketch out some conclusions and future directions of our work.
Characteristics, starting point and principles
The main characteristic of this work, which marks it out as distinct from previous contributions, is the creation of two new semantic publishing and referencing ontologies of sufficient expressivity to meet the requirements of end users such as academic authors and publishers.
We have also developed two new presentation technologies, the Live OWL Documentation Environment (LODE) and the Graphical Framework For OWL
Related works
While wanting to re-use existing information models and vocabularies of relevance to scholarly publishing as far as possible, we had to come to terms with their limitations. In this section, we briefly introduce and comment upon these well-known vocabularies that we have considered and/or fully or partially employed within our own work.
Dublin Core. Born as a consequence of a conference held in Dublin, Ohio, USA in 1995 that involved both technicians (librarians, publishers, archivists) and
Representing bibliographic information using FaBiO
The need for ontologies that are sufficiently expressive for describing documents has been presented in Section 1.
The vocabularies described above are either poor in concepts or are ‘flat’, preventing their use for accurately describing the complexity of publishing reality. We will illustrate this by considering the representation of a typical bibliographic reference using first BIBO and then FRBR. We will then show how this information can be more accurately described using FaBiO. Consider the
Characterising citations with CiTO
Bibliographic citation, i.e., the act of referring from a citing entity to the cited one, is one of the most important activities of an author in the production of any bibliographic work, since the acknowledgement of sources that this activity represents stands at the very core of the scholarly enterprise. The network of citations created by combining citation information from many academic articles and books is a source of rich information for scholars, and can be used by a publisher to create
Community uptake of FaBiO and CiTO
FaBiO and CiTO are now being used or are being considered for adoption in a variety of academic and publishing environments, as described below. The adoption of these models by different communities can be ascribed, at least in part, to the minimization of the constraints applied to the ontological entities, so that the ontologies can be applied in a wide variety of situations.
Linked Open Vocabularies Dataset. The Linked Open Vocabularies Dataset (LOV),
Conclusions
In this article we have presented FaBiO, the FRBR-aligned Bibliographic Ontology (current version 1.6), and CiTO, the Citation Typing Ontology (current version 2.3). FaBiO and CiTO are two new OWL 2 DL ontologies for describing bibliographic resources and bibliographic citations on the Semantic Web. We have introduced those two models step by step, in order to emphasise their features for describing bibliographic objects and to stress their advantages relative to other pre-existing models.
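As a rough illustration of what typed citations make possible, the sketch below records citations as (citing, property, cited) tuples and profiles the citations a paper receives by type. It is a plain-Python approximation, not an OWL or RDF implementation. The papers are hypothetical `example.org` identifiers, and the property names `cito:cites`, `cito:extends` and `cito:usesMethodIn` are assumed to match the published CiTO vocabulary; they should be checked against the current ontology release.

```python
from collections import Counter

# Assumed CiTO namespace; verify against the current ontology release.
CITO = "http://purl.org/spar/cito/"

PAPER_A = "http://example.org/paper/a"  # hypothetical identifiers
PAPER_B = "http://example.org/paper/b"
PAPER_C = "http://example.org/paper/c"
PAPER_D = "http://example.org/paper/d"

# Each statement says not only that one paper cites another, but how.
citations = [
    (PAPER_A, CITO + "extends", PAPER_B),
    (PAPER_A, CITO + "usesMethodIn", PAPER_C),
    (PAPER_D, CITO + "extends", PAPER_B),
    (PAPER_D, CITO + "cites", PAPER_A),
]


def incoming_by_type(triples, cited):
    """Count the citations received by `cited`, grouped by CiTO property name."""
    return Counter(
        prop[len(CITO):]
        for _citing, prop, target in triples
        if target == cited and prop.startswith(CITO)
    )


# Paper B is extended by two other papers; paper A is cited once, untyped.
profile_b = incoming_by_type(citations, PAPER_B)
profile_a = incoming_by_type(citations, PAPER_A)
```

An untyped citation count would report only that paper B is cited twice; the typed form also records that both citing papers build on it.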
Acknowledgments
We thank Paolo Ciccarese and Tim Clark of Harvard University for their support and valuable conceptual and technical suggestions given while we were developing FaBiO and CiTO from CiTO v1.6 and were collaborating with them to harmonise these with SWAN. We are also grateful to our local colleagues, particularly Jun Zhao, Graham Klyne and Fabio Vitali, for their warm support, constructive criticisms and help given throughout these developments. Aspects of this work have been supported by the JISC
References (29)
- et al., The SWAN biomedical discourse ontology, Journal of Biomedical Informatics (2008)
- J. Carroll, G. Klyne, Resource description framework (RDF): concepts and abstract syntax, W3C Recommendation, 10...
- W3C OWL Working Group, OWL 2 web ontology language document overview, W3C Recommendation, 27 October 2009, World Wide...
- Semantic publishing: the coming revolution in scientific journal publishing, Learned Publishing (2009)
- et al., Adventures in semantic publishing: exemplar semantic enhancements of a research article, PLoS Computational Biology (2009)
- S. Peroni, D. Shotton, F. Vitali, The Live OWL Documentation Environment: a tool for the automatic generation of...
- CiTO, the citation typing ontology, Journal of Biomedical Semantics (2010)
- A. Rector, Modularisation of domain ontologies implemented in description logics and related formalisms including OWL,...
- D. Shotton, C. Caton, G. Klyne, Ontologies for sharing, ontologies for use, In The Ontogenesis Knowledge Blog, 2010....
- Dublin Core Metadata Initiative, Dublin core metadata element set, Version 1.1, DCMI Recommendation, 2010....