CATH: expanding the horizons of structure-based functional annotations for genome sequences

Nucleic Acids Res. 2019 Jan 8;47(D1):D280-D284. doi: 10.1093/nar/gky1097.

Abstract

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a major increase in the coverage of structural data. This release increases the number of fully classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The proportion of high-resolution protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: the user can now view an alignment between their query sequence and a representative FunFam structure, and new tools make it easier to view the full structural context (multi-domain architecture) of domains and chains.
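The growth figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `pct_increase` and `fold_change` are hypothetical, and the sequence-domain counts are the rounded values from the abstract (53 million and 95 million), so the fold change is a lower bound on the "over 95 million" figure.

```python
# Sanity check of the release statistics quoted in the abstract.
# Helper names are hypothetical; input numbers are taken from the abstract.

def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100.0

def fold_change(old: float, new: float) -> float:
    """Ratio of new to old."""
    return new / old

# Fully classified structural domains: 308 999 -> 434 857
domain_growth = pct_increase(308_999, 434_857)

# Predicted sequence domains: 53 million -> (over) 95 million, rounded values
sequence_fold = fold_change(53e6, 95e6)

# PDB chain coverage: 82.3% -> 90.2%, expressed in percentage points
coverage_gain = round(90.2 - 82.3, 1)

print(f"Structural domain growth: {domain_growth:.1f}%")
print(f"Sequence domain fold change: {sequence_fold:.2f}x")
print(f"Chain coverage gain: {coverage_gain} percentage points")
```

Under these inputs the structural domain growth comes to about 40.7%, consistent with "over 40%", and the sequence fold change to about 1.79, consistent with "almost two-fold".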

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Conserved Sequence
  • Databases, Protein*
  • Gene Ontology
  • Genome*
  • Humans
  • Models, Molecular
  • Molecular Sequence Annotation
  • Multigene Family / genetics
  • Protein Conformation
  • Protein Domains / genetics
  • Sequence Alignment
  • Sequence Homology, Amino Acid
  • Structure-Activity Relationship