The Candida Genome Database (CGD): incorporation of Assembly 22, systematic identifiers and visualization of high throughput sequencing data

Nucleic Acids Res. 2017 Jan 4;45(D1):D592-D596. doi: 10.1093/nar/gkw924. Epub 2016 Oct 13.

Abstract

The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The mission of CGD is to facilitate and accelerate research into Candida pathogenesis and biology by curating the scientific literature in real time and connecting literature-derived annotations to the latest version of the genomic sequence and its annotations. Here, we report the incorporation into CGD of Assembly 22, the first chromosome-level, phased diploid assembly of the C. albicans genome, coupled with improvements that we have made to the assembly using additional available sequence data. We also report the creation of systematic identifiers for C. albicans genes and sequence features using a system similar to that adopted by the yeast community over two decades ago. Finally, we describe the incorporation of JBrowse into CGD, which allows online browsing of mapped high-throughput sequencing data, and its implementation for several RNA-Seq data sets, as well as for the whole-genome sequencing data that were used in the construction of Assembly 22.

MeSH terms

  • Candida / genetics*
  • Computational Biology / methods*
  • Databases, Nucleic Acid*
  • Fungal Proteins / chemistry
  • Fungal Proteins / genetics
  • Genome, Fungal*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Annotation
  • Open Reading Frames
  • Software*
  • Web Browser

Substances

  • Fungal Proteins