Assembling the Community-Scale Discoverable Human Proteome

Cell Syst. 2018 Oct 24;7(4):412-421.e5. doi: 10.1016/j.cels.2018.08.004. Epub 2018 Aug 29.

Abstract

The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public mass spectrometry runs. However, the discoveries from these data are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates proteomics mass spectrometry discoveries into an open, reusable format with full provenance information for community scrutiny. Reusing >31 TB of public human data stored in the Mass Spectrometry Interactive Virtual Environment (MassIVE), MassIVE-KB contains >2.1 million precursors from 19,610 proteins (48% larger than before; 97% of the total) and doubles proteome coverage to 6 million amino acids (54% of the proteome) with strict library-scale false discovery controls, thereby providing evidence for 430 proteins for which sufficient protein-level evidence was previously missing. Furthermore, MassIVE-KB can inform experimental design, help identify and quantify new data, and provide tools for community construction of specialized spectral libraries.

Keywords: algorithms; big data; knowledge base; proteomics; repositories; spectral libraries; tandem mass spectrometry.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biological Variation, Population
  • Databases, Protein
  • Humans
  • Mass Spectrometry / methods*
  • Proteome / chemistry*
  • Proteome / genetics
  • Proteomics / methods*

Substances

  • Proteome