Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists' Bread and Butter

J Med Chem. 2016 May 12;59(9):4385-402. doi: 10.1021/acs.jmedchem.6b00153. Epub 2016 Apr 8.

Abstract

Multiple recent studies have focused on unraveling the content of the medicinal chemist's toolbox. Here, we present an investigation of chemical reactions and molecules retrieved from U.S. patents over the past 40 years (1976-2015). We used a sophisticated text-mining pipeline to extract 1.15 million unique whole reaction schemes, including reaction roles and yields, from pharmaceutical patents. The reactions were assigned to well-known reaction types such as Wittig olefination or Buchwald-Hartwig amination using an expert system. Analyzing the evolution of reaction types over time, we observe the previously reported bias toward reaction classes like amide bond formations or Suzuki couplings. Our study also shows a steady increase in the number of different reaction types used in pharmaceutical patents but a trend toward lower median yield for some of the reaction classes. Finally, we found that today's typical product molecule is larger, more hydrophobic, and more rigid than 40 years ago.
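
The abstract does not say how the per-class yield trends or the growth in reaction-type diversity were computed. As a minimal sketch only, assuming each extracted reaction can be reduced to a (patent year, assigned reaction class, reported yield) record, the two summaries could be tabulated as below. The record layout, the function names, and every value in the toy list are hypothetical and invented for illustration; none of it is data or code from the study.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy records, invented purely for illustration (NOT data from the study).
# Each record is (patent_year, reaction_class, yield_percent).
reactions = [
    (1980, "amide bond formation", 82.0),
    (1980, "amide bond formation", 74.0),
    (1980, "Wittig olefination", 65.0),
    (2010, "amide bond formation", 70.0),
    (2010, "amide bond formation", 58.0),
    (2010, "amide bond formation", 66.0),
    (2010, "Suzuki coupling", 61.0),
]


def median_yield_by_class_and_year(records):
    """Group yields by (reaction class, year) and return the median of each group."""
    groups = defaultdict(list)
    for year, rxn_class, yield_pct in records:
        groups[(rxn_class, year)].append(yield_pct)
    return {key: median(values) for key, values in groups.items()}


def distinct_classes_per_year(records):
    """Count how many different reaction classes appear in each year."""
    classes = defaultdict(set)
    for year, rxn_class, _ in records:
        classes[year].add(rxn_class)
    return {year: len(names) for year, names in classes.items()}


medians = median_yield_by_class_and_year(reactions)
diversity = distinct_classes_per_year(reactions)

# Print one line per (class, year) group, sorted for stable output.
for (rxn_class, year), value in sorted(medians.items()):
    print(f"{year}  {rxn_class:<22} median yield {value:.1f}%")
```

The median is used here because the abstract reports median yields; it is also less sensitive than the mean to a few very high or very low reported values. With real data one would probably bin patent years into longer periods before comparing groups, since a single year may hold few examples of a rarer reaction class.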

Publication types

  • Historical Article

MeSH terms

  • Chemistry, Pharmaceutical*
  • Drug Industry*
  • History, 20th Century
  • History, 21st Century
  • Patents as Topic*
  • Workforce