
Large-scale automated machine reading discovers new cancer-driving mechanisms

  • Marco A. Valenzuela-Escárcega
  • Özgün Babur
  • Gus Hahn-Powell
  • Dane Bell
  • Thomas Hicks
  • Enrique Noriega-Atala
  • Xia Wang
  • Mihai Surdeanu
  • Emek Demir
  • Clayton T. Morrison

Research output: Contribution to journal › Article › peer-review

Abstract

PubMed, a repository and search engine for biomedical literature, now indexes >1 million articles each year. This exceeds the processing capacity of human domain experts, limiting our ability to truly understand many diseases. We present Reach, a system for automated, large-scale machine reading of biomedical papers that can extract mechanistic descriptions of biological processes with relatively high precision at high throughput. We demonstrate that combining the extracted pathway fragments with existing biological data analysis algorithms that rely on curated models helps identify and explain a large number of previously unidentified mutually exclusive altered signaling pathways in seven different cancer types. This work shows that combining human-curated 'big mechanisms' with extracted 'big data' can lead to a causal, predictive understanding of cellular processes and unlock important downstream applications.

Original language: English (US)
Journal: Database
Volume: 2018
Issue number: 2018
DOIs
State: Published - Jan 1 2018

Funding

We thank MITRE for defining and implementing the evaluation described in Section 3.1. We are especially grateful to Tonia Korves and Lynette Hirschman for making these results available before their publication and for the many clarification discussions. We also thank the anonymous reviewers for their insightful comments. This work was funded by the Defense Advanced Research Projects Agency (DARPA) Big Mechanism program [ARO W911NF-14-1-0395].

Funders: Funder number
National Science Foundation: 1740858
National Institutes of Health
Defense Advanced Research Projects Agency: ARO W911NF-14-1-0395

    ASJC Scopus subject areas

    • Information Systems
    • General Biochemistry, Genetics and Molecular Biology
    • General Agricultural and Biological Sciences
