Automatically pre-screening patients for the rare disease aromatic L-amino acid decarboxylase deficiency using knowledge engineering, natural language processing, and machine learning on a large EHR population

Aaron M. Cohen, Jolie Kaner, Ryan Miller, Jeffrey W. Kopesky, William Hersh

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: Electronic health record (EHR) data may facilitate the identification of rare diseases in patients, such as aromatic L-amino acid decarboxylase deficiency (AADCd), an autosomal recessive disease caused by pathogenic variants in the dopa decarboxylase gene. Deficiency of the AADC enzyme results in combined severe reductions in the monoamine neurotransmitters dopamine, serotonin, epinephrine, and norepinephrine. This leads to widespread neurological complications affecting motor, behavioral, and autonomic function. The goal of this study was to use EHR data, without any available training cases for the disease, to identify previously undiagnosed patients who may have AADCd.

Materials and Methods: An annotated dataset covering multiple symptoms and related diseases was created, and its annotated sentences were used to train individual concept classifiers. A multistep algorithm was then used to combine the concept predictions into a single rank value per patient.

Results: The algorithms ranked an 8000-patient dataset that they had not previously seen, and the top 200 and bottom 200 ranked patients were manually reviewed for clinical indications warranting an AADCd diagnostic screening test. Of the top-ranked patients, 22.5% were positively assessed for diagnostic screening, compared with 0% of the bottom-ranked patients. This result is statistically significant at P < .0001.

Conclusion: This work validates the approach that large-scale rare-disease screening can be accomplished by combining predictions for relevant individual symptoms and related conditions, which are much more common and for which training data is easier to create.
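As a rough plausibility check on the reported significance level, the comparison of the two reviewed groups can be reproduced as a 2×2 test. The abstract does not say which statistical test the authors used; the sketch below assumes a one-sided Fisher's exact test, and the function name `fisher_exact_one_sided` is hypothetical. The counts follow from the abstract: 22.5% of 200 top-ranked patients is 45 positives, against 0 of 200 bottom-ranked patients.

```python
from math import comb


def fisher_exact_one_sided(pos_a, n_a, pos_b, n_b):
    """One-sided Fisher's exact test (assumed test, not stated in the abstract).

    Returns the probability, under the hypergeometric null of no difference
    between groups, that group A contains at least `pos_a` of the positives.
    """
    total_pos = pos_a + pos_b      # positives across both groups
    total = n_a + n_b              # all reviewed patients
    denom = comb(total, n_a)       # ways to choose which patients form group A
    upper = min(total_pos, n_a)    # group A cannot hold more positives than this
    tail = sum(
        comb(total_pos, k) * comb(total - total_pos, n_a - k)
        for k in range(pos_a, upper + 1)
    )
    return tail / denom


# Counts derived from the abstract: 22.5% of 200 = 45 positives in the top group.
top_positive, top_n = 45, 200
bottom_positive, bottom_n = 0, 200

p_value = fisher_exact_one_sided(top_positive, top_n, bottom_positive, bottom_n)
print(f"one-sided p-value: {p_value:.3g}")
print("below .0001:", p_value < 1e-4)
```

Exact integer arithmetic via `math.comb` avoids any dependency beyond the standard library; a statistics package would normally be used for a two-sided test or a confidence interval.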

Original language: English (US)
Pages (from-to): 692-704
Number of pages: 13
Journal: Journal of the American Medical Informatics Association
Volume: 31
Issue number: 3
State: Published - Mar 1 2024

Keywords

  • EHR data secondary uses
  • aromatic L-amino acid decarboxylase deficiency
  • machine learning
  • natural language processing
  • rare diseases

ASJC Scopus subject areas

  • Health Informatics
