TY - JOUR
T1 - Automatically pre-screening patients for the rare disease aromatic L-amino acid decarboxylase deficiency using knowledge engineering, natural language processing, and machine learning on a large EHR population
AU - Cohen, Aaron M.
AU - Kaner, Jolie
AU - Miller, Ryan
AU - Kopesky, Jeffrey W.
AU - Hersh, William
N1 - Publisher Copyright: © The Author(s) 2023.
PY - 2024/3/1
Y1 - 2024/3/1
N2 - Objectives: Electronic health record (EHR) data may facilitate the identification of rare diseases in patients, such as aromatic L-amino acid decarboxylase deficiency (AADCd), an autosomal recessive disease caused by pathogenic variants in the dopa decarboxylase gene. Deficiency of the AADC enzyme results in combined severe reductions in monoamine neurotransmitters: dopamine, serotonin, epinephrine, and norepinephrine. This leads to widespread neurological complications affecting motor, behavioral, and autonomic function. The goal of this study was to use EHR data to identify previously undiagnosed patients who may have AADCd without available training cases for the disease. Materials and Methods: A multiple symptom and related disease annotated dataset was created and used to train individual concept classifiers on annotated sentence data. A multistep algorithm was then used to combine concept predictions into a single patient rank value. Results: Using an 8000-patient dataset that the algorithms had not seen before ranking, the top and bottom 200 ranked patients were manually reviewed for clinical indications of performing an AADCd diagnostic screening test. The top-ranked patients were 22.5% positively assessed for diagnostic screening, with 0% for the bottom-ranked patients. This result is statistically significant at P < .0001. Conclusion: This work validates the approach that large-scale rare-disease screening can be accomplished by combining predictions for relevant individual symptoms and related conditions which are much more common and for which training data is easier to create.
KW - EHR data secondary uses
KW - aromatic L-amino acid decarboxylase deficiency
KW - machine learning
KW - natural language processing
KW - rare diseases
UR - http://www.scopus.com/inward/record.url?scp=85185346418&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185346418&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocad244
DO - 10.1093/jamia/ocad244
M3 - Article
C2 - 38134953
AN - SCOPUS:85185346418
SN - 1067-5027
VL - 31
SP - 692
EP - 704
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 3
ER -