TY - JOUR
T1 - Development of an Open-Source Annotated Glaucoma Medication Dataset From Clinical Notes in the Electronic Health Record
AU - Chen, Jimmy S.
AU - Lin, Wei Chun
AU - Yang, Sen
AU - Chiang, Michael F.
AU - Hribar, Michelle R.
N1 - Publisher Copyright: © 2022 The Authors.
PY - 2022/11
Y1 - 2022/11
N2 - Purpose: To describe the processing methods and characteristics of an open dataset of clinical notes from the electronic health record (EHR) annotated for glaucoma medications. Methods: In this study, 480 clinical notes from office visits, medical record numbers (MRNs), visit identification numbers, provider names, and billing codes were extracted for 480 patients seen for glaucoma by a comprehensive or glaucoma ophthalmologist from January 1, 2019, to August 31, 2020. MRNs and all visit data were de-identified using a hash function with salt from the deidentifyr package. All progress notes were annotated for glaucoma medication name, route, frequency, dosage, and drug use using an open-source annotation tool, Doccano. Annotations were saved separately. All protected health information (PHI) in progress notes and annotated files was de-identified using the published de-identification algorithm Philter. All progress notes and annotations were manually validated by two ophthalmologists to ensure complete de-identification. Results: The final dataset contained 5520 annotated sentences, including those with and without medications, for 480 clinical notes. Manual validation revealed 10 instances of remaining PHI, which were manually corrected. Conclusions: Annotated free-text clinical notes can be de-identified for upload as an open dataset. As data availability increases with the adoption of EHRs, free-text open datasets will become increasingly valuable for “big data” research and artificial intelligence development. This dataset is published online and publicly available at https://github.com/jche253/Glaucoma_Med_Dataset. Translational Relevance: This open-access medication dataset may serve as a source of raw data for future big data and artificial intelligence research using free text.
AB - Purpose: To describe the processing methods and characteristics of an open dataset of clinical notes from the electronic health record (EHR) annotated for glaucoma medications. Methods: In this study, 480 clinical notes from office visits, medical record numbers (MRNs), visit identification numbers, provider names, and billing codes were extracted for 480 patients seen for glaucoma by a comprehensive or glaucoma ophthalmologist from January 1, 2019, to August 31, 2020. MRNs and all visit data were de-identified using a hash function with salt from the deidentifyr package. All progress notes were annotated for glaucoma medication name, route, frequency, dosage, and drug use using an open-source annotation tool, Doccano. Annotations were saved separately. All protected health information (PHI) in progress notes and annotated files was de-identified using the published de-identification algorithm Philter. All progress notes and annotations were manually validated by two ophthalmologists to ensure complete de-identification. Results: The final dataset contained 5520 annotated sentences, including those with and without medications, for 480 clinical notes. Manual validation revealed 10 instances of remaining PHI, which were manually corrected. Conclusions: Annotated free-text clinical notes can be de-identified for upload as an open dataset. As data availability increases with the adoption of EHRs, free-text open datasets will become increasingly valuable for “big data” research and artificial intelligence development. This dataset is published online and publicly available at https://github.com/jche253/Glaucoma_Med_Dataset. Translational Relevance: This open-access medication dataset may serve as a source of raw data for future big data and artificial intelligence research using free text.
KW - artificial intelligence
KW - big data
KW - electronic health records
KW - glaucoma
KW - medications
UR - http://www.scopus.com/inward/record.url?scp=85142939647&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142939647&partnerID=8YFLogxK
U2 - 10.1167/TVST.11.11.20
DO - 10.1167/TVST.11.11.20
M3 - Article
C2 - 36441131
AN - SCOPUS:85142939647
SN - 2164-2591
VL - 11
JO - Translational Vision Science and Technology
JF - Translational Vision Science and Technology
IS - 11
M1 - 20
ER -