A Framework for Data Quality Assessment in Clinical Research Datasets

Kathleen Lee, Nicole Weiskopf, Jyotishman Pathak

Research output: Contribution to journal › Article › peer-review

27 Scopus citations


Using widely available electronic health record (EHR) data for multi-institutional clinical research requires accurately defined patient cohorts to ensure validity, especially when the data are used in conjunction with open-access research data. There is a growing need for a consensus-driven approach to assessing data quality. To this end, we modified an existing data quality assessment (DQA) framework by re-operationalizing dimensions of quality for a clinical domain of interest, heart failure. We then created an inventory of common phenotype data elements (CPDEs) derived from open-access datasets and evaluated it against the modified DQA framework. We assessed our inventory of CPDEs for Conformance, Completeness, and Plausibility. DQA scores were high on Completeness, Value Conformance, and Atemporal and Temporal Plausibility. Our work demonstrates a generalizable approach to DQA for clinical research. Future work will 1) map datasets to standard terminologies and 2) create a quantitative DQA tool for research datasets.
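To make the quality dimensions named in the abstract concrete, the following is a minimal sketch of how Completeness, Value Conformance, and Atemporal and Temporal Plausibility might be scored as simple proportions. It is not the authors' framework or tool: the records, field names (`nyha_class`, `ejection_fraction`, `admit_date`, `discharge_date`), function names, and thresholds are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"patient_id": "p1", "nyha_class": "II", "ejection_fraction": 35.0,
     "admit_date": date(2015, 3, 1), "discharge_date": date(2015, 3, 6)},
    {"patient_id": "p2", "nyha_class": "III", "ejection_fraction": None,
     "admit_date": date(2015, 4, 10), "discharge_date": date(2015, 4, 8)},
    {"patient_id": "p3", "nyha_class": "V", "ejection_fraction": 140.0,
     "admit_date": date(2015, 5, 2), "discharge_date": date(2015, 5, 9)},
]

# Assumed rules for this sketch: NYHA classes I-IV, ejection fraction in percent.
ALLOWED_NYHA = {"I", "II", "III", "IV"}
EF_LOW, EF_HIGH = 0.0, 100.0


def completeness(rows, field):
    """Fraction of rows in which the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def value_conformance(rows, field, allowed):
    """Fraction of non-missing values that belong to the allowed value set."""
    present = [r[field] for r in rows if r[field] is not None]
    return sum(v in allowed for v in present) / len(present)


def atemporal_plausibility(rows, field, low, high):
    """Fraction of non-missing values that fall inside a believable range."""
    present = [r[field] for r in rows if r[field] is not None]
    return sum(low <= v <= high for v in present) / len(present)


def temporal_plausibility(rows, start_field, end_field):
    """Fraction of rows whose start date does not come after the end date."""
    dated = [r for r in rows
             if r[start_field] is not None and r[end_field] is not None]
    return sum(r[start_field] <= r[end_field] for r in dated) / len(dated)


scores = {
    "completeness(ejection_fraction)":
        completeness(records, "ejection_fraction"),
    "value_conformance(nyha_class)":
        value_conformance(records, "nyha_class", ALLOWED_NYHA),
    "atemporal_plausibility(ejection_fraction)":
        atemporal_plausibility(records, "ejection_fraction", EF_LOW, EF_HIGH),
    "temporal_plausibility(admit <= discharge)":
        temporal_plausibility(records, "admit_date", "discharge_date"),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Each score is a proportion between 0 and 1, so results for different data elements can be compared side by side. A real assessment would also need to handle fields with no recorded values, which this sketch does not guard against.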

Original language: English (US)
Pages (from-to): 1080-1089
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
State: Published - 2017

ASJC Scopus subject areas

  • Medicine(all)

