TY - JOUR
T1 - An imbalance-aware deep neural network for early prediction of preeclampsia
AU - Bennett, Rachel
AU - Mulla, Zuber D.
AU - Parikh, Pavan
AU - Hauspurg, Alisse
AU - Razzaghi, Talayeh
N1 - Publisher Copyright:
© 2022 Bennett et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2022/4
Y1 - 2022/4
AB - Preeclampsia (PE) is a hypertensive complication affecting 8-10% of US pregnancies annually. While there is no cure for PE, aspirin may reduce complications for those at high risk. Furthermore, PE disproportionately affects racial minorities, who bear a higher burden of morbidity and mortality. Previous studies have shown that early prediction of PE would allow for prevention. We approached the prediction of PE using a new method based on a cost-sensitive deep neural network (CSDNN) that accounts for the severe imbalance and sparse nature of the data, as well as racial disparities. We validated our model using large, rich, extant data sources that represent a diverse cohort of minority populations in the US. These include the Texas Public Use Data Files (PUDF), the Oklahoma PUDF, and the Magee Obstetric Medical and Infant (MOMI) databases. We identified the most influential clinical and demographic features (predictor variables) relevant to PE for both general populations and smaller racial groups. We also investigated the effectiveness of multiple network architectures using three hyperparameter optimization algorithms: Bayesian optimization, Hyperband, and random search. Our proposed models equipped with a focal loss function yield superior and reliable prediction performance compared with state-of-the-art techniques, with an average area under the curve (AUC) of 66.3% and 63.5% for the Texas and Oklahoma PUDF, respectively, while the CSDNN model with a weighted cross-entropy loss function performs best on the MOMI data, with an AUC of 76.5%. Furthermore, our CSDNN model equipped with a focal loss function achieves an AUC of 66.7% for the Texas African American population and 57.1% for the Native American population. The best results are an AUC of 62.3% using CSDNN with a weighted cross-entropy loss function for the Oklahoma African American population, 58% using a DNN with the balanced batch method for the Oklahoma Native American population, and 72.4% using either CSDNN with a weighted cross-entropy loss function or CSDNN with focal loss and the balanced batch method for the MOMI African American dataset. Our results provide the first evidence of the predictive power of clinical databases for PE prediction among minority populations.
UR - http://www.scopus.com/inward/record.url?scp=85127717572&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127717572&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0266042
DO - 10.1371/journal.pone.0266042
M3 - Article
C2 - 35385525
AN - SCOPUS:85127717572
SN - 1932-6203
VL - 17
JO - PLOS ONE
JF - PLOS ONE
IS - 4
M1 - e0266042
ER -