TY - JOUR
T1 - Disaggregating Latino nativity in equity research using electronic health records
AU - Marino, Miguel
AU - Fankhauser, Katie
AU - Minnier, Jessica
AU - Lucas, Jennifer A.
AU - Giebultowicz, Sophia
AU - Kaufmann, Jorge
AU - Hwang, Jun
AU - Bailey, Steffani R.
AU - Crookes, Danielle M.
AU - Bazemore, Andrew
AU - Suglia, Shakira F.
AU - Heintzman, John
N1 - Publisher Copyright: © 2023 Health Research and Educational Trust.
PY - 2023/10
Y1 - 2023/10
AB - Objective: To develop and validate prediction models for inference of Latino nativity to advance health equity research. Data Sources/Study Setting: This study used electronic health records (EHRs) from 19,985 Latino children with self-reported country of birth seeking care from January 1, 2012 to December 31, 2018 at 456 community health centers (CHCs) across 15 states along with census-tract geocoded neighborhood composition and surname data. Study Design: We constructed and evaluated the performance of prediction models within a broad machine learning framework (Super Learner) for the estimation of Latino nativity. Outcomes included binary indicators denoting nativity (US vs. foreign-born) and Latino country of birth (Mexican, Cuban, Guatemalan). The performance of these models was compared using the area under the receiver operating characteristics curve (AUC) from an externally withheld patient sample. Data Collection/Extraction Methods: Census surname lists, census neighborhood composition, and Forebears administrative data were linked to EHR data. Principal Findings: Of the 19,985 Latino patients, 10.7% reported a non-US country of birth (5.1% Mexican, 4.7% Guatemalan, 0.8% Cuban). Overall, prediction models for nativity showed outstanding performance with external validation (US-born vs. foreign: AUC = 0.90; Mexican vs. non-Mexican: AUC = 0.89; Guatemalan vs. non-Guatemalan: AUC = 0.95; Cuban vs. non-Cuban: AUC = 0.99). Conclusions: Among challenges facing health equity researchers in health services is the absence of methods for data disaggregation, and the specific ability to determine Latino country of birth (nativity) to inform disparities. Recent interest in more robust health equity research has called attention to the importance of data disaggregation. In a multistate network of CHCs using multilevel inputs from EHR data linked to surname and community data, we developed and validated novel prediction models for the use of available EHR data to infer Latino nativity for health disparities research in primary care and health services research, which is a significant potential methodologic advance in studying this population.
KW - U.S. Census location
KW - ethnicity
KW - health disparities
KW - machine learning
KW - surname data
UR - http://www.scopus.com/inward/record.url?scp=85152058944&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152058944&partnerID=8YFLogxK
U2 - 10.1111/1475-6773.14154
DO - 10.1111/1475-6773.14154
M3 - Article
C2 - 36978286
AN - SCOPUS:85152058944
SN - 0017-9124
VL - 58
SP - 1119
EP - 1130
JO - Health Services Research
JF - Health Services Research
IS - 5
ER -