Disaggregating Latino nativity in equity research using electronic health records

Miguel Marino; Katie Fankhauser; Jessica Minnier; Jennifer A. Lucas; Sophia Giebultowicz; Jorge Kaufmann; Jun Hwang; Steffani R. Bailey; Danielle M. Crookes; Andrew Bazemore; Shakira F. Suglia; John Heintzman

doi:10.1111/1475-6773.14154

Family Medicine

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: To develop and validate prediction models for inference of Latino nativity to advance health equity research.

Data Sources/Study Setting: This study used electronic health records (EHRs) from 19,985 Latino children with self-reported country of birth seeking care from January 1, 2012 to December 31, 2018 at 456 community health centers (CHCs) across 15 states along with census-tract geocoded neighborhood composition and surname data.

Study Design: We constructed and evaluated the performance of prediction models within a broad machine learning framework (Super Learner) for the estimation of Latino nativity. Outcomes included binary indicators denoting nativity (US vs. foreign-born) and Latino country of birth (Mexican, Cuban, Guatemalan). The performance of these models was compared using the area under the receiver operating characteristics curve (AUC) from an externally withheld patient sample.

Data Collection/Extraction Methods: Census surname lists, census neighborhood composition, and Forebears administrative data were linked to EHR data.

Principal Findings: Of the 19,985 Latino patients, 10.7% reported a non-US country of birth (5.1% Mexican, 4.7% Guatemalan, 0.8% Cuban). Overall, prediction models for nativity showed outstanding performance with external validation (US-born vs. foreign: AUC = 0.90; Mexican vs. non-Mexican: AUC = 0.89; Guatemalan vs. non-Guatemalan: AUC = 0.95; Cuban vs. non-Cuban: AUC = 0.99).

Conclusions: Among challenges facing health equity researchers in health services is the absence of methods for data disaggregation, and the specific ability to determine Latino country of birth (nativity) to inform disparities. Recent interest in more robust health equity research has called attention to the importance of data disaggregation. In a multistate network of CHCs using multilevel inputs from EHR data linked to surname and community data, we developed and validated novel prediction models for the use of available EHR data to infer Latino nativity for health disparities research in primary care and health services research, which is a significant potential methodologic advance in studying this population.

Original language	English (US)
Pages (from-to)	1119-1130
Number of pages	12
Journal	Health Services Research
Volume	58
Issue number	5
DOIs	https://doi.org/10.1111/1475-6773.14154
State	Published - Oct 2023

Keywords

U.S. Census location
ethnicity
health disparities
machine learning
surname data

ASJC Scopus subject areas

Health Policy

Access to Document

10.1111/1475-6773.14154

Cite this

@article{a98a0f5c9ce645efa81e12ac5c861776,

title = "Disaggregating Latino nativity in equity research using electronic health records",

keywords = "U.S. Census location, ethnicity, health disparities, machine learning, surname data",

author = "Miguel Marino and Katie Fankhauser and Jessica Minnier and Lucas, {Jennifer A.} and Sophia Giebultowicz and Jorge Kaufmann and Jun Hwang and Bailey, {Steffani R.} and Crookes, {Danielle M.} and Andrew Bazemore and Suglia, {Shakira F.} and John Heintzman",

note = "Publisher Copyright: {\textcopyright} 2023 Health Research and Educational Trust.",

year = "2023",

month = oct,

doi = "10.1111/1475-6773.14154",

language = "English (US)",

volume = "58",

pages = "1119--1130",

journal = "Health Services Research",

issn = "0017-9124",

publisher = "Wiley-Blackwell",

number = "5",

}

TY - JOUR

T1 - Disaggregating Latino nativity in equity research using electronic health records

AU - Marino, Miguel

AU - Fankhauser, Katie

AU - Minnier, Jessica

AU - Lucas, Jennifer A.

AU - Giebultowicz, Sophia

AU - Kaufmann, Jorge

AU - Hwang, Jun

AU - Bailey, Steffani R.

AU - Crookes, Danielle M.

AU - Bazemore, Andrew

AU - Suglia, Shakira F.

AU - Heintzman, John

PY - 2023/10

Y1 - 2023/10

KW - U.S. Census location

KW - ethnicity

KW - health disparities

KW - machine learning

KW - surname data

UR - http://www.scopus.com/inward/record.url?scp=85152058944&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85152058944&partnerID=8YFLogxK

U2 - 10.1111/1475-6773.14154

DO - 10.1111/1475-6773.14154

M3 - Article

C2 - 36978286

AN - SCOPUS:85152058944

SN - 0017-9124

VL - 58

SP - 1119

EP - 1130

JO - Health Services Research

JF - Health Services Research

IS - 5

ER -
