TY - JOUR
T1 - Clinical, social, and policy factors in COVID-19 cases and deaths
T2 - methodological considerations for feature selection and modeling in county-level analyses
AU - Madlock-Brown, Charisse
AU - Wilkens, Ken
AU - Weiskopf, Nicole
AU - Cesare, Nina
AU - Bhattacharyya, Sharmodeep
AU - Riches, Naomi O.
AU - Espinoza, Juan
AU - Dorr, David
AU - Goetz, Kerry
AU - Phuong, Jimmy
AU - Sule, Anupam
AU - Kharrazi, Hadi
AU - Liu, Feifan
AU - Lemon, Cindy
AU - Adams, William G.
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: There is a need to evaluate how the choice of time interval contributes to the lack of consistency of SDoH variables that appear as important to COVID-19 disease burden within an analysis for both case counts and death counts. Methods: This study identified SDoH variables associated with U.S county-level COVID-19 cumulative case and death incidence for six different periods: the first 30, 60, 90, 120, 150, and 180 days since each county had COVID-19 one case per 10,000 residents. The set of SDoH variables were in the following domains: resource deprivation, access to care/health resources, population characteristics, traveling behavior, vulnerable populations, and health status. A generalized variance inflation factor (GVIF) analysis was used to identify variables with high multicollinearity. For each dependent variable, a separate model was built for each of the time periods. We used a mixed-effect generalized linear modeling of counts normalized per 100,000 population using negative binomial regression. We performed a Kolmogorov-Smirnov goodness of fit test, an outlier test, and a dispersion test for each model. Sensitivity analysis included altering the county start date to the day each county reached 10 COVID-19 cases per 10,000. Results: Ninety-seven percent (3059/3140) of the counties were represented in the final analysis. Six features proved important for both the main and sensitivity analysis: adults-with-college-degree, days-sheltering-in-place-at-start, prior-seven-day-median-time-home, percent-black, percent-foreign-born, over-65-years-of-age, black-white-segregation, and days-since-pandemic-start. These variables belonged to the following categories: COVID-19 related, vulnerable populations, and population characteristics. Our diagnostic results show that across our outcomes, the models of the shorter time periods (30 days, 60 days, and 900 days) have a better fit. Conclusion: Our findings demonstrate that the set of SDoH features that are significant for COVID-19 outcomes varies based on the time from the start date of the pandemic and when COVID-19 was present in a county. These results could assist researchers with variable selection and inform decision makers when creating public health policy.
AB - Background: There is a need to evaluate how the choice of time interval contributes to the lack of consistency of SDoH variables that appear as important to COVID-19 disease burden within an analysis for both case counts and death counts. Methods: This study identified SDoH variables associated with U.S county-level COVID-19 cumulative case and death incidence for six different periods: the first 30, 60, 90, 120, 150, and 180 days since each county had COVID-19 one case per 10,000 residents. The set of SDoH variables were in the following domains: resource deprivation, access to care/health resources, population characteristics, traveling behavior, vulnerable populations, and health status. A generalized variance inflation factor (GVIF) analysis was used to identify variables with high multicollinearity. For each dependent variable, a separate model was built for each of the time periods. We used a mixed-effect generalized linear modeling of counts normalized per 100,000 population using negative binomial regression. We performed a Kolmogorov-Smirnov goodness of fit test, an outlier test, and a dispersion test for each model. Sensitivity analysis included altering the county start date to the day each county reached 10 COVID-19 cases per 10,000. Results: Ninety-seven percent (3059/3140) of the counties were represented in the final analysis. Six features proved important for both the main and sensitivity analysis: adults-with-college-degree, days-sheltering-in-place-at-start, prior-seven-day-median-time-home, percent-black, percent-foreign-born, over-65-years-of-age, black-white-segregation, and days-since-pandemic-start. These variables belonged to the following categories: COVID-19 related, vulnerable populations, and population characteristics. Our diagnostic results show that across our outcomes, the models of the shorter time periods (30 days, 60 days, and 900 days) have a better fit. Conclusion: Our findings demonstrate that the set of SDoH features that are significant for COVID-19 outcomes varies based on the time from the start date of the pandemic and when COVID-19 was present in a county. These results could assist researchers with variable selection and inform decision makers when creating public health policy.
UR - http://www.scopus.com/inward/record.url?scp=85128258590&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128258590&partnerID=8YFLogxK
U2 - 10.1186/s12889-022-13168-y
DO - 10.1186/s12889-022-13168-y
M3 - Article
C2 - 35421958
AN - SCOPUS:85128258590
SN - 1471-2458
VL - 22
JO - BMC public health
JF - BMC public health
IS - 1
M1 - 747
ER -