Clinical, social, and policy factors in COVID-19 cases and deaths: methodological considerations for feature selection and modeling in county-level analyses

Charisse Madlock-Brown, Ken Wilkens, Nicole Weiskopf, Nina Cesare, Sharmodeep Bhattacharyya, Naomi O. Riches, Juan Espinoza, David Dorr, Kerry Goetz, Jimmy Phuong, Anupam Sule, Hadi Kharrazi, Feifan Liu, Cindy Lemon, William G. Adams

Research output: Contribution to journalArticlepeer-review

6 Scopus citations


Background: There is a need to evaluate how the choice of time interval contributes to the lack of consistency of SDoH variables that appear as important to COVID-19 disease burden within an analysis for both case counts and death counts. Methods: This study identified SDoH variables associated with U.S county-level COVID-19 cumulative case and death incidence for six different periods: the first 30, 60, 90, 120, 150, and 180 days since each county had COVID-19 one case per 10,000 residents. The set of SDoH variables were in the following domains: resource deprivation, access to care/health resources, population characteristics, traveling behavior, vulnerable populations, and health status. A generalized variance inflation factor (GVIF) analysis was used to identify variables with high multicollinearity. For each dependent variable, a separate model was built for each of the time periods. We used a mixed-effect generalized linear modeling of counts normalized per 100,000 population using negative binomial regression. We performed a Kolmogorov-Smirnov goodness of fit test, an outlier test, and a dispersion test for each model. Sensitivity analysis included altering the county start date to the day each county reached 10 COVID-19 cases per 10,000. Results: Ninety-seven percent (3059/3140) of the counties were represented in the final analysis. Six features proved important for both the main and sensitivity analysis: adults-with-college-degree, days-sheltering-in-place-at-start, prior-seven-day-median-time-home, percent-black, percent-foreign-born, over-65-years-of-age, black-white-segregation, and days-since-pandemic-start. These variables belonged to the following categories: COVID-19 related, vulnerable populations, and population characteristics. Our diagnostic results show that across our outcomes, the models of the shorter time periods (30 days, 60 days, and 900 days) have a better fit. Conclusion: Our findings demonstrate that the set of SDoH features that are significant for COVID-19 outcomes varies based on the time from the start date of the pandemic and when COVID-19 was present in a county. These results could assist researchers with variable selection and inform decision makers when creating public health policy.

Original languageEnglish (US)
Article number747
JournalBMC public health
Issue number1
StatePublished - Dec 2022

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health


Dive into the research topics of 'Clinical, social, and policy factors in COVID-19 cases and deaths: methodological considerations for feature selection and modeling in county-level analyses'. Together they form a unique fingerprint.

Cite this