Covariate selection for multilevel models with missing data

Miguel Marino, Orfeu M. Buxton, Yi Li

Research output: Contribution to journal, Article, peer-review

Abstract

Missing covariate data hamper variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly handle missing predictors through list-wise deletion and stepwise selection, both of which are problematic. Moreover, most variable selection methods were developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that performs covariate selection across multiply-imputed data for multilevel random effects models when covariate data are missing. Specifically, we propose to stack the data sets produced by a multiple imputation procedure and to apply group variable selection through group lasso regularization, which assesses the overall impact of each predictor on the outcome across the imputed data sets. Simulations show that the proposed method performs favourably compared with competing methods. We applied the method to reanalyse the Healthy Directions–Small Business cancer prevention study, which evaluated a behavioural intervention programme targeting multiple risk-related behaviours in a working-class, multi-ethnic population.
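The stacking-plus-group-lasso idea can be illustrated with a minimal sketch. The simplifying assumptions here are ours, not the authors': an ordinary linear model with no random effects and no intercept (centred data), a crude placeholder "imputation" that draws missing values from N(0, 1) instead of a proper multiple imputation procedure, and a fixed penalty `lam` rather than a tuned one (the keywords mention BIC, but the tuning rule is not shown here). We assume one plausible grouping: covariate j gets a separate coefficient in each of the M imputed data sets, and those M coefficients form one group, so the penalty keeps or drops covariate j across all imputations at once. The function names `stack_group_lasso` and `simulate` are hypothetical.

```python
import math
import random


def stack_group_lasso(imputed_X, imputed_y, lam, n_iter=300):
    """Group lasso on M stacked imputed data sets, fitted by proximal gradient.

    imputed_X: list of M design matrices (each a list of n rows with p columns)
    imputed_y: list of M outcome vectors (each of length n)
    beta[m][j] is covariate j's coefficient in imputation m; the M values
    beta[0][j], ..., beta[M-1][j] form group j. Assumes centred data and
    no random effects (a simplification of the multilevel setting).
    """
    M = len(imputed_X)
    n = len(imputed_y[0])
    p = len(imputed_X[0][0])
    # Safe step size 1/L: the squared Frobenius norm bounds the largest
    # eigenvalue of X_m^T X_m, so this L bounds the gradient's Lipschitz constant.
    L = max(sum(v * v for row in X for v in row) for X in imputed_X) / (n * M)
    step = 1.0 / L
    beta = [[0.0] * p for _ in range(M)]
    for _ in range(n_iter):
        # Gradient step on the stacked loss (1 / (2 n M)) * sum_m ||y_m - X_m beta_m||^2.
        z = []
        for m in range(M):
            X, y, b = imputed_X[m], imputed_y[m], beta[m]
            resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
            grad = [-sum(X[i][j] * resid[i] for i in range(n)) / (n * M) for j in range(p)]
            z.append([b[j] - step * grad[j] for j in range(p)])
        # Proximal step: group soft-thresholding of each covariate's M coefficients,
        # with penalty lam * sqrt(M) * ||beta_j||_2 (sqrt(M) = group-size weight).
        thresh = step * lam * math.sqrt(M)
        for j in range(p):
            norm = math.sqrt(sum(z[m][j] ** 2 for m in range(M)))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for m in range(M):
                beta[m][j] = scale * z[m][j]
    return beta


def simulate(n=200, M=3, seed=1):
    """Hypothetical data: y depends on x0 and x1; x2 is pure noise; x1 is ~10% missing."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) for r in x]
    missing = [rng.random() < 0.1 for _ in range(n)]
    imputed_X = []
    for _ in range(M):
        # Placeholder imputation only: each missing x1 gets an independent N(0, 1) draw.
        imputed_X.append([[r[0], rng.gauss(0, 1) if miss else r[1], r[2]]
                          for r, miss in zip(x, missing)])
    return imputed_X, [y] * M


imputed_X, imputed_y = simulate()
beta = stack_group_lasso(imputed_X, imputed_y, lam=0.15)
# A covariate counts as selected if its group is nonzero in any imputation.
selected = [j for j in range(3) if any(abs(beta[m][j]) > 0 for m in range(len(beta)))]
print(selected)
```

Because the proximal step zeroes a whole group at once, a covariate is either retained in every imputed data set or dropped from all of them, which is what lets the selection summarize each predictor's overall impact across imputations.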

Original language: English (US)
Pages (from-to): 31-46
Number of pages: 16
Journal: Stat
Volume: 6
Issue number: 1
State: Published - Jan 2017

Keywords

  • BIC
  • Rubin's rules
  • cancer prevention
  • group lasso
  • intervention studies
  • multilevel
  • multiple imputation
  • regularization

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
