High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources

Tingting Yu, Shangyuan Ye, Rui Wang

Research output: Contribution to journal, Article, peer-reviewed

Abstract

When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this article, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with subhomogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO and a multidirectional shrinkage penalty method). Finally, we apply the proposed method to the multicentre Childhood Adenotonsillectomy Trial to identify subhomogeneity in the treatment effects across different study sites.

Original language: English (US)
Journal: Canadian Journal of Statistics
State: Accepted/In press - 2023

Keywords

  • ADMM
  • coefficient clustering
  • data heterogeneity
  • k-means
  • variable selection

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
