TY - JOUR
T1 - Estimating treatment effects with machine learning
AU - McConnell, K. John
AU - Lindner, Stephan
N1 - Funding Information:
Joint Acknowledgment/Disclosure Statement: This work was supported in part by a grant from The Silver Family Foundation (Portland, Oregon). We are grateful to Thomas Meath, Matt Georg, and Kyle Tracy for help with coding, statistical analyses, and computing power considerations.
Publisher Copyright:
© Health Research and Educational Trust
PY - 2019/12/1
Y1 - 2019/12/1
N2 - Objective: To demonstrate the performance of methodologies that include machine learning (ML) algorithms to estimate average treatment effects under the assumption of exogeneity (selection on observables). Data Sources: Simulated data and observational data on hospitalized adults. Study Design: We assessed the performance of several ML-based estimators, including Targeted Maximum Likelihood Estimation, Bayesian Additive Regression Trees, Causal Random Forests, Double Machine Learning, and Bayesian Causal Forests, applying these methods to simulated data as well as data on the effects of right heart catheterization. Principal Findings: In Monte Carlo studies, ML-based estimators generated estimates with smaller bias than traditional regression approaches, demonstrating substantial (69 to 98 percent) bias reduction in some scenarios. Bayesian Causal Forests and Double Machine Learning were top performers, although all were sensitive to high-dimensional (>150) sets of covariates. Conclusions: ML-based methods are promising for estimating treatment effects, allowing for the inclusion of many covariates and automating the search for nonlinearities and interactions among variables. We provide guidance and sample code for researchers interested in implementing these tools in their own empirical work.
AB - Objective: To demonstrate the performance of methodologies that include machine learning (ML) algorithms to estimate average treatment effects under the assumption of exogeneity (selection on observables). Data Sources: Simulated data and observational data on hospitalized adults. Study Design: We assessed the performance of several ML-based estimators, including Targeted Maximum Likelihood Estimation, Bayesian Additive Regression Trees, Causal Random Forests, Double Machine Learning, and Bayesian Causal Forests, applying these methods to simulated data as well as data on the effects of right heart catheterization. Principal Findings: In Monte Carlo studies, ML-based estimators generated estimates with smaller bias than traditional regression approaches, demonstrating substantial (69 to 98 percent) bias reduction in some scenarios. Bayesian Causal Forests and Double Machine Learning were top performers, although all were sensitive to high-dimensional (>150) sets of covariates. Conclusions: ML-based methods are promising for estimating treatment effects, allowing for the inclusion of many covariates and automating the search for nonlinearities and interactions among variables. We provide guidance and sample code for researchers interested in implementing these tools in their own empirical work.
KW - machine learning
KW - observational research
KW - treatment effects
UR - http://www.scopus.com/inward/record.url?scp=85074010071&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074010071&partnerID=8YFLogxK
U2 - 10.1111/1475-6773.13212
DO - 10.1111/1475-6773.13212
M3 - Article
C2 - 31602641
AN - SCOPUS:85074010071
SN - 0017-9124
VL - 54
SP - 1273
EP - 1282
JO - Health Services Research
JF - Health Services Research
IS - 6
ER -