Estimating treatment effects with machine learning

K. John McConnell; Stephan Lindner

doi:10.1111/1475-6773.13212

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Objective: To demonstrate the performance of methodologies that include machine learning (ML) algorithms to estimate average treatment effects under the assumption of exogeneity (selection on observables).

Data Sources: Simulated data and observational data on hospitalized adults.

Study Design: We assessed the performance of several ML-based estimators, including Targeted Maximum Likelihood Estimation, Bayesian Additive Regression Trees, Causal Random Forests, Double Machine Learning, and Bayesian Causal Forests, applying these methods to simulated data as well as data on the effects of right heart catheterization.

Principal Findings: In Monte Carlo studies, ML-based estimators generated estimates with smaller bias than traditional regression approaches, demonstrating substantial (69 percent-98 percent) bias reduction in some scenarios. Bayesian Causal Forests and Double Machine Learning were top performers, although all were sensitive to high dimensional (>150) sets of covariates.

Conclusions: ML-based methods are promising methods for estimating treatment effects, allowing for the inclusion of many covariates and automating the search for nonlinearities and interactions among variables. We provide guidance and sample code for researchers interested in implementing these tools in their own empirical work.

Original language	English (US)
Pages (from-to)	1273-1282
Number of pages	10
Journal	Health Services Research
Volume	54
Issue number	6
DOIs	https://doi.org/10.1111/1475-6773.13212
State	Published - Dec 1 2019

Keywords

machine learning
observational research
treatment effects

ASJC Scopus subject areas

Health Policy

Cite this

@article{630153c347114b32939ada62a0350412,
title = "Estimating treatment effects with machine learning",
abstract = "Objective: To demonstrate the performance of methodologies that include machine learning (ML) algorithms to estimate average treatment effects under the assumption of exogeneity (selection on observables). Data Sources: Simulated data and observational data on hospitalized adults. Study Design: We assessed the performance of several ML-based estimators, including Targeted Maximum Likelihood Estimation, Bayesian Additive Regression Trees, Causal Random Forests, Double Machine Learning, and Bayesian Causal Forests, applying these methods to simulated data as well as data on the effects of right heart catheterization. Principal Findings: In Monte Carlo studies, ML-based estimators generated estimates with smaller bias than traditional regression approaches, demonstrating substantial (69 percent-98 percent) bias reduction in some scenarios. Bayesian Causal Forests and Double Machine Learning were top performers, although all were sensitive to high dimensional (>150) sets of covariates. Conclusions: ML-based methods are promising methods for estimating treatment effects, allowing for the inclusion of many covariates and automating the search for nonlinearities and interactions among variables. We provide guidance and sample code for researchers interested in implementing these tools in their own empirical work.",
keywords = "machine learning, observational research, treatment effects",
author = "McConnell, {K. John} and Lindner, Stephan",
note = "Publisher Copyright: {\textcopyright} Health Research and Educational Trust",
year = "2019",
month = dec,
day = "1",
doi = "10.1111/1475-6773.13212",
language = "English (US)",
volume = "54",
pages = "1273--1282",
journal = "Health Services Research",
issn = "0017-9124",
publisher = "Wiley-Blackwell",
number = "6",
}

TY  - JOUR
T1  - Estimating treatment effects with machine learning
AU  - McConnell, K. John
AU  - Lindner, Stephan
N1  - Publisher Copyright: © Health Research and Educational Trust
PY  - 2019/12/1
Y1  - 2019/12/1
AB  - Objective: To demonstrate the performance of methodologies that include machine learning (ML) algorithms to estimate average treatment effects under the assumption of exogeneity (selection on observables). Data Sources: Simulated data and observational data on hospitalized adults. Study Design: We assessed the performance of several ML-based estimators, including Targeted Maximum Likelihood Estimation, Bayesian Additive Regression Trees, Causal Random Forests, Double Machine Learning, and Bayesian Causal Forests, applying these methods to simulated data as well as data on the effects of right heart catheterization. Principal Findings: In Monte Carlo studies, ML-based estimators generated estimates with smaller bias than traditional regression approaches, demonstrating substantial (69 percent-98 percent) bias reduction in some scenarios. Bayesian Causal Forests and Double Machine Learning were top performers, although all were sensitive to high dimensional (>150) sets of covariates. Conclusions: ML-based methods are promising methods for estimating treatment effects, allowing for the inclusion of many covariates and automating the search for nonlinearities and interactions among variables. We provide guidance and sample code for researchers interested in implementing these tools in their own empirical work.
KW  - machine learning
KW  - observational research
KW  - treatment effects
UR  - http://www.scopus.com/inward/record.url?scp=85074010071&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85074010071&partnerID=8YFLogxK
U2  - 10.1111/1475-6773.13212
DO  - 10.1111/1475-6773.13212
M3  - Article
C2  - 31602641
AN  - SCOPUS:85074010071
SN  - 0017-9124
VL  - 54
SP  - 1273
EP  - 1282
JO  - Health Services Research
JF  - Health Services Research
IS  - 6
ER  -
