This page explains the details of estimating weights from
Bayesian additive regression trees (BART)-based propensity scores by setting
method = "bart"
in the call to weightit()
or weightitMSM()
. This method
can be used with binary, multi-category, and continuous treatments.
In general, this method relies on estimating propensity scores using BART and then converting those propensity scores into weights using a formula that depends on the desired estimand. The BART models are fit with dbarts::bart2() from the dbarts package.
Binary Treatments
For binary treatments, this method estimates the propensity scores using
dbarts::bart2(). The following estimands are allowed: ATE, ATT, ATC,
ATO, ATM, and ATOS. Weights can also be computed using marginal mean
weighting through stratification for the ATE, ATT, and ATC. See
get_w_from_ps()
for details.
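For illustration, here is a minimal sketch (not needed in practice) of how the propensity scores stored in the output relate to the returned weights via get_w_from_ps(); the formula and estimand are arbitrary, and weightit() performs this conversion internally.
data("lalonde", package = "cobalt")
W <- weightit(treat ~ age + educ + married + nodegree + re74,
              data = lalonde, method = "bart", estimand = "ATT")
#Re-derive the ATT weights from the stored propensity scores
w_att <- get_w_from_ps(W$ps, treat = lalonde$treat, estimand = "ATT")
all.equal(w_att, W$weights, check.attributes = FALSE) #should be TRUE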
Multi-Category Treatments
For multi-category treatments, the propensity scores are estimated using
several calls to dbarts::bart2(), one for each treatment group; the
treatment probabilities are not normalized to sum to 1. The following
estimands are allowed: ATE, ATT, ATC, ATO, and ATM. The weights for each
estimand are computed using the standard formulas or those mentioned above.
Weights can also be computed using marginal mean weighting through
stratification for the ATE, ATT, and ATC. See get_w_from_ps()
for details.
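As a hedged sketch, an ATT analysis for a multi-category treatment only requires supplying focal; the commented lines illustrate marginal mean weighting through stratification, assuming weightit()'s subclass argument (the number of strata) applies with method = "bart".
data("lalonde", package = "cobalt")
W <- weightit(race ~ age + educ + married + nodegree + re74,
              data = lalonde, method = "bart",
              estimand = "ATT", focal = "black")
#Marginal mean weighting through stratification (assumed interface):
#W_mmws <- weightit(race ~ age + educ + married + nodegree + re74,
#                   data = lalonde, method = "bart",
#                   estimand = "ATE", subclass = 20)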
Continuous Treatments
For continuous treatments, weights are estimated as
\(w_i = f_A(a_i) / f_{A|X}(a_i)\), where \(f_A(a_i)\) (known as the
stabilization factor) is the unconditional density of treatment evaluated at the
observed treatment value and \(f_{A|X}(a_i)\) (known as the generalized
propensity score) is the conditional density of treatment given the
covariates evaluated at the observed value of treatment. The shape of
\(f_A(.)\) and \(f_{A|X}(.)\) is controlled by the density
argument
described below (normal distributions by default), and the predicted values
used for the mean of the conditional density are estimated using BART as
implemented in dbarts::bart2(). Kernel density estimation can be used
instead of assuming a specific density for the numerator and denominator by
setting density = "kernel"
. Other arguments to density()
can be specified
to refine the density estimation parameters.
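The following is a conceptual sketch of this weight formula using normal densities, not WeightIt's internal code; gps_mean is a hypothetical stand-in for the BART-predicted conditional means of the treatment.
set.seed(123)
X <- rnorm(200)
A <- 0.5 * X + rnorm(200)        #a continuous treatment
gps_mean <- fitted(lm(A ~ X))    #stand-in for the BART-predicted conditional means
f_A  <- dnorm(A, mean = mean(A), sd = sd(A))             #stabilization factor
f_AX <- dnorm(A, mean = gps_mean, sd = sd(A - gps_mean)) #generalized propensity score
w <- f_A / f_AX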
Longitudinal Treatments
For longitudinal treatments, the weights are the product of the weights estimated at each time point.
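A minimal sketch with two time points, using variable names assumed from the msmdata example dataset shipped with WeightIt; the returned weights are the product of the time-specific weights.
data("msmdata")
Wmsm <- weightitMSM(list(A_1 ~ X1_0 + X2_0,
                         A_2 ~ X1_1 + X2_1 + A_1 + X1_0 + X2_0),
                    data = msmdata, method = "bart")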
Missing Data
In the presence of missing data, the following value(s) for missing are
allowed:
"ind" (default)
First, for each variable with missingness, a new missingness indicator variable is created, which takes the value 1 if the original covariate is NA and 0 otherwise. The missingness indicators are added to the model formula as main effects. The missing values in the covariates are then replaced with the covariate medians. The weight estimation then proceeds with this new formula and set of covariates. The covariates output in the resulting weightit object will be the original covariates with the NAs.
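A minimal sketch, assuming the lalonde_mis dataset shipped with cobalt (a version of lalonde with missing covariate values):
data("lalonde_mis", package = "cobalt")
Wm <- weightit(treat ~ age + educ + married + nodegree + re74,
               data = lalonde_mis, method = "bart",
               estimand = "ATT", missing = "ind")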
Details
BART works by fitting a sum-of-trees model for the treatment or
probability of treatment. The number of trees is determined by the n.trees
argument. Bayesian priors are used for the hyperparameters, so the result is
a posterior distribution of predicted values for each unit. The mean of these
for each unit is taken for use in computing the (generalized) propensity
score. Although the hyperparameters governing the priors can be modified by
supplying arguments to weightit()
that are passed to the BART fitting
function, the default values tend to work well and require little
modification (though the defaults differ for continuous and categorical
treatments; see the dbarts::bart2() documentation for details). Unlike
many other machine learning methods, no loss function is optimized and the
hyperparameters do not need to be tuned (e.g., using cross-validation),
though performance can benefit from tuning. BART tends to balance sparseness
with flexibility by using very weak learners as the trees, which makes it
suitable for capturing complex functions without specifying a particular
functional form and without overfitting.
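As an illustrative sketch (the values are arbitrary, not recommendations), hyperparameters such as n.trees and k supplied to weightit() are forwarded to dbarts::bart2():
data("lalonde", package = "cobalt")
W <- weightit(treat ~ age + educ + married + nodegree + re74,
              data = lalonde, method = "bart", estimand = "ATT",
              n.trees = 200, k = 2)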
Reproducibility
BART has a random component, so some work must be done to ensure
reproducibility across runs. See the Reproducibility section at
dbarts::bart2() for more details. To ensure reproducibility, one can
do one of two things: 1) supply an argument to seed
, which is passed to
dbarts::bart2()
and sets the seed for single- and multi-threaded uses, or
2) call set.seed()
, though this only ensures reproducibility when using
single-threading, which can be requested by setting n.threads = 1
. Note
that to ensure reproducibility on any machine, regardless of the number of
cores available, one should use single-threading and either supply seed
or
call set.seed()
.
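A minimal sketch of the two approaches (the seed value is arbitrary); rerunning either call with the same settings should reproduce the same weights.
data("lalonde", package = "cobalt")
#Option 1: supply seed, which is passed to dbarts::bart2()
W_a <- weightit(treat ~ age + educ + re74, data = lalonde,
                method = "bart", estimand = "ATT",
                seed = 12345, n.threads = 1)
#Option 2: set the seed in R and request single-threading
set.seed(12345)
W_b <- weightit(treat ~ age + educ + re74, data = lalonde,
                method = "bart", estimand = "ATT", n.threads = 1)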
Additional Arguments
All arguments to dbarts::bart2() can be passed through weightit()
or weightitMSM()
, with the following exceptions:
- test, weights, subset, and offset.test are ignored.
- combine.chains is always set to TRUE.
- sampleronly is always set to FALSE.
For continuous treatments only, the following arguments may be supplied:
density
A function corresponding to the conditional density of the treatment. The standardized residuals of the treatment model will be fed through this function to produce the numerator and denominator of the generalized propensity score weights. If blank, dnorm() is used as recommended by Robins et al. (2000). This can also be supplied as a string containing the name of the function to be called. If the string contains underscores, the call will be split by the underscores and the latter splits will be supplied as arguments to the second argument and beyond. For example, if density = "dt_2" is specified, the density used will be that of a t-distribution with 2 degrees of freedom. Using a t-distribution can be useful when extreme outcome values are observed (Naimi et al., 2014). Can also be "kernel" to use kernel density estimation, which calls density() to estimate the numerator and denominator densities for the weights. (This used to be requested by setting use.kernel = TRUE, which is now deprecated.)
bw, adjust, kernel, n
If density = "kernel", the arguments to density(). The defaults are the same as those in density() except that n is 10 times the number of units in the sample.
plot
If density = "kernel", whether to plot the estimated densities.
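A hedged sketch of requesting kernel density estimation for a continuous treatment, with adjust passed through to density() and the estimated densities plotted:
data("lalonde", package = "cobalt")
W <- weightit(re75 ~ age + educ + married + nodegree + re74,
              data = lalonde, method = "bart",
              density = "kernel", adjust = 1.5, plot = TRUE)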
Additional Outputs
obj
When include.obj = TRUE, the bart2 fit(s) used to generate the predicted values. With multi-category treatments, this will be a list of the fits; otherwise, it will be a single fit. The predicted probabilities used to compute the propensity scores can be extracted using fitted().
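A minimal sketch of retrieving the stored fit, assuming fitted() returns the posterior-mean predicted probabilities used as the propensity scores:
data("lalonde", package = "cobalt")
W <- weightit(treat ~ age + educ + married + nodegree + re74,
              data = lalonde, method = "bart", estimand = "ATT",
              include.obj = TRUE)
ps <- fitted(W$obj) #predicted probabilities from the bart2 fit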
References
Hill, J., Weiss, C., & Zhai, F. (2011). Challenges With Propensity Score Strategies in a High-Dimensional Setting and a Potential Alternative. Multivariate Behavioral Research, 46(3), 477–513. doi:10.1080/00273171.2011.570161
Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266–298. doi:10.1214/09-AOAS285
Note that many references that deal with BART for causal inference focus on estimating potential outcomes with BART, not the propensity scores, and so are not directly relevant when using BART to estimate propensity scores for weights.
See method_glm
for additional references on propensity score weighting
more generally.
See also
weightit()
, weightitMSM()
, and get_w_from_ps().
method_super
for stacking predictions from several machine learning
methods, including BART.
Examples
library("cobalt")
data("lalonde", package = "cobalt")
#Balancing covariates between treatment groups (binary)
(W1 <- weightit(treat ~ age + educ + married +
nodegree + re74, data = lalonde,
method = "bart", estimand = "ATT"))
#> A weightit object
#> - method: "bart" (propensity score weighting with BART)
#> - number of obs.: 614
#> - sampling weights: none
#> - treatment: 2-category
#> - estimand: ATT (focal: 1)
#> - covariates: age, educ, married, nodegree, re74
summary(W1)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> treated 1.0000 || 1.0000
#> control 0.0027 |---------------------------| 9.3283
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 5 4 3 2 1
#> treated 1 1 1 1 1
#> 585 569 592 374 608
#> control 2.2359 2.7456 3.0291 3.4298 9.3283
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> treated 0.000 0.000 0.000 0
#> control 1.787 0.918 0.714 0
#>
#> - Effective Sample Sizes:
#>
#> Control Treated
#> Unweighted 429. 185
#> Weighted 102.47 185
bal.tab(W1)
#> Balance Measures
#> Type Diff.Adj
#> prop.score Distance 0.5159
#> age Contin. 0.0610
#> educ Contin. -0.0342
#> married Binary -0.0345
#> nodegree Binary 0.0401
#> re74 Contin. -0.0602
#>
#> Effective sample sizes
#> Control Treated
#> Unadjusted 429. 185
#> Adjusted 102.47 185
#Balancing covariates with respect to race (multi-category)
(W2 <- weightit(race ~ age + educ + married +
nodegree + re74, data = lalonde,
method = "bart", estimand = "ATE"))
#> A weightit object
#> - method: "bart" (propensity score weighting with BART)
#> - number of obs.: 614
#> - sampling weights: none
#> - treatment: 3-category (black, hispan, white)
#> - estimand: ATE
#> - covariates: age, educ, married, nodegree, re74
summary(W2)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> black 1.2400 |----------------| 8.5591
#> hispan 2.6737 |------------------------| 13.1441
#> white 1.0572 |---------------| 8.3704
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 283 181 244 423 231
#> black 6.6751 7.3595 7.8009 8.3363 8.5591
#> 512 392 346 570 564
#> hispan 12.4447 12.651 12.6907 12.9365 13.1441
#> 68 23 60 76 140
#> white 4.6 5.0271 5.5263 8.0026 8.3704
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> black 0.563 0.364 0.121 0
#> hispan 0.380 0.313 0.072 0
#> white 0.465 0.321 0.085 0
#>
#> - Effective Sample Sizes:
#>
#> black hispan white
#> Unweighted 243. 72. 299.
#> Weighted 184.71 63.01 246.01
bal.tab(W2)
#> Balance summary across all treatment pairs
#> Type Max.Diff.Adj
#> age Contin. 0.1822
#> educ Contin. 0.1613
#> married Binary 0.0549
#> nodegree Binary 0.0267
#> re74 Contin. 0.1146
#>
#> Effective sample sizes
#> black hispan white
#> Unadjusted 243. 72. 299.
#> Adjusted 184.71 63.01 246.01
#Balancing covariates with respect to re75 (continuous)
#assuming t(3) conditional density for treatment
(W3 <- weightit(re75 ~ age + educ + married +
nodegree + re74, data = lalonde,
method = "bart", density = "dt_3"))
#> A weightit object
#> - method: "bart" (propensity score weighting with BART)
#> - number of obs.: 614
#> - sampling weights: none
#> - treatment: continuous
#> - covariates: age, educ, married, nodegree, re74
summary(W3)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> all 0.079 |---------------------------| 19.4148
#>
#> - Units with the 5 most extreme weights:
#>
#> 308 310 469 484 487
#> all 7.3376 7.4307 9.7686 19.1384 19.4148
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> all 1.125 0.474 0.273 0
#>
#> - Effective Sample Sizes:
#>
#> Total
#> Unweighted 614.
#> Weighted 271.19
bal.tab(W3)
#> Balance Measures
#> Type Corr.Adj
#> age Contin. 0.0278
#> educ Contin. 0.0521
#> married Binary 0.0791
#> nodegree Binary -0.0790
#> re74 Contin. 0.1173
#>
#> Effective sample sizes
#> Total
#> Unadjusted 614.
#> Adjusted 271.19