This page explains the details of estimating weights from Bayesian additive regression trees (BART)-based propensity scores by setting method = "bart" in the call to weightit() or weightitMSM(). This method can be used with binary, multinomial, and continuous treatments.

In general, this method estimates propensity scores using BART and then converts them into weights using a formula that depends on the desired estimand. The propensity scores are estimated with dbarts::bart2() from the dbarts package.

### Binary Treatments

For binary treatments, this method estimates the propensity scores using dbarts::bart2(). The following estimands are allowed: ATE, ATT, ATC, ATO, ATM, and ATOS. Weights can also be computed using marginal mean weighting through stratification for the ATE, ATT, and ATC. See get_w_from_ps() for details.
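As a rough illustration of the conversion step, the sketch below applies the standard propensity score weighting formulas for a few estimands to made-up propensity scores. The helper name ps_to_weights and the toy values are hypothetical and are not part of the package; in practice get_w_from_ps() performs this conversion.

```r
# Hypothetical helper: convert binary-treatment propensity scores to weights
# using the standard formulas (illustration only; not WeightIt's internal code).
ps_to_weights <- function(ps, treat, estimand = "ATE") {
  switch(estimand,
    ATE = treat / ps + (1 - treat) / (1 - ps),   # inverse probability of the received treatment
    ATT = treat + (1 - treat) * ps / (1 - ps),   # treated get 1; controls get the odds
    ATC = treat * (1 - ps) / ps + (1 - treat),   # controls get 1; treated get the inverse odds
    ATO = treat * (1 - ps) + (1 - treat) * ps,   # overlap weights
    stop("estimand not covered in this sketch")
  )
}

ps    <- c(0.2, 0.5, 0.8, 0.4)   # made-up propensity scores
treat <- c(0,   1,   1,   0)     # made-up treatment indicator

w_att <- ps_to_weights(ps, treat, "ATT")
w_ate <- ps_to_weights(ps, treat, "ATE")
```

With method = "bart", the ps values would come from the BART fit rather than being supplied by hand.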

### Multinomial Treatments

For multinomial treatments, the propensity scores are estimated using several calls to dbarts::bart2(), one for each treatment group; the treatment probabilities are not normalized to sum to 1. The following estimands are allowed: ATE, ATT, ATC, ATO, and ATM. The weights for each estimand are computed using the standard formulas or those mentioned above. Weights can also be computed using marginal mean weighting through stratification for the ATE, ATT, and ATC. See get_w_from_ps() for details.
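For the ATE with a multinomial treatment, the standard formula is the inverse of the estimated probability of the treatment level each unit actually received. A minimal sketch with made-up probabilities (the matrix and its values are hypothetical) might look like this:

```r
# Made-up probabilities: one column per treatment level, one row per unit.
# Rows need not sum to 1 because each column comes from a separate model.
ps <- cbind(a = c(0.50, 0.20, 0.30),
            b = c(0.25, 0.50, 0.30),
            c = c(0.20, 0.40, 0.50))
treat <- factor(c("a", "b", "c"), levels = colnames(ps))

# Probability of the treatment level each unit actually received
p_received <- ps[cbind(seq_along(treat), as.integer(treat))]

# ATE weights: inverse of that probability
w_ate <- 1 / p_received
```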

### Continuous Treatments

For continuous treatments, the generalized propensity score is estimated using dbarts::bart2(). In addition, kernel density estimation can be used instead of assuming a normal density for the numerator and denominator of the generalized propensity score by setting use.kernel = TRUE. Other arguments to density() can be specified to refine the density estimation parameters. plot = TRUE can be specified to plot the density for the numerator and denominator, which can be helpful in diagnosing extreme weights.
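The sketch below illustrates the general shape of generalized propensity score weights under a normal density, using simulated data. It is only an approximation of the idea: lm() is used as a stand-in for the BART treatment model so that the example runs with base R, and all variable names are hypothetical.

```r
set.seed(123)
n <- 200
x <- rnorm(n)
a <- 0.5 * x + rnorm(n)   # simulated continuous treatment

# Stand-in treatment model; method = "bart" would use dbarts::bart2() instead
fit <- lm(a ~ x)

# Standardized residuals for the denominator (conditional density)
r_cond <- residuals(fit) / sd(residuals(fit))
# Standardized treatment for the numerator (marginal density)
r_marg <- (a - mean(a)) / sd(a)

# Weights: numerator density over denominator density.
# Scale constants are omitted here; they would only rescale all weights
# by a common factor.
w <- dnorm(r_marg) / dnorm(r_cond)
```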

### Longitudinal Treatments

For longitudinal treatments, the weights are the product of the weights estimated at each time point.
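The product rule can be sketched with made-up per-time-point weights (the matrix below is hypothetical and only shows the arithmetic):

```r
# Made-up weights for 4 units at 3 time points (one column per time point)
w_by_time <- cbind(t1 = c(1.2, 0.8, 1.0, 2.0),
                   t2 = c(1.0, 1.5, 0.9, 1.1),
                   t3 = c(0.5, 1.0, 1.3, 1.0))

# Final weight for each unit: product of its weights across time points
w <- apply(w_by_time, 1, prod)
```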

### Sampling Weights

Sampling weights are not supported.

### Missing Data

In the presence of missing data, the following value(s) for missing are allowed:

"ind" (default)

First, for each variable with missingness, a new missingness indicator variable is created which takes the value 1 if the original covariate is NA and 0 otherwise. The missingness indicators are added to the model formula as main effects. The missing values in the covariates are then replaced with 0s. The weight estimation then proceeds with this new formula and set of covariates. The covariates output in the resulting weightit object will be the original covariates with the NAs.
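The steps above can be sketched in base R as follows. This is an illustration of the described procedure on a made-up data frame, not the package's internal code; the "_NA" suffix for the indicator names is an assumption of this sketch.

```r
# Made-up covariates with missing values
covs <- data.frame(age  = c(25, NA, 40, 31),
                   re74 = c(0, 1200, NA, NA),
                   educ = c(12, 10, 16, 11))

covs_filled <- covs
for (v in names(covs)) {
  miss <- is.na(covs[[v]])
  if (any(miss)) {
    covs_filled[[paste0(v, "_NA")]] <- as.numeric(miss)  # 1 if originally NA, else 0
    covs_filled[[v]][miss] <- 0                          # replace the NA with 0
  }
}
```

The treatment model would then be fit on covs_filled, while covs (with its NAs) is what is kept as the covariates in the output.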

## Additional Arguments

All arguments to dbarts::bart2() can be passed through weightit() or weightitMSM(), with the following exceptions:

• test, weights, subset, offset.test are ignored

• combine.chains is always set to TRUE

• sampleronly is always set to FALSE
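As an example of passing arguments through, the call below sets n.trees and n.chains, both of which are arguments to dbarts::bart2(). The particular values are arbitrary and chosen only for illustration, and the sketch assumes the WeightIt, dbarts, and cobalt packages are installed.

```r
library("WeightIt")
data("lalonde", package = "cobalt")

W <- weightit(treat ~ age + educ + married + nodegree + re74,
              data = lalonde, method = "bart", estimand = "ATT",
              n.trees = 100,   # passed on to dbarts::bart2()
              n.chains = 2)    # passed on to dbarts::bart2()
```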

For continuous treatments only, the following arguments may be supplied:

density

A function corresponding to the conditional density of the treatment. The standardized residuals of the treatment model will be fed through this function to produce the numerator and denominator of the generalized propensity score weights. If blank, dnorm() is used as recommended by Robins et al. (2000). This can also be supplied as a string containing the name of the function to be called. If the string contains underscores, it will be split at the underscores: the first piece is the name of the function, and the remaining pieces are supplied, in order, as its second and subsequent arguments. For example, if density = "dt_2" is specified, the density used will be that of a t-distribution with 2 degrees of freedom. Using a t-distribution can be useful when extreme outcome values are observed (Naimi et al., 2014). Ignored if use.kernel = TRUE (described below).

use.kernel

If TRUE, uses kernel density estimation through density() to estimate the numerator and denominator densities for the weights. If FALSE, the argument to the density parameter is used instead.

bw, adjust, kernel, n

Arguments passed to density() when use.kernel = TRUE. The defaults are the same as those in density() except that n is 10 times the number of units in the sample.

plot

If use.kernel = TRUE, whether to plot the estimated density.
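To make the density and use.kernel options more concrete, the base R sketch below evaluates a density at some simulated standardized residuals in both ways. The helper parse_density is hypothetical and only mimics the naming convention described for the density argument; evaluating the kernel estimate by interpolation with approx() is likewise an assumption of this sketch, not a statement about the package's internals.

```r
# Hypothetical helper: turn a name such as "dt_2" into a one-argument density
parse_density <- function(density = "dnorm") {
  parts <- strsplit(density, "_", fixed = TRUE)[[1]]
  fun   <- get(parts[1], mode = "function")   # e.g. stats::dt
  extra <- as.list(as.numeric(parts[-1]))     # e.g. list(2) for 2 degrees of freedom
  function(x) do.call(fun, c(list(x), extra))
}

set.seed(1)
r <- rnorm(100)   # stand-in for standardized residuals of a treatment model

# Parametric option: density = "dt_2" is equivalent to dt(r, 2)
dens_t2    <- parse_density("dt_2")
dens_param <- dens_t2(r)

# Kernel option (use.kernel = TRUE): estimate the density with density(),
# with n set to 10 times the number of units, then evaluate it at r
d           <- density(r, n = 10 * length(r))
dens_kernel <- approx(d$x, d$y, xout = r)$y
```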

## Additional Outputs

obj

When include.obj = TRUE, the bart2 fit(s) used to generate the predicted values. With multinomial treatments, this will be a list of the fits; otherwise, it will be a single fit. The predicted probabilities used to compute the propensity scores can be extracted using fitted().
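A short sketch of retrieving the fit for a binary treatment is shown below. It assumes the WeightIt, dbarts, and cobalt packages are installed; the object names W, fit, and p_hat are arbitrary.

```r
library("WeightIt")
data("lalonde", package = "cobalt")

W <- weightit(treat ~ age + educ + re74, data = lalonde,
              method = "bart", estimand = "ATE",
              include.obj = TRUE)   # keep the bart2 fit in the output

fit   <- W$obj         # a single fit for a binary treatment
p_hat <- fitted(fit)   # predicted probabilities used for the propensity scores
```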

## Details

BART works by fitting a sum-of-trees model for the treatment or probability of treatment. The number of trees is determined by the n.trees argument. Bayesian priors are used for the hyperparameters, so the result is a posterior distribution of predicted values for each unit. The mean of these for each unit is taken for use in computing the (generalized) propensity score. Although the hyperparameters governing the priors can be modified by supplying arguments to weightit() that are passed to the BART fitting function, the default values tend to work well and require little modification (though the defaults differ for continuous and categorical treatments; see the dbarts::bart2() documentation for details). Unlike many other machine learning methods, no loss function is optimized and the hyperparameters do not need to be tuned (e.g., using cross-validation), though performance can benefit from tuning. BART tends to balance sparseness with flexibility by using very weak learners as the trees, which makes it suitable for capturing complex functions without specifying a particular functional form and without overfitting.

## Note

With version 0.9-19 or below of dbarts, special care is needed to ensure reproducibility when using method = "bart". Setting a seed (either with set.seed() or through the rngSeed argument) only works when a single thread is requested, but the default is to use four threads. For reproducible results, set n.threads = 1 in the call to weightit() and set a seed. Using fewer threads makes the estimation slower. To speed it up, n.chains can be set below its default of 4, at the possible expense of statistical performance.

With version 0.9-20 and above, setting the seed with set.seed() works correctly and results will be reproducible.
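One way to check reproducibility on a given installation is to fit the same model twice with the same seed and compare the weights. The sketch below assumes the WeightIt, dbarts, and cobalt packages are installed; the wrapper fit_once and the seed value are arbitrary. n.threads = 1 is included so that the check also applies to older versions of dbarts.

```r
library("WeightIt")
data("lalonde", package = "cobalt")

# Hypothetical wrapper: same seed and same call each time
fit_once <- function() {
  set.seed(1234)
  weightit(treat ~ age + educ + re74, data = lalonde,
           method = "bart", estimand = "ATT",
           n.threads = 1)   # single thread; required with dbarts 0.9-19 or below
}

W_a <- fit_once()
W_b <- fit_once()

# Compare the two sets of weights
same_weights <- isTRUE(all.equal(W_a$weights, W_b$weights))
```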

## References

Hill, J., Weiss, C., & Zhai, F. (2011). Challenges With Propensity Score Strategies in a High-Dimensional Setting and a Potential Alternative. Multivariate Behavioral Research, 46(3), 477–513. doi:10.1080/00273171.2011.570161

Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266–298. doi:10.1214/09-AOAS285

Note that many references that deal with BART for causal inference focus on estimating potential outcomes with BART, not the propensity scores, and so are not directly relevant when using BART to estimate propensity scores for weights.

See method_ps for additional references on propensity score weighting more generally.

## See Also

weightit(), weightitMSM(), get_w_from_ps()

method_super for stacking predictions from several machine learning methods, including BART.

## Examples

library("WeightIt")
library("cobalt")
data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
(W1 <- weightit(treat ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "bart", estimand = "ATT"))
#> A weightit object
#>  - method: "bart" (propensity score weighting with BART)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
summary(W1)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                  Max
#> treated 1.0000    ||                         1.0000
#> control 0.0029 |---------------------------| 9.2692
#>
#> - Units with 5 most extreme weights by group:
#>
#>               8      7      5      4      1
#>  treated      1      1      1      1      1
#>             409    569    592    374    608
#>  control 2.1655 2.7735 2.8488 3.4074 9.2692
#>
#> - Weight statistics:
#>
#>         Coef of Var  MAD Entropy # Zeros
#> treated       0.000 0.00  -0.000       0
#> control       1.775 0.93   0.721       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  429.       185
#> Weighted    103.55     185
bal.tab(W1)
#> Call
#>  weightit(formula = treat ~ age + educ + married + nodegree +
#>     re74, data = lalonde, method = "bart", estimand = "ATT")
#>
#> Balance Measures
#> prop.score Distance   0.4876
#> age         Contin.   0.0762
#> educ        Contin.  -0.0229
#> married      Binary  -0.0316
#> nodegree     Binary   0.0333
#> re74        Contin.  -0.0544
#>
#> Effective sample sizes
#>            Control Treated
# \donttest{
#Balancing covariates with respect to race (multinomial)
(W2 <- weightit(race ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "bart", estimand = "ATE"))
#> A weightit object
#>  - method: "bart" (propensity score weighting with BART)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 3-category (black, hispan, white)
#>  - estimand: ATE
#>  - covariates: age, educ, married, nodegree, re74
summary(W2)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>           Min                                   Max
#> black  1.2403 |-----------------|            8.9210
#> hispan 2.7963     |-----------------------| 12.6612
#> white  1.0725 |---------------|              7.9110
#>
#> - Units with 5 most extreme weights by group:
#>
#>             226     181     244     423     231
#>   black  7.1688  7.4371  8.1474   8.393   8.921
#>             346     426     512     564     570
#>  hispan 12.2435 12.3098 12.5561 12.5629 12.6612
#>              68      23      60      76     140
#>   white  4.6893  4.9037  5.4283  7.5937   7.911
#>
#> - Weight statistics:
#>
#>        Coef of Var   MAD Entropy # Zeros
#> black        0.578 0.371   0.126       0
#> hispan       0.359 0.295   0.064       0
#> white        0.448 0.317   0.080       0
#>
#> - Effective Sample Sizes:
#>
#>             black hispan  white
#> Unweighted 243.    72.   299.
#> Weighted   182.39  63.86 249.14
bal.tab(W2)
#> Call
#>  weightit(formula = race ~ age + educ + married + nodegree + re74,
#>     data = lalonde, method = "bart", estimand = "ATE")
#>
#> Balance summary across all treatment pairs
#> age      Contin.       0.1851
#> educ     Contin.       0.1749
#> married   Binary       0.0519
#> nodegree  Binary       0.0279
#> re74     Contin.       0.1086
#>
#> Effective sample sizes
#>             black hispan  white

#Balancing covariates with respect to re75 (continuous)
#assuming t(3) conditional density for treatment
(W3 <- weightit(re75 ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "bart", density = "dt_3"))
#> A weightit object
#>  - method: "bart" (propensity score weighting with BART)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: continuous
#>  - covariates: age, educ, married, nodegree, re74
summary(W3)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>        Min                                   Max
#> all 0.0773 |---------------------------| 20.7365
#>
#> - Units with 5 most extreme weights by group:
#>
#>        308    486    469     487     484
#>  all 7.168 7.3026 8.8006 18.5597 20.7365
#>
#> - Weight statistics:
#>
#>     Coef of Var   MAD Entropy # Zeros
#> all       1.132 0.473   0.273       0
#>
#> - Effective Sample Sizes:
#>
#>             Total
#> Unweighted 614.
#> Weighted   269.49
bal.tab(W3)
#> Call
#>  weightit(formula = re75 ~ age + educ + married + nodegree + re74,
#>     data = lalonde, method = "bart", density = "dt_3")
#>
#> Balance Measures