This page explains the details of estimating weights from covariate balancing propensity scores by setting method = "cbps" in the call to weightit() or weightitMSM(). This method can be used with binary, multinomial, and continuous treatments.

In general, this method relies on estimating propensity scores using generalized method of moments and then converting those propensity scores into weights using a formula that depends on the desired estimand. This method relies on CBPS::CBPS() from the CBPS package.
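As a rough illustration of the conversion step for a binary treatment, the sketch below applies the standard inverse probability weighting formulas for each estimand. The helper ps_to_weights() and the toy data are hypothetical and are not part of WeightIt or CBPS; the scaling of the weights returned by the packages may differ.

```r
# Hypothetical helper (not part of WeightIt or CBPS): converts propensity
# scores for a binary treatment into weights for a given estimand.
# `ps` is assumed to be the probability of receiving treatment (treat == 1).
ps_to_weights <- function(ps, treat, estimand = c("ATE", "ATT", "ATC")) {
  estimand <- match.arg(estimand)
  switch(estimand,
    # ATE: weight each unit by the inverse probability of its own group
    ATE = ifelse(treat == 1, 1 / ps, 1 / (1 - ps)),
    # ATT: treated units keep weight 1; controls get the odds of treatment
    ATT = ifelse(treat == 1, 1, ps / (1 - ps)),
    # ATC: controls keep weight 1; treated units get the inverse odds
    ATC = ifelse(treat == 1, (1 - ps) / ps, 1)
  )
}

# Toy data for illustration only
treat <- c(1, 1, 0, 0)
ps <- c(0.8, 0.5, 0.5, 0.2)

w_ate <- ps_to_weights(ps, treat, "ATE")
w_att <- ps_to_weights(ps, treat, "ATT")
w_atc <- ps_to_weights(ps, treat, "ATC")
```

Under the ATT and ATC, units in the focal group always receive a weight of 1, which matches the treated row of the summary() output in the Examples section.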

### Binary Treatments

For binary treatments, this method estimates the propensity scores and weights using CBPS::CBPS(). The following estimands are allowed: ATE, ATT, and ATC. The weights are taken from the output of the CBPS fit object. When the estimand is the ATE, the returned propensity score is the probability of being in the "second" treatment group, i.e., levels(factor(treat))[2]; when the estimand is the ATC, the returned propensity score is the probability of being in the control (i.e., non-focal) group.

### Multinomial Treatments

For multinomial treatments with three or four categories and when the estimand is the ATE, this method estimates the propensity scores and weights using one call to CBPS::CBPS(). For multinomial treatments with more than four categories or when the estimand is the ATT, this method estimates the propensity scores and weights using multiple calls to CBPS::CBPS(). The following estimands are allowed: ATE and ATT. The weights are taken from the output of the CBPS fit objects.

### Continuous Treatments

For continuous treatments, the generalized propensity score and weights are estimated using CBPS::CBPS().

### Longitudinal Treatments

For longitudinal treatments, the weights are the product of the weights estimated at each time point. This is not how CBPS::CBMSM() in the CBPS package estimates weights for longitudinal treatments.
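The product rule described above can be sketched in base R. The weights below are made-up numbers standing in for the weights estimated separately at each time point; the object names are hypothetical.

```r
# Hypothetical per-time-point weights for 4 units at 3 time points
# (in practice these would come from a separate fit at each time point)
w_by_time <- cbind(
  t1 = c(1.2, 0.8, 1.0, 1.5),
  t2 = c(0.9, 1.1, 1.0, 2.0),
  t3 = c(1.0, 1.3, 0.7, 0.5)
)

# The final weight for each unit is the product of its weights across time
w_msm <- apply(w_by_time, 1, prod)
```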

### Sampling Weights

Sampling weights are supported through s.weights in all scenarios. See the Note section below for how sampling weights are handled.

### Missing Data

In the presence of missing data, the following value(s) for missing are allowed:

"ind" (default)

First, for each variable with missingness, a new missingness indicator variable is created which takes the value 1 if the original covariate is NA and 0 otherwise. The missingness indicators are added to the model formula as main effects. The missing values in the covariates are then replaced with 0s (this value is arbitrary and does not affect estimation). The weight estimation then proceeds with this new formula and set of covariates. The covariates output in the resulting weightit object will be the original covariates with the NAs.
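The steps above can be mimicked in base R as follows. This is only a sketch of the idea: the helper add_missing_indicators() and the "_NA" suffix for indicator names are assumptions made for illustration, not WeightIt internals.

```r
# Hypothetical helper (not a WeightIt function) mimicking missing = "ind"
add_missing_indicators <- function(covs) {
  # Identify covariates that contain at least one NA
  has_na <- vapply(covs, anyNA, logical(1))
  for (v in names(covs)[has_na]) {
    # Indicator: 1 if the original value is NA, 0 otherwise
    covs[[paste0(v, "_NA")]] <- as.numeric(is.na(covs[[v]]))
    # Replace the missing values with 0 (an arbitrary fill value)
    covs[[v]][is.na(covs[[v]])] <- 0
  }
  covs
}

# Toy covariates: age has one missing value, educ is complete
covs <- data.frame(age = c(30, NA, 45), educ = c(12, 16, 10))
covs_ind <- add_missing_indicators(covs)
```

Only age receives an indicator column here because educ has no missing values.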

## Additional Arguments

All arguments to CBPS() can be passed through weightit() or weightitMSM(), with the following exceptions:

• method in CBPS() is replaced with the argument over in weightit(). Setting over = FALSE in weightit() is the equivalent of setting method = "exact" in CBPS().

• sample.weights is ignored because sampling weights are passed using s.weights.

• standardize is ignored.

All arguments take on the defaults of those in CBPS(). It may be useful in many cases to set over = FALSE, especially with continuous treatments.

## Additional Outputs

obj

When include.obj = TRUE, the CB(G)PS model fit. For binary treatments, multinomial treatments with estimand = "ATE" and four or fewer treatment levels, and continuous treatments, the output of the call to CBPS::CBPS(). For multinomial treatments with estimand = "ATT" or with more than four treatment levels, a list of CBPS fit objects.

## Details

CBPS estimates the coefficients of a logistic regression model (for binary treatments), multinomial logistic regression model (for multinomial treatments), or linear regression model (for continuous treatments) that is used to compute (generalized) propensity scores, from which the weights are computed. It involves augmenting the standard regression score equations with the balance constraints in an over-identified generalized method of moments estimation. The idea is to nudge the estimation of the coefficients toward those that produce balance in the weighted sample. The just-identified version (with over = FALSE) does away with the score equations for the coefficients so that only the balance constraints (and the score equation for the variance of the error with a continuous treatment) are used. The just-identified version will therefore produce better balance on the terms targeted by the balance constraints (covariate means for binary and multinomial treatments, and linear terms for continuous treatments) than will the over-identified version.
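To make the just-identified idea concrete, the sketch below solves only the balance constraints for the ATT with a logistic propensity score model, using simulated data and base R. It is a simplified illustration under these assumptions and is not the algorithm implemented in CBPS::CBPS(); all object names are hypothetical. With a logistic model the balance conditions are the gradient of a convex function (up to sign), so a general-purpose optimizer can solve them.

```r
# Simulated data for illustration only
set.seed(123)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x1 - 0.3 * x2))
X <- cbind(1, x1, x2)  # design matrix with an intercept

# ATT balance conditions: mean_i [T_i - (1 - T_i) * odds_i] * X_i = 0,
# where odds_i = ps_i / (1 - ps_i) = exp(X_i' beta) under a logistic model
balance_moments <- function(beta) {
  odds <- exp(drop(X %*% beta))
  colMeans((treat - (1 - treat) * odds) * X)
}

# Convex objective whose gradient is minus the balance moments, so its
# minimizer satisfies the balance conditions
objective <- function(beta) {
  eta <- drop(X %*% beta)
  mean((1 - treat) * exp(eta) - treat * eta)
}
gradient <- function(beta) -balance_moments(beta)

fit <- optim(rep(0, ncol(X)), objective, gradient, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 1000))

# Convert the fitted propensity scores into ATT weights
ps <- plogis(drop(X %*% fit$par))
w <- ifelse(treat == 1, 1, ps / (1 - ps))

# Weighted mean differences (treated minus control) for each covariate
bal <- sapply(list(x1 = x1, x2 = x2), function(x) {
  weighted.mean(x[treat == 1], w[treat == 1]) -
    weighted.mean(x[treat == 0], w[treat == 0])
})
```

Because no score equations are included, the coefficients are chosen purely to balance the covariate means, so the values in bal should be zero up to optimizer tolerance.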

Note that WeightIt provides less functionality than does the CBPS package in terms of the versions of CBPS available; for extensions to CBPS, the CBPS package may be preferred.

## See Also

weightit(), weightitMSM()

CBPS::CBPS() for the fitting function

## Note

When sampling weights are used with CBPS::CBPS(), the estimated weights already incorporate the sampling weights. When weightit() is used with method = "cbps", the estimated weights are separated from the sampling weights, as they are with all other methods.

## Examples

library("WeightIt")
library("cobalt")
data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
(W1 <- weightit(treat ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps", estimand = "ATT"))
#> A weightit object
#>  - method: "cbps" (covariate balancing propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
summary(W1)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>           Min                                  Max
#> treated 1.000              ||               1.0000
#> control 0.017 |---------------------------| 2.2742
#>
#> - Units with 5 most extreme weights by group:
#>
#>               5      4      3      2      1
#>  treated      1      1      1      1      1
#>             589    595    269    409    296
#>  control 1.4755 1.4873 1.5799 1.7484 2.2742
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.000 0.000  -0.000       0
#> control       0.839 0.707   0.341       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  429.       185
#> Weighted    251.99     185
bal.tab(W1)
#> Call
#>  weightit(formula = treat ~ age + educ + married + nodegree +
#>     re74, data = lalonde, method = "cbps", estimand = "ATT")
#>
#> Balance Measures
#> prop.score Distance   0.0163
#> age         Contin.  -0.0032
#> educ        Contin.   0.0017
#> married      Binary  -0.0003
#> nodegree     Binary  -0.0003
#> re74        Contin.   0.0005
#>
#> Effective sample sizes
#>            Control Treated

if (FALSE) {
#Balancing covariates with respect to race (multinomial)
  (W2 <- weightit(race ~ age + educ + married +
                    nodegree + re74, data = lalonde,
                  method = "cbps", estimand = "ATE"))
summary(W2)
bal.tab(W2)
}

#Balancing covariates with respect to re75 (continuous)
(W3 <- weightit(re75 ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps", over = FALSE))
#> A weightit object
#>  - method: "cbps" (covariate balancing propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: continuous
#>  - covariates: age, educ, married, nodegree, re74
summary(W3)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>        Min                                   Max
#> all 0.0153 |---------------------------| 13.1553
#>
#> - Units with 5 most extreme weights by group:
#>
#>         484     482     180     481     483
#>  all 9.2225 10.8379 11.1353 11.9904 13.1553
#>
#> - Weight statistics:
#>
#>     Coef of Var   MAD Entropy # Zeros
#> all       1.152 0.449   0.288       0
#>
#> - Effective Sample Sizes:
#>
#>             Total
#> Unweighted 614.
#> Weighted   264.09
bal.tab(W3)
#> Call
#>  weightit(formula = re75 ~ age + educ + married + nodegree + re74,
#>     data = lalonde, method = "cbps", over = FALSE)
#>
#> Balance Measures