This page explains the details of estimating weights from covariate balancing propensity scores by setting method = "cbps" in the call to weightit() or weightitMSM(). This method can be used with binary, multi-category, and continuous treatments.

In general, this method estimates propensity scores using the generalized method of moments and then converts those propensity scores into weights using a formula that depends on the desired estimand. The estimation is carried out by code written for WeightIt that uses optim().
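
The conversion from propensity scores to weights can be illustrated with the standard inverse probability weighting formulas for a binary treatment. This is a minimal sketch in base R, not WeightIt's internal code: ps_to_weights() is a hypothetical helper, and the treatment and propensity score values are made up for illustration.

```r
# Hypothetical helper (not a WeightIt function): convert binary-treatment
# propensity scores into weights for a requested estimand.
ps_to_weights <- function(ps, treat, estimand = c("ATE", "ATT", "ATC")) {
  estimand <- match.arg(estimand)
  switch(estimand,
    # ATE: inverse probability of the treatment actually received
    ATE = ifelse(treat == 1, 1 / ps, 1 / (1 - ps)),
    # ATT: treated units get 1, controls get the odds of treatment
    ATT = ifelse(treat == 1, 1, ps / (1 - ps)),
    # ATC: controls get 1, treated units get the inverse odds
    ATC = ifelse(treat == 1, (1 - ps) / ps, 1)
  )
}

# Made-up treatment indicators and propensity scores
treat <- c(1, 1, 0, 0)
ps    <- c(0.8, 0.5, 0.5, 0.2)

w_ate <- ps_to_weights(ps, treat, "ATE")
w_att <- ps_to_weights(ps, treat, "ATT")
w_atc <- ps_to_weights(ps, treat, "ATC")
```

What distinguishes CBPS from other propensity score methods is how the propensity scores are estimated, not how they are turned into weights.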

Binary Treatments

For binary treatments, this method estimates the propensity scores and weights with optim(), using the formulas described by Imai and Ratkovic (2014). The following estimands are allowed: ATE, ATT, and ATC.

Multi-Category Treatments

For multi-category treatments, this method estimates the generalized propensity scores and weights with optim(), using the formulas described by Imai and Ratkovic (2014). The following estimands are allowed: ATE and ATT.

Continuous Treatments

For continuous treatments, this method estimates the generalized propensity scores and weights with optim(), using the formulas described by Fong, Hazlett, and Imai (2018).

Longitudinal Treatments

For longitudinal treatments, the weights are the product of the weights estimated at each time point. This is not how CBPS::CBMSM() in the CBPS package estimates weights for longitudinal treatments.
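
As a small illustration of the product rule, the sketch below multiplies per-time-point weights for each unit. The weight values and the column names t1, t2, and t3 are made up; in practice the per-time-point weights would come from the estimation at each time point.

```r
# Made-up weights for 4 units (rows) at 3 time points (columns)
w_by_time <- cbind(
  t1 = c(1.2, 0.8, 1.0, 2.0),
  t2 = c(1.0, 1.5, 0.5, 1.0),
  t3 = c(0.5, 1.0, 2.0, 1.5)
)

# Final longitudinal weight for each unit: product across time points
w_msm <- apply(w_by_time, 1, prod)
w_msm
```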

Sampling Weights

Sampling weights are supported through s.weights in all scenarios.

Missing Data

In the presence of missing data, the following value(s) for missing are allowed:

"ind" (default)

First, for each variable with missingness, a new missingness indicator variable is created which takes the value 1 if the original covariate is NA and 0 otherwise. The missingness indicators are added to the model formula as main effects. The missing values in the covariates are then replaced with the covariate medians (this value is arbitrary and does not affect estimation). The weight estimation then proceeds with this new formula and set of covariates. The covariates output in the resulting weightit object will be the original covariates with the NAs.
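
The steps described for "ind" can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: add_missing_indicators() is a hypothetical helper, the "_NA" suffix for indicator names is an arbitrary choice, and the data are made up.

```r
# Hypothetical helper (not part of WeightIt) mimicking missing = "ind"
add_missing_indicators <- function(covs) {
  for (v in names(covs)) {
    if (anyNA(covs[[v]])) {
      # Indicator: 1 where the original covariate is NA, 0 otherwise
      covs[[paste0(v, "_NA")]] <- as.integer(is.na(covs[[v]]))
      # Replace the missing values with the median of the observed values
      covs[[v]][is.na(covs[[v]])] <- median(covs[[v]], na.rm = TRUE)
    }
  }
  covs
}

# Made-up covariates; only age has a missing value
covs <- data.frame(age = c(25, NA, 40, 31), educ = c(12, 10, 16, 12))
out <- add_missing_indicators(covs)
out
```

Only covariates with missing values gain an indicator column; fully observed covariates such as educ are left unchanged.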

M-estimation

M-estimation is supported for the just-identified CBPS (the default, setting over = FALSE) for all scenarios. See glm_weightit() and vignette("estimating-effects") for details.

Details

CBPS estimates the coefficients of a generalized linear model (for binary treatments), multinomial logistic regression model (for multi-category treatments), or linear regression model (for continuous treatments) that is used to compute (generalized) propensity scores, from which the weights are computed. It involves replacing (or augmenting, in the case of the over-identified version) the standard regression score equations with the balance constraints in a generalized method of moments estimation. The idea is to nudge the estimation of the coefficients toward those that produce balance in the weighted sample. The just-identified version (with over = FALSE) does away with the score equations for the coefficients so that only the balance constraints (and the score equation for the variance of the error with a continuous treatment) are used. The just-identified version will therefore produce better balance than the over-identified version on the terms corresponding to the balance constraints: the covariate means for binary and multi-category treatments and the linear terms for continuous treatments.
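
To make the just-identified idea concrete, the following base R sketch fits a CBPS-style model for a binary treatment and the ATT, assuming a logit link. It is not WeightIt's internal code: the simulated data and the function names balance_moments(), gmm_objective(), and gmm_gradient() are assumptions made for illustration. The objective contains only the balance constraints, and optim() drives them toward zero.

```r
set.seed(123)
n <- 500
# Simulated design matrix with an intercept column and two covariates
X <- cbind(int = 1, x1 = rnorm(n), x2 = rnorm(n))
treat <- rbinom(n, 1, plogis(-0.5 + 0.5 * X[, "x1"] - 0.3 * X[, "x2"]))

# ATT balance moments under a logit link, where p / (1 - p) = exp(X %*% beta):
# treated covariate totals minus odds-weighted control totals, divided by n
balance_moments <- function(beta) {
  odds <- exp(drop(X %*% beta))
  colMeans((treat - (1 - treat) * odds) * X)
}

# Just-identified GMM objective: squared norm of the balance moments
gmm_objective <- function(beta) sum(balance_moments(beta)^2)

# Analytic gradient, 2 * J'g, where J is the Jacobian of the moments
gmm_gradient <- function(beta) {
  odds <- exp(drop(X %*% beta))
  J <- -crossprod(X * ((1 - treat) * odds), X) / n
  drop(2 * crossprod(J, balance_moments(beta)))
}

# Start from ordinary logistic regression coefficients
start <- glm.fit(X, treat, family = binomial())$coefficients

fit <- optim(start, gmm_objective, gmm_gradient, method = "BFGS",
             control = list(reltol = sqrt(.Machine$double.eps), maxit = 1000))

# ATT weights: 1 for treated units, estimated odds for control units
w <- ifelse(treat == 1, 1, exp(drop(X %*% fit$par)))

# Weighted control means should match the treated means
ctrl <- treat == 0
rbind(
  treated = colMeans(X[!ctrl, c("x1", "x2")]),
  control = apply(X[ctrl, c("x1", "x2")], 2, weighted.mean, w = w[ctrl])
)
```

Because the number of balance constraints equals the number of coefficients, the constraints can be satisfied essentially exactly. The over-identified version would add the logistic regression score equations to the moment vector, so exact balance would generally no longer be attainable.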

Just-identified CBPS is very similar to entropy balancing and inverse probability tilting. For the ATT, all three methods will yield identical estimates. For other estimands, the results will differ.

Note that WeightIt provides different functionality from the CBPS package in terms of the versions of CBPS available; for extensions to CBPS (e.g., optimal CBPS, CBPS for instrumental variables, and jointly estimated CBPS for longitudinal treatments), the CBPS package may be preferred.

Note

This method used to rely on functionality in the CBPS package, but no longer does. Slight differences may be found between the two packages in some cases due to numerical imprecision. WeightIt supports arbitrary numbers of groups for the multi-category CBPS and any estimand, whereas CBPS only supports up to four groups and only the ATE. For continuous treatments with the over-identified CBPS, WeightIt and CBPS use different methods of specifying the GMM variance matrix, which may lead to differing results. Note that the default method differs between the two implementations; by default WeightIt uses the just-identified CBPS, which is faster to fit, yields better balance, and is compatible with M-estimation for estimating the standard error of the treatment effect, whereas CBPS uses the over-identified CBPS by default. However, both the just-identified and over-identified versions are available in both packages.

Additional Arguments

The following additional arguments can be specified:

over

logical; whether to request the over-identified CBPS, which combines the generalized linear model regression score equations (for binary treatments), multinomial logistic regression score equations (for multi-category treatments), or linear regression score equations (for continuous treatments) with the balance moment conditions. Default is FALSE to use the just-identified CBPS.

twostep

logical; when over = TRUE, whether to use the two-step approximation to the generalized method of moments variance. Default is TRUE. Ignored when over = FALSE.

link

string; the link used in the generalized linear model for the propensity scores when the treatment is binary. Default is "logit" for logistic regression, which is used in the original description of the method by Imai and Ratkovic (2014), but others are allowed: "logit", "probit", "cauchit", and "cloglog" all use the binomial likelihood; "log" uses the Poisson likelihood; and "identity" uses the Gaussian likelihood (i.e., the linear probability model). Note that negative weights are possible with "log" and "identity", so these two links should be used with caution. Ignored for multi-category and continuous treatments.

reltol

the relative tolerance for convergence of the optimization. Passed to the control argument of optim(). Default is sqrt(.Machine$double.eps).

maxit

the maximum number of iterations for convergence of the optimization. Passed to the control argument of optim(). Default is 1000.

Additional Outputs

obj

When include.obj = TRUE, the output of the final call to optim() used to produce the model parameters. Note that because of variable transformations, the resulting parameter estimates may not be interpretable.

References

Binary Treatments

Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 243–263.

Multi-Category Treatments

Imai, K., & Ratkovic, M. (2014). Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 243–263.

Continuous Treatments

Fong, C., Hazlett, C., & Imai, K. (2018). Covariate balancing propensity score for a continuous treatment: Application to the efficacy of political advertisements. The Annals of Applied Statistics, 12(1), 156–177. doi:10.1214/17-AOAS1101

Some of the code was inspired by the source code of the CBPS package.

See also

weightit(), weightitMSM()

method_ebal and method_ipt for entropy balancing and inverse probability tilting, which work similarly.

Examples

data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
(W1a <- weightit(treat ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps", estimand = "ATT"))
#> A weightit object
#>  - method: "cbps" (covariate balancing propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
summary(W1a)
#>                  Summary of weights
#> 
#> - Weight ranges:
#> 
#>            Min                                  Max
#> treated 1.0000              ||               1.0000
#> control 0.0172 |---------------------------| 2.2625
#> 
#> - Units with the 5 most extreme weights by group:
#>                                            
#>               5      4      3      2      1
#>  treated      1      1      1      1      1
#>             589    595    269    409    296
#>  control 1.4644 1.4848 1.5763 1.7434 2.2625
#> 
#> - Weight statistics:
#> 
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.000 0.000   0.000       0
#> control       0.839 0.707   0.341       0
#> 
#> - Effective Sample Sizes:
#> 
#>            Control Treated
#> Unweighted  429.       185
#> Weighted    252.12     185
cobalt::bal.tab(W1a)
#> Balance Measures
#>                Type Diff.Adj
#> prop.score Distance   0.0164
#> age         Contin.   0.0000
#> educ        Contin.   0.0000
#> married      Binary  -0.0000
#> nodegree     Binary  -0.0000
#> re74        Contin.  -0.0000
#> 
#> Effective sample sizes
#>            Control Treated
#> Unadjusted  429.       185
#> Adjusted    252.12     185

#Balancing covariates between treatment groups (binary)
#using over-identified CBPS with probit link
(W1b <- weightit(treat ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps", estimand = "ATT",
                over = TRUE, link = "probit"))
#> A weightit object
#>  - method: "cbps" (covariate balancing propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
summary(W1b)
#>                  Summary of weights
#> 
#> - Weight ranges:
#> 
#>            Min                                  Max
#> treated 1.0000                 ||            1.0000
#> control 0.0181 |---------------------------| 1.8803
#> 
#> - Units with the 5 most extreme weights by group:
#>                                            
#>               5      4      3      2      1
#>  treated      1      1      1      1      1
#>             411    595    269    409    296
#>  control 1.2702 1.3139 1.4024 1.5131 1.8803
#> 
#> - Weight statistics:
#> 
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.000 0.000   0.000       0
#> control       0.793 0.685   0.315       0
#> 
#> - Effective Sample Sizes:
#> 
#>            Control Treated
#> Unweighted  429.       185
#> Weighted    263.58     185
cobalt::bal.tab(W1b)
#> Balance Measures
#>                Type Diff.Adj
#> prop.score Distance   0.0404
#> age         Contin.   0.0233
#> educ        Contin.  -0.0201
#> married      Binary   0.0048
#> nodegree     Binary   0.0119
#> re74        Contin.  -0.0609
#> 
#> Effective sample sizes
#>            Control Treated
#> Unadjusted  429.       185
#> Adjusted    263.58     185

#Balancing covariates with respect to race (multi-category)
(W2 <- weightit(race ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps", estimand = "ATE"))
#> A weightit object
#>  - method: "cbps" (covariate balancing propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 3-category (black, hispan, white)
#>  - estimand: ATE
#>  - covariates: age, educ, married, nodegree, re74
summary(W2)
#>                  Summary of weights
#> 
#> - Weight ranges:
#> 
#>           Min                                   Max
#> black  1.5007  |------------------|         17.9659
#> hispan 1.6315  |--------------------------| 24.5609
#> white  1.1311 |--|                           4.1340
#> 
#> - Units with the 5 most extreme weights by group:
#>                                               
#>            226     231     485     181     182
#>   black 6.7985  6.8385  7.2674  9.8974 17.9659
#>            392     564     269     345     371
#>  hispan 16.762 19.8529 22.0193 23.8782 24.5609
#>            398     432     437     404     599
#>   white 3.6882  3.7815  3.8476  3.8952   4.134
#> 
#> - Weight statistics:
#> 
#>        Coef of Var   MAD Entropy # Zeros
#> black        0.635 0.387   0.133       0
#> hispan       0.582 0.447   0.155       0
#> white        0.389 0.327   0.071       0
#> 
#> - Effective Sample Sizes:
#> 
#>             black hispan  white
#> Unweighted 243.    72.   299.  
#> Weighted   173.37  53.95 259.76
cobalt::bal.tab(W2)
#> Balance summary across all treatment pairs
#>             Type Max.Diff.Adj
#> age      Contin.            0
#> educ     Contin.            0
#> married   Binary            0
#> nodegree  Binary            0
#> re74     Contin.            0
#> 
#> Effective sample sizes
#>             black hispan  white
#> Unadjusted 243.    72.   299.  
#> Adjusted   173.37  53.95 259.76

#Balancing covariates with respect to re75 (continuous)
(W3 <- weightit(re75 ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps"))
#> A weightit object
#>  - method: "cbps" (covariate balancing propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: continuous
#>  - covariates: age, educ, married, nodegree, re74
summary(W3)
#>                  Summary of weights
#> 
#> - Weight ranges:
#> 
#>        Min                                   Max
#> all 0.0099 |---------------------------| 21.0343
#> 
#> - Units with the 5 most extreme weights:
#>                                             
#>          485     481     482     484     483
#>  all 10.2635 13.1282 14.0059 17.9287 21.0343
#> 
#> - Weight statistics:
#> 
#>     Coef of Var   MAD Entropy # Zeros
#> all       1.458 0.536   0.397       0
#> 
#> - Effective Sample Sizes:
#> 
#>             Total
#> Unweighted 614.  
#> Weighted   196.58
cobalt::bal.tab(W3)
#> Balance Measures
#>             Type Corr.Adj
#> age      Contin.        0
#> educ     Contin.        0
#> married   Binary       -0
#> nodegree  Binary        0
#> re74     Contin.        0
#> 
#> Effective sample sizes
#>             Total
#> Unadjusted 614.  
#> Adjusted   196.58