Estimate Balancing Weights

weightit() allows for the easy generation of balancing weights using a variety of available methods for binary, continuous, and multi-category treatments. Many of these methods exist in other packages, which weightit() calls; these packages must be installed to use the desired method.

Usage

weightit(
  formula,
  data = NULL,
  method = "glm",
  estimand = "ATE",
  stabilize = FALSE,
  focal = NULL,
  by = NULL,
  s.weights = NULL,
  ps = NULL,
  moments = NULL,
  int = FALSE,
  subclass = NULL,
  missing = NULL,
  verbose = FALSE,
  include.obj = FALSE,
  keep.mparts = TRUE,
  ...
)

Arguments

formula: a formula with a treatment variable on the left-hand side and the covariates to be balanced on the right-hand side. See glm() for more details. Interactions and functions of covariates are allowed.
data: an optional data set in the form of a data frame that contains the variables in formula.
method: a string of length 1 containing the name of the method that will be used to estimate weights. See Details below for allowable options. The default is "glm" for propensity score weighting using a generalized linear model to estimate the propensity score.
estimand: the desired estimand. For binary and multi-category treatments, can be "ATE", "ATT", "ATC", and, for some methods, "ATO", "ATM", or "ATOS". The default for both is "ATE". This argument is ignored for continuous treatments. See the individual pages for each method for more information on which estimands are allowed with each method and what literature to read to interpret these estimands.
stabilize: whether or not and how to stabilize the weights. If TRUE, each unit's weight will be multiplied by a standardization factor, which is the unconditional probability (or density) of each unit's observed treatment value. If a formula, a generalized linear model will be fit with the included predictors, and the inverse of the corresponding weight will be used as the standardization factor. Can only be used with continuous treatments or when estimand = "ATE". Default is FALSE for no standardization. See also the num.formula argument at weightitMSM(). For continuous treatments, weights are already stabilized, so setting stabilize = TRUE will be ignored with a warning (supplying a formula still works).
focal: when estimand is set to "ATT" or "ATC", which group to consider the "treated" or "control" group. This group will not be weighted, and the other groups will be weighted to resemble the focal group. If specified, estimand will automatically be set to "ATT" (with a warning if estimand is not "ATT" or "ATC"). See section estimand and focal in Details below.
by: a string containing the name of the variable in data for which weighting is to be done within categories or a one-sided formula with the stratifying variable on the right-hand side. For example, if by = "gender" or by = ~gender, a separate propensity score model or optimization will occur within each level of the variable "gender". Only one by variable is allowed; to stratify by multiple variables simultaneously, create a new variable that is a full cross of those variables using interaction().
s.weights: A vector of sampling weights or the name of a variable in data that contains sampling weights. These can also be matching weights if weighting is to be used on matched data. See the individual pages for each method for information on whether sampling weights can be supplied.
ps: A vector of propensity scores or the name of a variable in data containing propensity scores. If not NULL, method is ignored unless it is a user-supplied function, and the propensity scores will be used to create weights. formula must include the treatment variable in data, but the listed covariates will play no role in the weight estimation. Using ps is similar to calling get_w_from_ps() directly, but produces a full weightit object rather than just producing weights.
moments: numeric; for some methods, the greatest power of each covariate to be balanced. For example, if moments = 3, for each non-categorical covariate, the covariate, its square, and its cube will be balanced. This argument is ignored for other methods; to balance powers of the covariates, appropriate functions must be entered in formula. See the individual pages for each method for information on whether they accept moments.
int: logical; for some methods, whether first-order interactions of the covariates are to be balanced. This argument is ignored for other methods; to balance interactions between the variables, appropriate functions must be entered in formula. See the individual pages for each method for information on whether they accept int.
subclass: numeric; the number of subclasses to use for computing weights using marginal mean weighting with subclasses (MMWS). If NULL, standard inverse probability weights (and their extensions) will be computed; if a number greater than 1, subclasses will be formed and weights will be computed based on subclass membership. Attempting to set a non-NULL value for methods that don't compute a propensity score will result in an error; see each method's help page for information on whether MMWS weights are compatible with the method. See get_w_from_ps() for details and references.
missing: character; how missing data should be handled. The options and defaults depend on the method used. Ignored if no missing data is present. It should be noted that multiple imputation outperforms all missingness methods available in weightit() and should probably be used instead. Consider the MatchThem package for the use of weightit() with multiply imputed data.
verbose: logical; whether to print additional information output by the fitting function.
include.obj: logical; whether to include in the output any fit objects created in the process of estimating the weights. For example, with method = "glm", the glm objects containing the propensity score model will be included. See the individual pages for each method for information on what object will be included if TRUE.
keep.mparts: logical; whether to include in the output components necessary to estimate standard errors that account for estimation of the weights in glm_weightit(). Default is TRUE if such parts are present. See the individual pages for each method for whether these components are produced. Set to FALSE to keep the output object smaller, e.g., if standard errors will not be computed using glm_weightit().
...: other arguments for functions called by weightit() that control aspects of fitting that are not covered by the above arguments. See Details.
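
To make the stabilize = TRUE description above concrete, the following is a minimal base R sketch for a binary treatment with the ATE. It is an illustration under stated assumptions, not the package's internal code: the data are simulated, and the names x, a, ps, w, and sw are hypothetical.

```r
set.seed(123)

# Simulated data (assumption for illustration): one covariate, binary treatment
n <- 500
x <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * x))

# Propensity scores from a logistic regression
ps <- fitted(glm(a ~ x, family = binomial))

# Unstabilized ATE weights: inverse probability of the observed treatment
w <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))

# Standardization factor: unconditional probability of each unit's
# observed treatment value
p_treated <- mean(a)
stab <- ifelse(a == 1, p_treated, 1 - p_treated)

# Stabilized weights
sw <- w * stab

# Compare the spread of the two sets of weights
summary(w)
summary(sw)
```

Because the factor is constant within each treatment group, stabilization rescales the weights within groups without changing their relative sizes inside a group.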

Value

A weightit object with the following elements:

weights: The estimated weights, one for each unit.
treat: The values of the treatment variable.
covs: The covariates used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process.
estimand: The estimand requested.
method: The weight estimation method specified.
ps: The estimated or provided propensity scores. Estimated propensity scores are returned for binary treatments and only when method is "glm", "gbm", "cbps", "ipt", "super", or "bart". The propensity score corresponds to the predicted probability of being treated; see section estimand and focal in Details for how the treated group is determined.
s.weights: The provided sampling weights.
focal: The focal treatment level if the ATT or ATC was requested.
by: A data.frame containing the by variable when specified.
obj: When include.obj = TRUE, the fit object.
info: Additional information about the fitting. See the individual methods pages for what is included.

When keep.mparts is TRUE (the default) and the chosen method is compatible with M-estimation, the components related to M-estimation for use in glm_weightit() are stored in the "Mparts" attribute. When by is specified, keep.mparts is set to FALSE.

Details

The primary purpose of weightit() is as a dispatcher to functions that perform the estimation of balancing weights using the requested method. Below are the methods allowed and links to pages containing more information about them, including additional arguments and outputs (e.g., when include.obj = TRUE), how missing values are treated, which estimands are allowed, and whether sampling weights are allowed.

`"glm"`	Propensity score weighting using generalized linear models
`"gbm"`	Propensity score weighting using generalized boosted modeling
`"cbps"`	Covariate Balancing Propensity Score weighting
`"npcbps"`	Non-parametric Covariate Balancing Propensity Score weighting
`"ebal"`	Entropy balancing
`"ipt"`	Inverse probability tilting
`"optweight"`	Optimization-based weighting
`"super"`	Propensity score weighting using SuperLearner
`"bart"`	Propensity score weighting using Bayesian additive regression trees (BART)
`"energy"`	Energy balancing

method can also be supplied as a user-defined function; see method_user for instructions and examples. Setting method = NULL computes unit weights.

`estimand` and `focal`

For binary and multi-category treatments, the argument to estimand determines what distribution the weighted sample should resemble. When set to "ATE", this requests that each group resemble the full sample. When set to "ATO", "ATM", or "ATOS" (for the methods that allow them), this requests that each group resemble an "overlap" sample. When set to "ATT" or "ATC", this requests that each group resemble the treated or control group, respectively (termed the "focal" group). Weights are set to 1 for the focal group.
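
As a rough illustration of how the estimand changes the weights, the sketch below applies the standard propensity score weighting formulas for a binary treatment in base R. The helper ps_weights() is hypothetical and is not part of WeightIt; it covers only "ATE", "ATT", "ATC", and "ATO", and it assumes the treatment is coded 1 for treated and 0 for control.

```r
# Hypothetical helper (not a WeightIt function): weights from propensity
# scores for a binary treatment coded 0/1
ps_weights <- function(a, ps, estimand = c("ATE", "ATT", "ATC", "ATO")) {
  estimand <- match.arg(estimand)
  switch(estimand,
    # Each group is weighted to resemble the full sample
    ATE = ifelse(a == 1, 1 / ps, 1 / (1 - ps)),
    # Treated group is focal (weight 1); controls resemble the treated
    ATT = ifelse(a == 1, 1, ps / (1 - ps)),
    # Control group is focal (weight 1); treated resemble the controls
    ATC = ifelse(a == 1, (1 - ps) / ps, 1),
    # Overlap weights: each unit gets the probability of the other group
    ATO = ifelse(a == 1, 1 - ps, ps)
  )
}

# Toy inputs (made up for illustration)
a  <- c(1, 1, 0, 0)
ps <- c(0.8, 0.5, 0.5, 0.2)

w_ate <- ps_weights(a, ps, "ATE")
w_att <- ps_weights(a, ps, "ATT")
w_atc <- ps_weights(a, ps, "ATC")
w_ato <- ps_weights(a, ps, "ATO")
```

With these formulas the focal group keeps a weight of 1 under "ATT" and "ATC", which matches the behavior described above.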

How does weightit() decide which group is the treated and which group is the control? For binary treatments, several heuristics are used. The first is by checking whether a valid argument to focal was supplied containing the name of the focal group, which is the treated group when estimand = "ATT" and the control group when estimand = "ATC". If focal is not supplied, guesses are made using the following criteria, evaluated in order:

If the treatment variable is logical, TRUE is considered treated and FALSE control.
If the treatment is numeric (or a string or factor with values that can be coerced to numeric values), if 0 is one of the values, it is considered the control, and otherwise, the lower value is considered the control (with the other considered treated).
If exactly one of the treatment values is "t", "tr", "treat", "treated", or "exposed", it is considered the treated (and the other control).
If exactly one of the treatment values is "c", "co", "ctrl", "control", or "unexposed", it is considered the control (and the other treated).
If the treatment variable is a factor, the first level is considered control and the second treated.
The lowest value after sorting with sort() is considered control and the other treated.

To be safe, it is best to code your binary treatment variable as 0 for control and 1 for treated. Otherwise, focal should be supplied when requesting the ATT or ATC. For multi-category treatments, focal is required when requesting the ATT or ATC; none of the heuristics above are used.
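
The ordered checks for binary treatments can be sketched in base R as follows. The function guess_treated() is a hypothetical illustration written from the description above, not the package's actual code; in particular, the exact (case-sensitive) string matching is an assumption.

```r
# Hypothetical helper: returns (as a string) the value guessed to be the
# "treated" level of a binary treatment, applying the checks in order
guess_treated <- function(treat) {
  vals <- unique(treat[!is.na(treat)])
  stopifnot(length(vals) == 2)
  chars <- as.character(vals)

  # 1. Logical: TRUE is treated
  if (is.logical(treat)) return("TRUE")

  # 2. Numeric, or strings/factors coercible to numeric:
  #    0 is control if present, otherwise the lower value is control
  num <- suppressWarnings(as.numeric(chars))
  if (!anyNA(num)) {
    control <- if (any(num == 0)) 0 else min(num)
    return(chars[num != control])
  }

  # 3. Exactly one value is a recognized "treated" label
  is_t <- chars %in% c("t", "tr", "treat", "treated", "exposed")
  if (sum(is_t) == 1) return(chars[is_t])

  # 4. Exactly one value is a recognized "control" label
  is_c <- chars %in% c("c", "co", "ctrl", "control", "unexposed")
  if (sum(is_c) == 1) return(chars[!is_c])

  # 5. Factor: the second level is treated
  if (is.factor(treat)) return(levels(droplevels(treat))[2])

  # 6. Fallback: the higher value after sort() is treated
  sort(chars)[2]
}

guess_treated(c(0, 1, 1))
guess_treated(c("control", "drug", "drug"))
```

Supplying focal explicitly, or coding the treatment as 0/1, avoids relying on any of these guesses.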

Citing WeightIt

When using `weightit()`, please cite both the WeightIt package (using citation("WeightIt")) and the paper(s) in the references section of the method used.

Examples

library("cobalt")
data("lalonde", package = "cobalt")

#Balancing covariates between treatment groups (binary)
(W1 <- weightit(treat ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "glm", estimand = "ATT"))
#> A weightit object
#>  - method: "glm" (propensity score weighting with GLM)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
summary(W1)
#>                   Summary of weights
#> 
#> - Weight ranges:
#> 
#>            Min                                  Max
#> treated 1.0000               ||              1.0000
#> control 0.0222 |---------------------------| 2.0438
#> 
#> - Units with the 5 most extreme weights by group:
#>                                            
#>               5      4      3      2      1
#>  treated      1      1      1      1      1
#>             411    595    269    409    296
#>  control 1.3303 1.4365 1.5005 1.6369 2.0438
#> 
#> - Weight statistics:
#> 
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.000 0.000    0.00       0
#> control       0.823 0.701    0.33       0
#> 
#> - Effective Sample Sizes:
#> 
#>            Control Treated
#> Unweighted  429.       185
#> Weighted    255.99     185
bal.tab(W1)
#> Balance Measures
#>                Type Diff.Adj
#> prop.score Distance   0.0199
#> age         Contin.   0.0459
#> educ        Contin.  -0.0360
#> married      Binary   0.0044
#> nodegree     Binary   0.0080
#> re74        Contin.  -0.0275
#> 
#> Effective sample sizes
#>            Control Treated
#> Unadjusted  429.       185
#> Adjusted    255.99     185

#Balancing covariates with respect to race (multi-category)
(W2 <- weightit(race ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "ebal", estimand = "ATE"))
#> A weightit object
#>  - method: "ebal" (entropy balancing)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 3-category (black, hispan, white)
#>  - estimand: ATE
#>  - covariates: age, educ, married, nodegree, re74
summary(W2)
#>                   Summary of weights
#> 
#> - Weight ranges:
#> 
#>           Min                                  Max
#> black  0.5530   |-------------------------| 5.3496
#> hispan 0.1408 |----------------|            3.3323
#> white  0.3978  |-------|                    1.9232
#> 
#> - Units with the 5 most extreme weights by group:
#>                                           
#>            226    244    485    181    182
#>   black 2.5215 2.5492 2.8059 3.5551 5.3496
#>            392    564    269    345    371
#>  hispan 2.0464 2.5298 2.6323 2.7049 3.3323
#>             68    457    599    589    531
#>   white 1.7109 1.7226 1.7433 1.7741 1.9232
#> 
#> - Weight statistics:
#> 
#>        Coef of Var   MAD Entropy # Zeros
#> black        0.590 0.413   0.131       0
#> hispan       0.609 0.440   0.163       0
#> white        0.371 0.306   0.068       0
#> 
#> - Effective Sample Sizes:
#> 
#>             black hispan  white
#> Unweighted 243.    72.   299.  
#> Weighted   180.47  52.71 262.93
bal.tab(W2)
#> Balance summary across all treatment pairs
#>             Type Max.Diff.Adj
#> age      Contin.            0
#> educ     Contin.            0
#> married   Binary            0
#> nodegree  Binary            0
#> re74     Contin.            0
#> 
#> Effective sample sizes
#>             black hispan  white
#> Unadjusted 243.    72.   299.  
#> Adjusted   180.47  52.71 262.93

#Balancing covariates with respect to re75 (continuous)
(W3 <- weightit(re75 ~ age + educ + married +
                  nodegree + re74, data = lalonde,
                method = "cbps"))
#> A weightit object
#>  - method: "cbps" (covariate balancing propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: continuous
#>  - covariates: age, educ, married, nodegree, re74
summary(W3)
#>                   Summary of weights
#> 
#> - Weight ranges:
#> 
#>        Min                                   Max
#> all 0.0151 |---------------------------| 43.9963
#> 
#> - Units with the 5 most extreme weights:
#>                                             
#>          482     180     481     483     185
#>  all 10.8239 11.0878 11.9703 13.1314 43.9963
#> 
#> - Weight statistics:
#> 
#>     Coef of Var   MAD Entropy # Zeros
#> all       1.942 0.528   0.454       0
#> 
#> - Effective Sample Sizes:
#> 
#>             Total
#> Unweighted 614.  
#> Weighted   128.86
bal.tab(W3)
#> Balance Measures
#>             Type Corr.Adj
#> age      Contin.   0.1773
#> educ     Contin.   0.1071
#> married   Binary   0.4522
#> nodegree  Binary   0.3308
#> re74     Contin.   0.5514
#> 
#> Effective sample sizes
#>             Total
#> Unadjusted 614.  
#> Adjusted   128.86