
Estimate stable balancing weights for the treatments and covariates specified in formula. The degree of balance required for each covariate is specified by tols, and the target population can be specified with targets or estimand. See Zubizarreta (2015) and Wang & Zubizarreta (2020) for details on the properties of the weights and the methods used to fit them.

Usage

optweight(
  formula,
  data = NULL,
  tols = 0,
  estimand = "ATE",
  targets = NULL,
  s.weights = NULL,
  b.weights = NULL,
  focal = NULL,
  norm = "l2",
  verbose = FALSE,
  ...
)

optweightMV(
  formula.list,
  data = NULL,
  tols.list = list(0),
  estimand = "ATE",
  targets = NULL,
  s.weights = NULL,
  b.weights = NULL,
  focal = NULL,
  norm = "l2",
  verbose = FALSE,
  ...
)

Arguments

formula

A formula with a treatment variable on the left-hand side and the covariates to be balanced on the right-hand side, or a list thereof. See glm() for more details. Interactions and functions of covariates are allowed.

data

An optional data set in the form of a data frame that contains the variables in formula.

tols

A vector of balance tolerance values for each covariate, or a list thereof. The resulting weighted balance statistics will be at least as small as these values. If only one value is supplied, it will be applied to all covariates. Can also be the output of a call to process_tols(). See Details.

estimand

The desired estimand, which determines the target population. For binary treatments, can be "ATE", "ATT", "ATC", or NULL. For multi-category treatments, can be "ATE", "ATT", or NULL. For continuous treatments, can be "ATE" or NULL. The default is "ATE" for both optweight() and optweightMV(). For optweightMV(), only "ATE" or NULL is supported. estimand is ignored when targets is non-NULL. If both estimand and targets are NULL, no targeting will take place. See Details.

targets

A vector of target population mean values for each baseline covariate. The resulting weights will yield sample means within tols/2 units of the target values for each covariate. If NULL or all NA, estimand will be used to determine targets. Otherwise, estimand is ignored. If any target values are NA, the corresponding variable will not be targeted and its weighted mean will be wherever the weights yield the smallest variance. Can also be the output of a call to process_targets(). See Details.

s.weights

A vector of sampling weights or the name of a variable in data that contains sampling weights.

b.weights

A vector of base weights or the name of a variable in data that contains base weights. If supplied, the desired norm of the distance between the estimated weights and the base weights is minimized.

focal

When multi-category treatments are used and estimand = "ATT", which group to consider the "treated" or focal group. This group will not be weighted, and the other groups will be weighted to be more like the focal group. If specified, estimand will automatically be set to "ATT".

norm

character; a string containing the name of the norm corresponding to the objective function to minimize. Allowable options include "l1" for the L1 norm, "l2" for the L2 norm (the default), "linf" for the L\(\infty\) norm, "entropy" for the negative entropy, and "log" for the sum of the logs. See optweight.fit() for details.
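As a rough illustration of what each norm option measures, the sketch below computes a corresponding objective for a weight vector w relative to base weights b (all ones when b.weights is not supplied). The helper norm_objective() is hypothetical and not part of optweight; the exact form and scaling of each objective in optweight.fit() may differ, and the sign convention used for "log" (minimizing the negative sum of log weights) is an assumption.

```r
# Hypothetical helper: objective values for each `norm` option, measured
# as a distance between weights w and base weights b (scaling is assumed)
norm_objective <- function(w, b = rep(1, length(w)),
                           norm = c("l1", "l2", "linf", "entropy", "log")) {
  norm <- match.arg(norm)
  switch(norm,
    l1      = sum(abs(w - b)),      # total absolute deviation from base weights
    l2      = sum((w - b)^2),       # squared Euclidean distance
    linf    = max(abs(w - b)),      # largest single deviation
    entropy = sum(w * log(w / b)),  # relative entropy; requires w > 0
    log     = -sum(log(w))          # negative sum of logs; requires w > 0
  )
}

w <- c(0.5, 1, 1.5, 2)
sapply(c("l1", "l2", "linf"), function(n) norm_objective(w, norm = n))
# l1 = 2, l2 = 1.5, linf = 1
```

The L1 and L∞ objectives tolerate a few large deviations differently from L2, which is one reason the weight summaries for ow1 and ow1b in the Examples differ so much.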

verbose

logical; whether information on the optimization problem solution should be printed. Default is FALSE.

...

Arguments passed on to optweight.fit, optweightMV.fit

std.binary,std.cont

logical; whether the tolerances are in standardized mean units (TRUE) or raw units (FALSE) for binary variables and continuous variables, respectively. The default is FALSE for std.binary because raw proportion differences make more sense than standardized mean differences for binary variables. These arguments are analogous to the binary and continuous arguments in bal.tab() in cobalt.

min.w

numeric; a single value less than 1 for the smallest allowable weight. Some analyses require nonzero weights for all units, so a small, nonzero minimum may be desirable. The default is 1e-8 (\(10^{-8}\)), which does not materially change the properties of the weights from a minimum of 0 but prevents warnings in some packages that use weights to estimate treatment effects. When norm is "entropy" or "log" and min.w <= 0, min.w will be set to the smallest nonzero value.

covs.list

a list containing one numeric matrix of covariates to be balanced for each treatment.

treat.list

a list containing one vector of treatment statuses for each treatment.

solver

string; the name of the optimization solver to use. Allowable options depend on norm. Default is to use whichever eligible solver is installed, if any, or the default solver for the corresponding norm. See Details for information.

formula.list

A list of formulas, each with a treatment variable on the left-hand side and the covariates to be balanced on the right-hand side.

tols.list

A list of vectors of balance tolerance values for each covariate for each treatment. The resulting weighted balance statistics will be at least as small as these values. If only one value is supplied, it will be applied to all covariates. See Details.

Value

For optweight(), an optweight object with the following elements:

weights

The estimated weights, one for each unit.

treat

The values of the treatment variable.

covs

The covariates used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process.

s.weights

The provided sampling weights.

b.weights

The provided base weights.

estimand

The estimand requested.

focal

The focal group if the ATT was requested with a multi-category treatment.

call

The function call.

tols

The tolerance values for each covariate.

duals

A data.frame containing the dual variables for each covariate. See Details for interpretation of these values.

info

Information about the performance of the optimization at termination.

For optweightMV(), an optweightMV object with the following elements:

weights

The estimated weights, one for each unit.

treat.list

A list of the values of the treatment variables.

covs.list

A list of the covariates for each treatment used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process.

s.weights

The provided sampling weights.

b.weights

The provided base weights.

call

The function call.

tols

A list of tolerance values for each covariate for each treatment.

duals

A list of data.frames containing the dual variables for each covariate for each treatment. See Details for interpretation of these values.

info

Information about the performance of the optimization at termination.

Details

The optimization is performed by the lower-level function optweight.fit() (for optweight()) or optweightMV.fit() (for optweightMV()).

For binary and multi-category treatments, weights are estimated so that the weighted mean differences of the covariates are within the given tolerance thresholds (unless std.binary or std.cont are TRUE, in which case standardized mean differences are considered for binary and continuous variables, respectively). For a covariate \(x\) with specified tolerance \(\delta\), the weighted means of each group will be within \(\delta\) of each other. Additionally, when the ATE is specified as the estimand or a target population is specified, the weighted means of each group will each be within \(\delta/2\) of the target means; this ensures generalizability to the target population (for the ATE, the population from which the original sample was drawn).
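The conditions above can be checked directly for any candidate set of weights. The sketch below uses base R and made-up toy data (all names are illustrative, not part of optweight) to compute weighted group means and test the balance and target constraints for one covariate with a raw-unit tolerance delta. optweight() itself searches for weights that satisfy these constraints while minimizing the chosen norm; this sketch only verifies them.

```r
# Toy data (hypothetical): binary treatment, one covariate, candidate weights
treat <- c(0, 0, 0, 1, 1)
x     <- c(1, 2, 3, 2, 4)
w     <- c(1, 1, 1, 1, 1)

# Weighted mean of x within each treatment group, named by group
wmean_by_group <- function(x, treat, w) {
  sapply(split(seq_along(x), treat), function(i) weighted.mean(x[i], w[i]))
}

m      <- wmean_by_group(x, treat, w)  # group "0": 2, group "1": 3
delta  <- 1.2                          # tolerance for x, in raw units
target <- 2.5                          # hypothetical target mean for x

# Balance constraint: group means within delta of each other
balanced  <- abs(m[["1"]] - m[["0"]]) <= delta
# Target constraints: each group mean within delta/2 of the target
on_target <- all(abs(m - target) <= delta / 2)
```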

If standardized tolerance values are requested, the standardization factor corresponds to the estimand requested: when the ATE is requested or a target population specified, the standardization factor is the square root of the average variance for that covariate across treatment groups, and when the ATT or ATC are requested, the standardization factor is the standard deviation of the covariate in the focal group. The standardization factor is computed accounting for s.weights.
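A minimal sketch of the two standardization factors for a binary or multi-category treatment follows. The helpers wvar() and std_factor() are hypothetical, and the weighted-variance formula (which reduces to var() when all sampling weights are equal) is an assumption; optweight may compute weighted variances differently.

```r
# Weighted variance using normalized sampling weights s; with equal weights
# this reduces to the usual sample variance, var(x)
wvar <- function(x, s = rep(1, length(x))) {
  s  <- s / sum(s)
  mu <- sum(s * x)
  sum(s * (x - mu)^2) / (1 - sum(s^2))
}

# Standardization factor for one covariate x
std_factor <- function(x, treat, s = rep(1, length(x)),
                       estimand = "ATE", focal = NULL) {
  groups <- split(seq_along(x), treat)
  if (estimand == "ATE") {
    # ATE or target population: square root of the average group variance
    sqrt(mean(sapply(groups, function(i) wvar(x[i], s[i]))))
  } else {
    # ATT/ATC: standard deviation in the focal group
    i <- groups[[as.character(focal)]]
    sqrt(wvar(x[i], s[i]))
  }
}

treat <- c(0, 0, 0, 1, 1)
x     <- c(1, 2, 3, 2, 4)
std_factor(x, treat)                               # sqrt((1 + 2) / 2)
std_factor(x, treat, estimand = "ATT", focal = 1)  # sqrt(2)
```

A standardized tolerance of, say, .05 then corresponds to a raw tolerance of .05 times this factor.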

For continuous treatments, weights are estimated so that the weighted correlation between the treatment and each covariate is within the specified tolerance threshold. If the ATE is requested or a target population is specified, the means of the weighted covariates and treatment are restricted to be equal to those of the target population to ensure generalizability to the desired target population. The weighted correlation is computed as the weighted covariance divided by the product of the unweighted standard deviations. The means used to center the variables in computing the covariance are those specified in the target population.
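The correlation described above can be sketched in base R. The helper wcor_target() is hypothetical: it centers the treatment and covariate at the target means, takes a weighted covariance with weights normalized to sum to 1, and divides by the unweighted standard deviations. That normalization is an assumption, so with equal weights and sample-mean targets the result equals cor(a, x) * (n - 1) / n rather than exactly cor(a, x).

```r
wcor_target <- function(a, x, w, a_target = mean(a), x_target = mean(x)) {
  w    <- w / sum(w)                                # normalize weights
  wcov <- sum(w * (a - a_target) * (x - x_target))  # centered at target means
  wcov / (sd(a) * sd(x))                            # unweighted SDs
}

# Continuous treatment a and covariate x (toy values)
a <- c(-1, 0, 1)
x <- c(1, 0, 1)
wcor_target(a, x, w = c(1, 1, 1))  # zero: x is balanced with respect to a
wcor_target(a, x, w = c(1, 1, 3))  # positive: these weights induce a correlation
```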

Dual Variables

Two types of constraints may be associated with each covariate: target constraints and balance constraints. Target constraints require the mean of the covariate to be at (or near) a specific target value in each treatment group (or in the whole sample when the treatment is continuous). Balance constraints require the means of the covariate in pairs of treatment groups to be near each other. For binary and multi-category treatments, balance constraints are redundant if target constraints are provided for a variable. For continuous treatments, balance constraints refer to the correlation between the treatment and the covariate and are not redundant with target constraints. In the duals component of the output, each covariate has a dual variable for each nonredundant constraint placed on it.
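The redundancy for binary and multi-category treatments follows from the triangle inequality. Writing \(\bar{x}_g\) and \(\bar{x}_h\) for the weighted means of a covariate \(x\) in treatment groups \(g\) and \(h\), and \(\tau\) for its target mean (notation introduced here for illustration), the target constraints \(|\bar{x}_g - \tau| \le \delta/2\) and \(|\bar{x}_h - \tau| \le \delta/2\) imply the balance constraint:

```latex
|\bar{x}_g - \bar{x}_h| \le |\bar{x}_g - \tau| + |\tau - \bar{x}_h| \le \frac{\delta}{2} + \frac{\delta}{2} = \delta
```

This holds for every pair of groups, so once target constraints are imposed the balance constraints can never bind.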

The dual variable for each constraint is the instantaneous rate of change of the objective function at the optimum corresponding to a change in the constraint. Because this relationship is not linear, large changes in the constraint will not exactly map onto corresponding changes in the objective function at the optimum, but will be close for small changes in the constraint. For example, for a covariate with a balance constraint of .01 and a corresponding dual variable of 40, increasing (i.e., relaxing) the constraint to .025 will decrease the value of the objective function at the optimum by approximately \((.025 - .01) * 40 = .6\).
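The arithmetic in this example is a first-order (linear) approximation and can be reproduced directly; the variable names are illustrative.

```r
# Values from the example: tolerance relaxed from .01 to .025, dual of 40
old_tol <- 0.01
new_tol <- 0.025
dual    <- 40

# Approximate decrease in the objective function at the optimum
approx_decrease <- (new_tol - old_tol) * dual
approx_decrease  # 0.6
```

The approximation is best for small changes; for large relaxations the actual decrease can differ noticeably because the relationship is not linear.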

For factor variables, optweight() takes the sum of the absolute dual variables for the constraints for all levels and reports it as the single dual variable for the variable itself. This summed dual variable is interpreted the same way as the dual variable for any other covariate.
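A minimal sketch of this summation, using made-up dual values. The data frame level_duals and its columns are hypothetical and do not reflect the actual layout of the duals component.

```r
# Hypothetical per-level duals for a factor covariate (race) and one dual
# for a continuous covariate (age)
level_duals <- data.frame(
  covariate = c("race", "race", "race", "age"),
  level     = c("black", "hispan", "white", NA),
  dual      = c(-12, 5, 3, 40)
)

# One dual per covariate: sum of absolute values across levels
covariate_duals <- sapply(split(abs(level_duals$dual), level_duals$covariate), sum)
covariate_duals  # age = 40, race = 20

# Relaxing the race tolerance by .01 would decrease the objective at the
# optimum by roughly 20 * .01 = .2 (first-order approximation)
covariate_duals[["race"]] * 0.01
```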

References

Chattopadhyay, A., Cohn, E. R., & Zubizarreta, J. R. (2024). One-Step Weighting to Generalize and Transport Treatment Effect Estimates to a Target Population. The American Statistician, 78(3), 280–289. doi:10.1080/00031305.2023.2267598

Källberg, D., & Waernbaum, I. (2023). Large Sample Properties of Entropy Balancing Estimators of Average Causal Effects. Econometrics and Statistics. doi:10.1016/j.ecosta.2023.11.004

Wang, Y., & Zubizarreta, J. R. (2020). Minimal dispersion approximately balancing weights: Asymptotic properties and practical considerations. Biometrika, 107(1), 93–105. doi:10.1093/biomet/asz050

Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805

See also

optweight.fit(), the lower-level function that performs the fitting. Links on that page can help with diagnosing and fixing more subtle issues with the optimization.

sbw, which was the inspiration for this package and provides some additional functionality for binary treatments.

WeightIt, which provides a simplified interface to optweight() and a more efficient implementation of entropy balancing.

Examples

library("cobalt")
#>  cobalt (Version 4.6.1, Build Date: 2025-08-20)
data("lalonde", package = "cobalt")

# Balancing covariates between treatment groups (binary)
(ow1 <- optweight(treat ~ age + educ + married +
                    nodegree + re74, data = lalonde,
                  tols = c(.01, .02, .03, .04, .05),
                  estimand = "ATE"))
#> An optweight object
#>  - number of obs.: 614
#>  - norm minimized: "l2"
#>  - sampling weights: present
#>  - base weights: present
#>  - treatment: 2-category
#>  - estimand: ATE
#>  - covariates: age, educ, married, nodegree, re74
bal.tab(ow1)
#> Balance Measures
#>             Type Diff.Adj
#> age      Contin.    -0.00
#> educ     Contin.     0.02
#> married   Binary    -0.03
#> nodegree  Binary     0.04
#> re74     Contin.    -0.05
#> 
#> Effective sample sizes
#>            Control Treated
#> Unadjusted   429.    185. 
#> Adjusted     415.3   125.3

# Exactly balancing covariates with respect to race (multi-category)
(ow2 <- optweight(race ~ age + educ + married +
                    nodegree + re74, data = lalonde,
                  tols = 0, estimand = "ATT",
                  focal = "black"))
#> An optweight object
#>  - number of obs.: 614
#>  - norm minimized: "l2"
#>  - sampling weights: present
#>  - base weights: present
#>  - treatment: 3-category (black, hispan, white)
#>  - estimand: ATT (focal: black)
#>  - covariates: age, educ, married, nodegree, re74
bal.tab(ow2)
#> Balance summary across all treatment pairs
#>             Type Max.Diff.Adj
#> age      Contin.            0
#> educ     Contin.            0
#> married   Binary            0
#> nodegree  Binary            0
#> re74     Contin.            0
#> 
#> Effective sample sizes
#>            hispan  white black
#> Unadjusted  72.   299.     243
#> Adjusted    45.96 181.39   243

# Balancing covariates between treatment groups (binary)
# and requesting a specified target population
targets <- process_targets(~ age + educ + married +
                             nodegree + re74,
                           data = lalonde,
                           targets = c(26, 12, .4, .5,
                                       1000))

(ow3a <- optweight(treat ~ age + educ + married +
                     nodegree + re74, data = lalonde,
                   targets = targets,
                   estimand = NULL))
#> An optweight object
#>  - number of obs.: 614
#>  - norm minimized: "l2"
#>  - sampling weights: present
#>  - base weights: present
#>  - treatment: 2-category
#>  - estimand: targets
#>  - covariates: age, educ, married, nodegree, re74

bal.tab(ow3a, disp.means = TRUE)
#> Note: `s.d.denom` not specified; assuming "pooled".
#> Balance Measures
#>             Type M.0.Adj M.1.Adj Diff.Adj
#> age      Contin.    26.0    26.0       -0
#> educ     Contin.    12.0    12.0       -0
#> married   Binary     0.4     0.4       -0
#> nodegree  Binary     0.5     0.5       -0
#> re74     Contin.  1000.0  1000.0       -0
#> 
#> Effective sample sizes
#>            Control Treated
#> Unadjusted  429.    185.  
#> Adjusted    158.04   64.09

# Balancing covariates between treatment groups (binary)
# and not requesting a target population
(ow3b <- optweight(treat ~ age + educ + married +
                     nodegree + re74, data = lalonde,
                   targets = NULL,
                   estimand = NULL))
#> An optweight object
#>  - number of obs.: 614
#>  - norm minimized: "l2"
#>  - sampling weights: present
#>  - base weights: present
#>  - treatment: 2-category
#>  - estimand: targets
#>  - covariates: age, educ, married, nodegree, re74

bal.tab(ow3b, disp.means = TRUE)
#> Note: `s.d.denom` not specified; assuming "pooled".
#> Balance Measures
#>             Type   M.0.Adj   M.1.Adj Diff.Adj
#> age      Contin.   26.4160   26.4160       -0
#> educ     Contin.   10.3547   10.3547       -0
#> married   Binary    0.3615    0.3615       -0
#> nodegree  Binary    0.6305    0.6305       -0
#> re74     Contin. 3908.9059 3908.9059       -0
#> 
#> Effective sample sizes
#>            Control Treated
#> Unadjusted  429.    185.  
#> Adjusted    382.74  139.23

# Balancing two treatments
(ow4 <- optweightMV(list(treat ~ age + educ + race + re74,
                         re75 ~ age + educ + race + re74),
                    data = lalonde))
#> An optweightMV object
#>  - number of obs.: 614
#>  - norm minimized: "l2"
#>  - sampling weights: present
#>  - base weights: present
#>  - number of treatments: 2
#>     treat: 2-category
#>     re75: continuous
#>  - covariates: 
#>     + for treat: age, educ, race, re74
#>     + for re75: age, educ, race, re74

summary(ow4)
#> Summary of weights:
#> 
#>  - - - - - - - - - - Treatment 1 - - - - - - - - - -
#> - Weight ranges:
#>         Min                                  Max
#> treated   0 |---------------------------| 8.7857
#> control   0 |--------------------|        6.7036
#> 
#> - Units with 5 greatest weights by group:
#>                                            
#>             179    166    162    124     23
#>  treated 5.6823 5.9135 6.3998 6.8191 8.7857
#>             300     48     26     19     15
#>  control 3.8543 3.9551 3.9906 5.4178 6.7036
#> 
#>            L2    L1    L∞ Rel Ent # Zeros
#> treated 1.783 1.35  7.786   1.276       0
#> control 0.955 0.743 5.704   0.477       0
#> 
#> - Effective Sample Sizes:
#>            Control Treated
#> Unweighted  429.    185.  
#> Weighted    224.31   44.26
#> 
#>  - - - - - - - - - - Treatment 2 - - - - - - - - - -
#> - Weight ranges:
#>     Min                                  Max
#> all   0 |---------------------------| 8.7857
#> 
#> - Units with 5 greatest weights by group:
#>                                        
#>         200    179    166    124     23
#>  all 5.9135 6.3998 6.7036 6.8191 8.7857
#> 
#>        L2    L1    L∞ Rel Ent # Zeros
#> all 1.263 0.926 7.786   0.718       0
#> 
#> - Effective Sample Sizes:
#>             Total
#> Unweighted 614.  
#> Weighted   236.54
#> 

bal.tab(ow4)
#> Balance by Time Point
#> 
#>  - - - Time: 1 - - - 
#> Balance Measures
#>                Type Diff.Adj
#> age         Contin.       -0
#> educ        Contin.       -0
#> race_black   Binary       -0
#> race_hispan  Binary        0
#> race_white   Binary       -0
#> re74        Contin.       -0
#> 
#> Effective sample sizes
#>            Control Treated
#> Unadjusted  429.    185.  
#> Adjusted    224.31   44.26
#> 
#>  - - - Time: 2 - - - 
#> Balance Measures
#>                Type Corr.Adj Diff.Target.Adj
#> age         Contin.       -0               0
#> educ        Contin.       -0               0
#> race_black   Binary       -0               0
#> race_hispan  Binary       -0              -0
#> race_white   Binary       -0              -0
#> re74        Contin.       -0               0
#> 
#> Effective sample sizes
#>             Total
#> Unadjusted 614.  
#> Adjusted   236.54
#>  - - - - - - - - - - - 
#> 

# Using a different norm
(ow1b <- optweight(treat ~ age + educ + married +
                    nodegree + re74, data = lalonde,
                  tols = c(.01, .02, .03, .04, .05),
                  estimand = "ATE",
                  norm = "l1"))
#> An optweight object
#>  - number of obs.: 614
#>  - norm minimized: "l1"
#>  - sampling weights: present
#>  - base weights: present
#>  - treatment: 2-category
#>  - estimand: ATE
#>  - covariates: age, educ, married, nodegree, re74

summary(ow1b, weight.range = FALSE)
#> Summary of weights:
#> 
#>           L2    L1     L∞ Rel Ent # Zeros
#> treated 2.16 0.422 26.361   0.664       0
#> control 1.04 0.165 14.038   0.228       0
#> 
#> - Effective Sample Sizes:
#>            Control Treated
#> Unweighted  429.    185.  
#> Weighted    206.18   32.65
#> 
summary(ow1, weight.range = FALSE)
#> Summary of weights:
#> 
#>            L2    L1    L∞ Rel Ent # Zeros
#> treated 0.69  0.536 3.419   0.198       0
#> control 0.182 0.165 0.409   0.017       0
#> 
#> - Effective Sample Sizes:
#>            Control Treated
#> Unweighted   429.    185. 
#> Weighted     415.3   125.3
#> 

# Allowing for negative weights
ow5 <- optweight(treat ~ age + educ + married + race +
                   nodegree + re74 + re75,
                 data = lalonde,
                 estimand = "ATE",
                 min.w = -Inf)

summary(ow5)
#> Summary of weights:
#> 
#> - Weight ranges:
#>             Min                                  Max
#> treated -0.9868 |---------------------------| 7.2545
#> control  0.4069     |-----|                   2.1701
#> 
#> - Units with 5 greatest weights by group:
#>                                            
#>             137    124     68     23     10
#>  treated 5.1933 5.2061  6.116 6.2053 7.2545
#>             388    375    226    196    118
#>  control  2.109 2.1096 2.1111  2.133 2.1701
#> 
#>            L2    L1    L∞ # Zeros
#> treated 1.608 1.216 6.254       0
#> control 0.499 0.39  1.17        0
#> 
#> - Effective Sample Sizes:
#>            Control Treated
#> Unweighted  429.    185.  
#> Weighted    343.49   51.57
#>
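The effective sample sizes reported by bal.tab() and summary() in these examples summarize how much the weights reduce precision. Assuming they follow Kish's formula, \((\sum w)^2 / \sum w^2\) computed within each treatment group, a minimal base-R version is sketched below; ess() is a hypothetical helper, not part of optweight or cobalt.

```r
# Kish's effective sample size for the weights in one treatment group
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 10))        # equal weights: ESS equals the group size, 10
ess(c(rep(1, 9), 10))  # one large weight shrinks the ESS to about 3.3
```

If that assumption holds, applying ess() to ow1$weights[lalonde$treat == 1] should give a value close to the adjusted treated ESS reported for ow1 above.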