
Estimate targeting weights for covariates specified in formula. The target means are specified with targets and the maximum distance between each weighted covariate mean and the corresponding target mean is specified by tols. See Zubizarreta (2015) for details of the properties of the weights and the methods used to fit them.

Usage

optweight.svy(
  formula,
  data = NULL,
  tols = 0,
  targets = NULL,
  s.weights = NULL,
  b.weights = NULL,
  norm = "l2",
  verbose = FALSE,
  ...
)

Arguments

formula

A formula with nothing on the left-hand side and the covariates to be targeted on the right-hand side. See glm() for more details. Interactions and functions of covariates are allowed.

data

An optional data set in the form of a data frame that contains the variables in formula.

tols

A vector of balance tolerance values, one for each covariate. The resulting weighted covariate means will be no further from the targets than the specified values. If only one value is supplied, it will be applied to all covariates. Can also be the output of a call to process_tols().

targets

A vector of target population mean values for each baseline covariate. The resulting weights will yield weighted sample means within tols units of the target values for each covariate. If NULL or all NA, estimand will be used to determine targets. Otherwise, estimand is ignored. If any target values are NA, the corresponding variable will not be targeted and its weighted mean will be wherever the weights yield the smallest variance. Can also be the output of a call to process_targets(). See Details.

s.weights

A vector of sampling weights or the name of a variable in data that contains sampling weights.

b.weights

A vector of base weights or the name of a variable in data that contains base weights. If supplied, the desired norm of the distance between the estimated weights and the base weights is minimized.

norm

character; a string containing the name of the norm corresponding to the objective function to minimize. Allowable options include "l1" for the L1 norm, "l2" for the L2 norm (the default), "linf" for the L\(\infty\) norm, "entropy" for the negative entropy, and "log" for the sum of the logs. See optweight.fit() for details.

verbose

logical; whether information on the optimization problem solution should be printed. Default is FALSE.

...

Arguments passed on to optweight.svy.fit

std.binary,std.cont

logical; whether the tolerances are in standardized mean units (TRUE) or raw units (FALSE) for binary variables and continuous variables, respectively. The default is FALSE for std.binary because raw differences in proportion are easier to interpret than standardized mean differences for binary variables. These arguments are analogous to the binary and continuous arguments in bal.tab() in cobalt.

min.w

numeric; a single value less than 1 for the smallest allowable weight. Some analyses require nonzero weights for all units, so a small, nonzero minimum may be desirable. The default is 1e-8 (\(10^{-8}\)), which does not materially change the properties of the weights from a minimum of 0 but prevents warnings in some packages that use weights to estimate treatment effects. When norm is "entropy" or "log" and min.w <= 0, min.w will be set to the smallest nonzero value.
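As an illustration of how tols interacts with std.binary and std.cont, here is a minimal sketch that requests a nonzero tolerance in raw units. It assumes the optweight and cobalt packages are installed and reuses the lalonde data and target values from the Examples section; the tolerance of 0.05 and the object names target.means and ows.tol are arbitrary choices made for illustration.

```r
library("optweight")
library("cobalt")   # provides the lalonde data and col_w_mean()
data("lalonde", package = "cobalt")

cov.formula <- ~ age + educ + race + married + nodegree

# Target means (illustrative values, same as in the Examples section)
target.means <- c(23, 9, .3, .3, .4, .2, .5)
targets <- process_targets(cov.formula, data = lalonde,
                           targets = target.means)

# Allow each weighted mean to differ from its target by up to 0.05
# raw units; std.binary and std.cont are passed on through `...`
ows.tol <- optweight.svy(cov.formula,
                         data = lalonde,
                         tols = 0.05,
                         targets = targets,
                         std.binary = FALSE,
                         std.cont = FALSE)

# Absolute differences between weighted means and targets;
# each should be no larger than the tolerance, up to solver precision
abs(col_w_mean(ows.tol$covs, w = ows.tol$weights) - target.means)
```

Because a single value is supplied to tols, it is applied to every covariate. Loosening the tolerance gives the optimizer more room to reduce the variability of the weights.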

Value

An optweight.svy object with the following elements:

weights

The estimated weights, one for each unit.

covs

The covariates used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process.

s.weights

The provided sampling weights.

call

The function call.

tols

The tolerance values for each covariate.

duals

A data.frame containing the dual variables for each covariate. See optweight() for interpretation of these values.

info

Information about the performance of the optimization at termination.
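One common way to summarize how variable a set of estimated weights is uses the Kish effective sample size, \((\sum w)^2 / \sum w^2\). The snippet below is a minimal base R sketch: ess() is a hypothetical helper, not an optweight function, and the weight vector is made up for illustration. With a fitted object, the weights element would take the place of w.

```r
# Hypothetical helper (not part of optweight):
# Kish effective sample size of a weight vector
ess <- function(w) sum(w)^2 / sum(w^2)

# Made-up weights for four units
w <- c(0.5, 1.5, 1.0, 1.0)
ess(w)

# Equal weights give back the full sample size
ess(rep(1, 4))
```

An effective sample size far below the number of observations suggests that a few units carry most of the weight.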

Details

The optimization is performed by the lower-level function optweight.svy.fit().

Weights are estimated so that the standardized differences between the weighted covariate means and the corresponding targets are within the given tolerance thresholds. If std.binary or std.cont is FALSE, unstandardized mean differences are used instead for binary or continuous variables, respectively. For a covariate \(x\) with specified tolerance \(\delta\), the weighted mean will be within \(\delta\) of the target. If standardized tolerance values are requested, the standardization factor is the unweighted standard deviation of the covariate in the whole sample.
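The balance constraint described above can be checked by hand. The following is a minimal sketch in base R and is not part of optweight: the helper name check_balance() and the toy data are hypothetical. It computes the difference between each weighted mean and its target, optionally divides by the unweighted standard deviation, and compares the result with the tolerance.

```r
# Hypothetical helper (not part of optweight): checks whether weighted
# covariate means fall within `tols` of `targets`.
check_balance <- function(X, w, targets, tols, standardize = FALSE) {
  X <- as.matrix(X)
  # Weighted mean of each column
  w_means <- colSums(X * w) / sum(w)
  diffs <- w_means - targets
  if (standardize) {
    # Standardization factor: unweighted SD in the whole sample
    diffs <- diffs / apply(X, 2, sd)
  }
  data.frame(weighted_mean = w_means,
             target = targets,
             diff = diffs,
             # Small slack allows for floating-point error
             within_tol = abs(diffs) <= tols + 1e-8)
}

# Toy data: two covariates, four units
X <- cbind(x1 = c(1, 2, 3, 4), x2 = c(0, 1, 0, 1))
w <- c(2, 1, 1, 0)  # made-up weights

# Raw-unit check against targets the weights hit exactly
check_balance(X, w, targets = c(x1 = 1.75, x2 = 0.25), tols = 0.01)

# Standardized check against a target the weights miss for x1
check_balance(X, w, targets = c(x1 = 2.5, x2 = 0.25), tols = 0.1,
              standardize = TRUE)
```

In the second call, the weighted mean of x1 is 0.75 raw units below its target, which exceeds the tolerance of 0.1 standard deviations, so within_tol is FALSE for that covariate.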

See the optweight() help page for information on interpreting dual variables and resolving convergence failures.

References

Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805

See also

optweight.svy.fit(), the lower-level function that performs the fitting.

optweight.fit() for more details about the optimization options.

optweight() for estimating weights that balance treatment groups.

Examples

library("optweight")
library("cobalt")
data("lalonde", package = "cobalt")

cov.formula <- ~ age + educ + race + married + nodegree

targets <- process_targets(cov.formula, data = lalonde,
                           targets = c(23, 9, .3, .3, .4,
                                       .2, .5))

ows <- optweight.svy(cov.formula,
                     data = lalonde,
                     tols = 0,
                     targets = targets)
ows
#> An optweight.svy object
#>  - number of obs.: 614
#>  - norm minimized: "l2"
#>  - sampling weights: present
#>  - base weights: present
#>  - covariates: age, educ, race, married, nodegree

# Unweighted means
col_w_mean(ows$covs)
#>         age        educ  race_black race_hispan  race_white     married 
#>  27.3631922  10.2687296   0.3957655   0.1172638   0.4869707   0.4153094 
#>    nodegree 
#>   0.6302932 

# Weighted means; same as targets
col_w_mean(ows$covs, w = ows$weights)
#>         age        educ  race_black race_hispan  race_white     married 
#>        23.0         9.0         0.3         0.3         0.4         0.2 
#>    nodegree 
#>         0.5