
optweight.svy.fit() performs the optimization (via osqp) for optweight.svy() and should, in most cases, not be used directly. Little processing of inputs is performed, so they must be given exactly as described below.

Usage

optweight.svy.fit(
  covs,
  tols = 0,
  targets,
  s.weights = NULL,
  b.weights = NULL,
  norm = "l2",
  std.binary = FALSE,
  std.cont = TRUE,
  min.w = 1e-08,
  verbose = FALSE,
  ...
)

Arguments

covs

A matrix of covariates to be targeted. Must be numeric but does not have to be full rank.

tols

A vector of target balance tolerance values.

targets

A vector of target population mean values for each covariate. The resulting weights will yield sample means within tols units of the target values for each covariate. If any target values are NA, the corresponding variable will not be targeted and its weighted mean will be wherever the weights yield the smallest variance. To ensure the weighted mean for a covariate is equal to its unweighted mean (i.e., so that its original mean is its target mean), its original mean must be supplied as a target.

s.weights

A vector of sampling weights. Optimization occurs on the product of the sampling weights and the estimated weights.

b.weights

A vector of base weights. Default is a vector of 1s. The desired norm of the distance between the estimated weights and the base weights is minimized.

norm

A string containing the name of the norm corresponding to the objective function to minimize. The options are "l1" for the L1 norm, "l2" for the L2 norm (the default), and "linf" for the L\(\infty\) norm. The L1 norm minimizes the average absolute distance between each weight and the base weights; the L2 norm minimizes the average squared distance between each weight and the base weights; the L\(\infty\) norm minimizes the largest absolute distance between each weight and the base weights. The L2 norm has a direct correspondence with the effective sample size, making it ideal if this is your criterion of interest.

std.binary, std.cont

logical; whether the tolerances are in standardized mean units (TRUE) or raw units (FALSE) for binary variables and continuous variables, respectively. The default is FALSE for std.binary because raw proportion differences make more sense than standardized mean differences for binary variables. These arguments are analogous to the binary and continuous arguments in bal.tab() in cobalt.

min.w

A single numeric value less than 1 for the smallest allowable weight. Some analyses require nonzero weights for all units, so a small, nonzero minimum may be desirable. Doing so will likely (slightly) increase the variance of the resulting weights depending on the magnitude of the minimum. The default is 1e-8, which does not materially change the properties of the weights from a minimum of 0 but prevents warnings in some packages that use weights to estimate treatment effects.

verbose

Whether information on the solution to the optimization problem should be printed. This includes the number of iterations taken to estimate the weights and whether the solution is optimal.

...

Options that are passed to osqp::osqpSettings() for use in the pars argument of osqp::solve_osqp(). See Details for defaults.
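
As a rough illustration of how covs, tols, and targets line up, the sketch below builds them by hand in base R. The toy matrix and its column names are invented for illustration; only the argument semantics described above come from this page.

```r
# Toy covariate matrix (hypothetical data, for illustration only)
set.seed(1)
covs <- cbind(age     = round(runif(50, 18, 55)),
              educ    = sample(6:16, 50, replace = TRUE),
              married = rbinom(50, 1, 0.4))

# One target per column, in column order
targets <- c(age = 30, educ = NA, married = NA)

# The NA for educ leaves that column untargeted.
# To hold married at its unweighted mean, that mean must be supplied explicitly.
targets["married"] <- mean(covs[, "married"])

# One tolerance per column; 0 requests exact balance on targeted columns
tols <- rep(0, ncol(covs))

stopifnot(length(targets) == ncol(covs), length(tols) == ncol(covs))
```

These objects could then be supplied as the covs, targets, and tols arguments.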

Value

An optweight.svy.fit object with the following elements:

w

The estimated weights, one for each unit.

duals

A data.frame containing the dual variables for each covariate. See Zubizarreta (2015) for interpretation of these values.

info

The info component of the output of osqp::solve_osqp(), which contains information on the performance of the optimization at termination.
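
The link between the L2 norm and the effective sample size noted under norm can be checked directly on a weight vector. The base-R sketch below uses made-up weights in place of the w element of a real fit, and the helper name ess() is hypothetical, not part of the package.

```r
# Kish's effective sample size for a weight vector
ess <- function(w) sum(w)^2 / sum(w^2)

# Made-up weights standing in for the w element of a fit (illustration only)
w <- c(0.2, 0.5, 0.8, 1.0, 1.1, 1.4, 2.0)
w <- w / mean(w)   # rescale to mean 1, matching base weights of 1
n <- length(w)

# With mean-1 weights, ESS = n / (1 + mean squared distance from 1),
# so a smaller L2 distance from the base weights means a larger ESS
msd <- mean((w - 1)^2)
stopifnot(isTRUE(all.equal(ess(w), n / (1 + msd))))
```

Uniform weights give an effective sample size equal to the number of units; any spread in the weights lowers it.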

Details

optweight.svy.fit() transforms the inputs into the required inputs for osqp::solve_osqp(), which are (sparse) matrices and vectors, and then supplies the outputs (the weights, dual variables, and convergence information) back to optweight.svy(). Little processing of inputs is performed, as this is normally handled by optweight.svy().
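
For orientation, the L2 problem can be written roughly as follows in the simplest case (no sampling weights, base weights of 1, tolerances in raw units). This is a sketch pieced together from the argument descriptions above, not a transcription of the package internals; the notation is introduced here for illustration, and the normalization constraint on the weights in particular is an assumption.

\[
\begin{aligned}
\min_{w_1,\dots,w_n}\quad & \frac{1}{n}\sum_{i=1}^{n}(w_i - 1)^2 \\
\text{subject to}\quad & \left|\frac{1}{n}\sum_{i=1}^{n} w_i x_{ij} - t_j\right| \le \delta_j \quad \text{for each targeted covariate } j, \\
& \frac{1}{n}\sum_{i=1}^{n} w_i = 1, \qquad w_i \ge \texttt{min.w} \quad \text{for each unit } i.
\end{aligned}
\]

Here \(x_{ij}\) is the value of covariate \(j\) for unit \(i\), \(t_j\) is its entry in targets, and \(\delta_j\) is its entry in tols.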

The default values for some of the parameters sent to osqp::solve_osqp() are not the same as those in osqp::osqpSettings(). The following are the differences: max_iter is set to 20000, eps_abs and eps_rel are set to 1e-8 (i.e., \(10^{-8}\)), and adaptive_rho_interval is set to 10. All other values are the same.
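
Solver settings can be overridden through the ... argument. The sketch below reuses the inputs from the Examples section and assumes that cobalt and optweight are installed; the particular values chosen are arbitrary and only show the mechanism.

```r
# Runs only when both packages are available
if (requireNamespace("cobalt", quietly = TRUE) &&
    requireNamespace("optweight", quietly = TRUE)) {

  data("lalonde", package = "cobalt")
  covs <- cobalt::splitfactor(lalonde[c("age", "educ", "race",
                                        "married", "nodegree")],
                              drop.first = FALSE)

  # Extra named arguments are forwarded to osqp::osqpSettings()
  fit <- optweight::optweight.svy.fit(covs,
                                      tols = rep(0, 7),
                                      targets = c(23, 9, .3, .3, .4, .2, .5),
                                      max_iter = 50000L, # raise the iteration cap
                                      eps_abs = 1e-6,    # looser than the 1e-8 used by default here
                                      eps_rel = 1e-6)

  # info holds the solver's report at termination
  str(fit$info)
}
```

Loosening eps_abs and eps_rel trades some precision in the balance constraints for faster termination.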

References

Wang, Y., & Zubizarreta, J. R. (2020). Minimal dispersion approximately balancing weights: Asymptotic properties and practical considerations. Biometrika, 107(1), 93–105. doi:10.1093/biomet/asz050

Yiu, S., & Su, L. (2018). Covariate association eliminating weights: a unified weighting framework for causal effect estimation. Biometrika. doi:10.1093/biomet/asy015

Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805

See also

optweight.svy(), which should be used to estimate the balancing weights in most cases.

The OSQP docs for more information on osqp, the underlying solver. osqp::osqpSettings() for details on the options for osqp::solve_osqp().

Examples

library("cobalt")
data("lalonde", package = "cobalt")

covs <- splitfactor(lalonde[c("age", "educ", "race",
                  "married", "nodegree")],
                  drop.first = FALSE)

targets <- c(23, 9, .3, .3, .4, .2, .5)

tols <- rep(0, 7)

ows.fit <- optweight.svy.fit(covs,
                             tols = tols,
                             targets = targets,
                             norm = "l2")

#Unweighted means
col_w_mean(covs)
#>         age        educ  race_black race_hispan  race_white     married 
#>  27.3631922  10.2687296   0.3957655   0.1172638   0.4869707   0.4153094 
#>    nodegree 
#>   0.6302932 

#Weighted means; same as targets
col_w_mean(covs, w = ows.fit$w)
#>         age        educ  race_black race_hispan  race_white     married 
#>        23.0         9.0         0.3         0.3         0.4         0.2 
#>    nodegree 
#>         0.5