Estimate targeting weights for covariates specified in formula. The target means are specified with targets, and the maximum distance between each weighted covariate mean and the corresponding target mean is specified by tols. See Zubizarreta (2015) for details of the properties of the weights and the methods used to fit them.
Usage
optweight.svy(
formula,
data = NULL,
tols = 0,
targets = NULL,
s.weights = NULL,
b.weights = NULL,
verbose = FALSE,
...
)
Arguments
- formula
A formula with nothing on the left hand side and the covariates to be targeted on the right hand side. See glm() for more details. Interactions and functions of covariates are allowed.
- data
An optional data set in the form of a data frame that contains the variables in formula.
- tols
A vector of target balance tolerance values for each covariate. The resulting weighted covariate means will be no further away from the targets than the specified values. If only one value is supplied, it will be applied to all covariates. Can also be the output of a call to check_tols().
- targets
A vector of target population mean values for each covariate. The resulting weights will yield sample means within tols units of the target values for each covariate. If any target values are NA, the corresponding variable will not be targeted and its weighted mean will be wherever the weights yield the smallest variance. To ensure the weighted mean for a covariate is equal to its unweighted mean (i.e., so that its original mean is its target mean), its original mean must be supplied as a target.
- s.weights
A vector of sampling weights or the name of a variable in data that contains sampling weights. Optimization occurs on the product of the sampling weights and the estimated weights.
- b.weights
A vector of base weights or the name of a variable in data that contains base weights. If supplied, the desired norm of the distance between the estimated weights and the base weights is minimized.
- verbose
Whether information on the optimization problem solution should be printed. This information contains how many iterations it took to estimate the weights and whether the solution is optimal.
- ...
Arguments passed on to optweight.svy.fit:
  - norm
  A string containing the name of the norm corresponding to the objective function to minimize. The options are "l1" for the L1 norm, "l2" for the L2 norm (the default), and "linf" for the L\(\infty\) norm. The L1 norm minimizes the average absolute distance between each weight and the base weights; the L2 norm minimizes the average squared distance between each weight and the base weights; the L\(\infty\) norm minimizes the largest absolute distance between each weight and the base weights. The L2 norm has a direct correspondence with the effective sample size, making it ideal if this is your criterion of interest.
  - std.binary, std.cont
  logical; whether the tolerances are in standardized mean units (TRUE) or raw units (FALSE) for binary variables and continuous variables, respectively. The default is FALSE for std.binary because raw proportion differences make more sense than standardized mean differences for binary variables. These arguments are analogous to the binary and continuous arguments in bal.tab() in cobalt.
  - min.w
  A single numeric value less than 1 for the smallest allowable weight. Some analyses require nonzero weights for all units, so a small, nonzero minimum may be desirable. Doing so will likely (slightly) increase the variance of the resulting weights depending on the magnitude of the minimum. The default is 1e-8, which does not materially change the properties of the weights from a minimum of 0 but prevents warnings in some packages that use weights to estimate treatment effects.
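The three norm options can be illustrated outside the package with a few lines of base R. The sketch below is not the implementation used by optweight.svy.fit: weight_distance() is a hypothetical helper, the weights are made up, and base weights of 1 are assumed when none are given.

```r
# Hypothetical helper (not part of optweight): distance between a weight
# vector w and base weights b under each of the three documented norms.
# Assumption: base weights default to 1 for every unit.
weight_distance <- function(w, b = rep(1, length(w)),
                            norm = c("l2", "l1", "linf")) {
  norm <- match.arg(norm)
  d <- w - b
  switch(norm,
         l1   = mean(abs(d)),  # average absolute distance
         l2   = mean(d^2),     # average squared distance
         linf = max(abs(d)))   # largest absolute distance
}

w <- c(0.5, 1.0, 1.5, 1.0)  # made-up weights for four units

weight_distance(w, norm = "l1")    # 0.25
weight_distance(w, norm = "l2")    # 0.125
weight_distance(w, norm = "linf")  # 0.5
```

Comparing the three values for candidate weight vectors gives a feel for what each norm penalizes: the L1 norm tolerates a few large deviations, the L2 norm penalizes them quadratically, and the L\(\infty\) norm only looks at the single worst one.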
Value
An optweight.svy object with the following elements:
- weights
The estimated weights, one for each unit.
- covs
The covariates used in the fitting. Only includes the raw covariates, which may have been altered in the fitting process.
- s.weights
The provided sampling weights.
- call
The function call.
- tols
The tolerance values for each covariate.
- duals
A data.frame containing the dual variables for each covariate. See Details for interpretation of these values.
- info
The info component of the output of osqp::solve_osqp(), which contains information on the performance of the optimization at termination.
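Because the L2 norm is tied to the effective sample size (ESS), it can be useful to compute the ESS from the weights element of the output. A minimal sketch, assuming Kish's formula and an illustrative weight vector standing in for the estimated weights; ess() is a hypothetical helper, not a function exported by optweight.

```r
# Hypothetical helper: Kish's effective sample size for a weight vector
ess <- function(w) sum(w)^2 / sum(w^2)

# Made-up weights standing in for the `weights` element of the output
w <- c(0.5, 1.0, 1.5, 1.0)

ess(w)          # about 3.56 out of 4 units
ess(rep(1, 4))  # equal weights: the ESS equals the sample size, 4

# When the weights average 1 and the base weights are all 1, the mean
# squared distance from the base weights determines the ESS exactly
l2 <- mean((w - 1)^2)
all.equal(ess(w), length(w) / (1 + l2))  # TRUE
```

The last line holds only because these illustrative weights have mean 1; for weights on another scale, rescale them first.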
Details
The optimization is performed by the lower-level function optweight.svy.fit() using osqp::solve_osqp() in the osqp package, which provides a straightforward interface for specifying the constraints and objective function for quadratic optimization problems and uses a fast and flexible solving algorithm.
Weights are estimated so that the standardized differences between the weighted covariate means and the corresponding targets are within the given tolerance thresholds (unless std.binary or std.cont is FALSE, in which case unstandardized mean differences are considered for binary and continuous variables, respectively). For a covariate \(x\) with specified tolerance \(\delta\), the weighted mean will be within \(\delta\) of the target. If standardized tolerance values are requested, the standardization factor is the standard deviation of the covariate in the whole sample. The standardization factor is always unweighted.
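The tolerance constraint described above can be checked by hand for a single covariate. The following base R sketch is illustrative only: within_tol() is a hypothetical helper, the data are made up, and it assumes the standardization factor is the ordinary unweighted sd() of the covariate.

```r
# Hypothetical helper: is the weighted mean of x within tol of the target?
# Assumption: when standardized = TRUE, tol is multiplied by the unweighted
# standard deviation of x in the whole sample, computed with sd().
within_tol <- function(x, w, target, tol, standardized = FALSE) {
  weighted_mean <- sum(w * x) / sum(w)
  scale <- if (standardized) sd(x) else 1
  abs(weighted_mean - target) <= tol * scale
}

x <- c(20, 25, 30, 35)  # made-up covariate values
w <- c(2, 1, 1, 0)      # made-up weights; the weighted mean is 23.75

within_tol(x, w, target = 23, tol = 1)                         # TRUE
within_tol(x, w, target = 23, tol = 0.5)                       # FALSE
within_tol(x, w, target = 23, tol = 0.2, standardized = TRUE)  # TRUE
```

In practice a small numerical slack may be needed when checking solver output, since the constraints are satisfied only up to the solver's precision.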
See the optweight() help page for information on interpreting dual variables and resolving convergence failures.
References
Stellato, B., Banjac, G., Goulart, P., Boyd, S., & Bansal, V. (2024). osqp: Quadratic Programming Solver using the 'OSQP' Library. R package version 0.6.3.3. doi:10.32614/CRAN.package.osqp
Zubizarreta, J. R. (2015). Stable Weights that Balance Covariates for Estimation With Incomplete Outcome Data. Journal of the American Statistical Association, 110(511), 910–922. doi:10.1080/01621459.2015.1023805
See also
The OSQP docs for more information on osqp, the underlying solver, and the options for osqp::solve_osqp().
osqp::osqpSettings() for details on options for solve_osqp().
optweight.svy.fit(), the lower-level function that performs the fitting.
optweight() for estimating weights that balance treatment groups.
Examples
library("optweight")
library("cobalt")
data("lalonde", package = "cobalt")
cov.formula <- ~ age + educ + race + married +
  nodegree

targets <- check_targets(cov.formula, data = lalonde,
                         targets = c(23, 9, .3, .3, .4,
                                     .2, .5))

tols <- check_tols(cov.formula, data = lalonde,
                   tols = 0)

ows <- optweight.svy(cov.formula,
                     data = lalonde,
                     tols = tols,
                     targets = targets)
ows
#> An optweight.svy object
#> - number of obs.: 614
#> - sampling weights: none
#> - covariates: age, educ, race, married, nodegree
# Unweighted means
col_w_mean(ows$covs)
#>         age        educ  race_black race_hispan  race_white     married
#>  27.3631922  10.2687296   0.3957655   0.1172638   0.4869707   0.4153094
#>    nodegree
#>   0.6302932

# Weighted means; same as targets
col_w_mean(ows$covs, w = ows$weights)
#>         age        educ  race_black race_hispan  race_white     married
#>        23.0         9.0         0.3         0.3         0.4         0.2
#>    nodegree
#>         0.5