Trims (i.e., winsorizes) large weights by setting all weights higher than
that at a given quantile to the weight at the quantile or to 0. This can be useful
in controlling extreme weights, which can reduce effective sample size by
increasing the variability of the weights. Note that by default, no observations are
fully discarded when using trim(), which may differ from some uses of the word "trim" (see the drop argument below).
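To make the link between extreme weights and effective sample size concrete, here is a minimal base R sketch. It assumes the common Kish approximation, (sum of weights)^2 / sum of squared weights; the helper name ess() is hypothetical and is not part of the package.

```r
# Hypothetical helper (not a WeightIt function): Kish's approximate
# effective sample size, (sum of weights)^2 / (sum of squared weights).
ess <- function(w) sum(w)^2 / sum(w^2)

w_equal   <- rep(1, 10)          # ten units, all weighted equally
w_extreme <- c(rep(1, 9), 20)    # the same units, but one extreme weight

ess(w_equal)     # equals the number of units, 10
ess(w_extreme)   # 29^2 / 409, roughly 2.06

# Pulling the extreme weight down to 2 recovers most of the sample size
w_winsorized <- pmin(w_extreme, 2)
ess(w_winsorized)  # 11^2 / 13, roughly 9.31
```

A single large weight dominates the weighted sample, which is why reining it in raises the effective sample size.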
Usage
trim(x, ...)
# S3 method for class 'weightit'
trim(x, at = 0, lower = FALSE, drop = FALSE, ...)
# Default S3 method
trim(x, at = 0, lower = FALSE, treat = NULL, drop = FALSE, ...)
Arguments
- x
A weightit object or a vector of weights.
- ...
Not used.
- at
numeric; either the quantile of the weights above which weights are to be trimmed (a single number between .5 and 1) or the number of weights to be trimmed (e.g., at = 3 for the top 3 weights to be set to the 4th largest weight).
- lower
logical; whether also to trim at the lower quantile (e.g., for at = .9, trimming at both .1 and .9, or for at = 3, trimming the top and bottom 3 weights). Default is FALSE to only trim the higher weights.
- drop
logical; whether to set the weights of the trimmed units to 0 or not. Default is FALSE to retain all trimmed units. Setting to TRUE may change the original targeted estimand when the estimand is not the ATT or ATC.
- treat
A vector of treatment status for each unit. This should always be included when x is numeric, but you can get away with leaving it out if the treatment is continuous or the estimand is the ATE for binary or multi-category treatments.
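The count-based form of at can be pictured with a small base R sketch. The helper trim_n() below is hypothetical and only mirrors the description above (the n largest weights are set to the (n + 1)th largest, and with lower = TRUE the n smallest are set to the (n + 1)th smallest); it is not the package's implementation and ignores treatment groups.

```r
# Hypothetical helper (not a WeightIt function): count-based winsorizing.
# Sets the n largest weights to the (n + 1)th largest weight; if
# lower = TRUE, also sets the n smallest to the (n + 1)th smallest.
trim_n <- function(w, n, lower = FALSE) {
  s <- sort(w)                       # ascending order
  k <- length(s)
  out <- pmin(w, s[k - n])           # cap at the (n + 1)th largest
  if (lower) {
    out <- pmax(out, s[n + 1])       # floor at the (n + 1)th smallest
  }
  out
}

w <- c(0.5, 1, 2, 3, 8, 9, 10)

top3 <- trim_n(w, 3)                 # 8, 9, 10 are set to 3
both2 <- trim_n(w, 2, lower = TRUE)  # 9, 10 -> 8 and 0.5, 1 -> 2
top3
both2
```

Note that no element is removed: the output has the same length as the input, matching the default drop = FALSE behavior described above.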
Value
If the input is a weightit object, the output will be a weightit object with the weights replaced by the trimmed weights (or 0) and will have an additional attribute, "trim", equal to the quantile of trimming.
If the input is a numeric vector of weights, the output will be a numeric vector of the trimmed weights, again with the aforementioned attribute.
Details
trim() takes in a weightit object (the output of a call to weightit() or weightitMSM()) or a numeric vector of weights and trims
(winsorizes) them to the specified quantile. All weights above that quantile
are set to the weight at that quantile unless drop = TRUE, in which case they are set to 0. If lower = TRUE, all weights
below 1 minus the quantile are trimmed. In
general, trimming weights can decrease balance but also decreases the
variability of the weights, improving precision at the potential expense of
unbiasedness (Cole & Hernán, 2008). See Lee, Lessler, and Stuart (2011) and
Thoemmes and Ong (2016) for discussions and simulation results of trimming
weights at various quantiles. Note that trimming weights can also change the
target population and therefore the estimand.
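The quantile-based rule described above can be sketched in base R as follows. The helper winsorize() is hypothetical, ignores treatment groups, and assumes R's default quantile definition, which may differ from the one the package uses; treating drop = TRUE as also zeroing the lower-tail weights is likewise an assumption.

```r
# Hypothetical helper (not a WeightIt function): quantile-based winsorizing.
# Assumes R's default quantile type; the package may compute quantiles
# differently, so cutoffs need not match trim() exactly.
winsorize <- function(w, at = 0.9, lower = FALSE, drop = FALSE) {
  hi <- quantile(w, probs = at, names = FALSE)
  out <- w
  out[w > hi] <- if (drop) 0 else hi        # cap (or zero out) the upper tail
  if (lower) {
    lo <- quantile(w, probs = 1 - at, names = FALSE)
    out[w < lo] <- if (drop) 0 else lo      # assumed: same rule for the lower tail
  }
  out
}

set.seed(123)
w <- rchisq(500, df = 2)                    # made-up right-skewed weights

w_trim <- winsorize(w, at = 0.95)               # top 5% set to the cutoff
w_drop <- winsorize(w, at = 0.95, drop = TRUE)  # top 5% set to 0

max(w) > max(w_trim)   # the largest weight has been pulled down
sum(w_drop == 0)       # number of units whose weight was set to 0
sd(w_trim) < sd(w)     # winsorizing the upper tail reduces variability
```

With drop = TRUE the zero-weighted units no longer contribute to the weighted sample, which is how trimming can shift the target population.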
When using trim() on a numeric vector of weights, it is helpful to
include the treatment vector as well. This helps determine the type of
treatment and estimand, which are used to specify how trimming is performed.
In particular, if the estimand is determined to be the ATT or ATC, the
weights of the target (i.e., focal) group are ignored, since they should all
be equal to 1. Otherwise, if the estimand is the ATE or the treatment is
continuous, all weights are considered for trimming. In general, weights for
any group for which all the weights are the same will not be considered in
the trimming.
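Why the focal group is set aside can be seen in a small base R sketch. The helper trim_nonfocal() is hypothetical, takes the focal level explicitly instead of inferring it from the weights, and uses R's default quantile definition, so it illustrates the idea without reproducing the package's code.

```r
# Hypothetical helper (not a WeightIt function): winsorize only the weights
# of units outside the focal group, leaving focal weights untouched.
trim_nonfocal <- function(w, treat, focal, at = 0.9) {
  consider <- treat != focal                       # units eligible for trimming
  cutoff <- quantile(w[consider], probs = at, names = FALSE)
  w[consider] <- pmin(w[consider], cutoff)
  w
}

# Made-up ATT-style weights: treated (focal) units all have weight 1
treat <- c(1, 1, 1, 0, 0, 0, 0, 0)
w     <- c(1, 1, 1, 0.1, 0.2, 0.3, 0.4, 5)

# Cutoff computed among controls only: the 75th percentile is 0.4
w_att <- trim_nonfocal(w, treat, focal = 1, at = 0.75)
w_att

# Cutoff computed naively over all units: the focal weights of 1 pull the
# 75th percentile up to 1, so the extreme control weight is trimmed less
naive_cutoff <- quantile(w, probs = 0.75, names = FALSE)
naive_cutoff
```

Including the constant focal weights would distort the quantile without those weights ever being candidates for trimming, which is consistent with excluding any group whose weights are all the same.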
References
Cole, S. R., & Hernán, M. Á. (2008). Constructing Inverse Probability Weights for Marginal Structural Models. American Journal of Epidemiology, 168(6), 656–664.
Lee, B. K., Lessler, J., & Stuart, E. A. (2011). Weight Trimming and Propensity Score Weighting. PLoS ONE, 6(3), e18174.
Thoemmes, F., & Ong, A. D. (2016). A Primer on Inverse Probability of Treatment Weighting and Marginal Structural Models. Emerging Adulthood, 4(1), 40–59.
Examples
library("WeightIt")
library("cobalt")
data("lalonde", package = "cobalt")
(W <- weightit(treat ~ age + educ + married +
                 nodegree + re74, data = lalonde,
               method = "glm", estimand = "ATT"))
#> A weightit object
#> - method: "glm" (propensity score weighting with GLM)
#> - number of obs.: 614
#> - sampling weights: none
#> - treatment: 2-category
#> - estimand: ATT (focal: 1)
#> - covariates: age, educ, married, nodegree, re74
summary(W)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> treated 1.0000 || 1.0000
#> control 0.0222 |---------------------------| 2.0438
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 5 4 3 2 1
#> treated 1 1 1 1 1
#> 411 595 269 409 296
#> control 1.3303 1.4365 1.5005 1.6369 2.0438
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> treated 0.000 0.000 0.00 0
#> control 0.823 0.701 0.33 0
#>
#> - Effective Sample Sizes:
#>
#> Control Treated
#> Unweighted 429. 185
#> Weighted 255.99 185
#Trimming the top and bottom 5 weights
trim(W, at = 5, lower = TRUE)
#> Trimming the top and bottom 5 weights where treat is not 1.
#> A weightit object
#> - method: "glm" (propensity score weighting with GLM)
#> - number of obs.: 614
#> - sampling weights: none
#> - treatment: 2-category
#> - estimand: ATT (focal: 1)
#> - covariates: age, educ, married, nodegree, re74
#> - weights trimmed at the top and bottom 5
#Trimming at 90th percentile
(W.trim <- trim(W, at = .9))
#> Trimming weights where treat is not 1 to 90%.
#> A weightit object
#> - method: "glm" (propensity score weighting with GLM)
#> - number of obs.: 614
#> - sampling weights: none
#> - treatment: 2-category
#> - estimand: ATT (focal: 1)
#> - covariates: age, educ, married, nodegree, re74
#> - weights trimmed at 90%
summary(W.trim)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> treated 1.0000 || 1.0000
#> control 0.0222 |-------------------------| 0.9407
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 5 4 3 2 1
#> treated 1 1 1 1 1
#> 303 296 285 269 264
#> control 0.9407 0.9407 0.9407 0.9407 0.9407
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> treated 0.000 0.000 0.000 0
#> control 0.766 0.682 0.303 0
#>
#> - Effective Sample Sizes:
#>
#> Control Treated
#> Unweighted 429. 185
#> Weighted 270.58 185
#Note that only the control weights were trimmed
#Trimming a numeric vector of weights
all.equal(trim(W$weights, at = .9, treat = lalonde$treat),
          W.trim$weights)
#> Trimming weights where treat is not 1 to 90%.
#> [1] TRUE
#Dropping trimmed units
(W.trim <- trim(W, at = .9, drop = TRUE))
#> Setting weights beyond 90% where treat is not 1 to 0.
#> A weightit object
#> - method: "glm" (propensity score weighting with GLM)
#> - number of obs.: 614
#> - sampling weights: none
#> - treatment: 2-category
#> - estimand: ATT (focal: 1)
#> - covariates: age, educ, married, nodegree, re74
#> - weights trimmed at 90%
summary(W.trim)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> treated 1.0000 || 1.0000
#> control 0.0222 |-------------------------| 0.9407
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 5 4 3 2 1
#> treated 1 1 1 1 1
#> 467 466 373 369 356
#> control 0.9407 0.9407 0.9407 0.9407 0.9407
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> treated 0.000 0.000 0.000 0
#> control 0.881 0.757 0.303 40
#>
#> - Effective Sample Sizes:
#>
#> Control Treated
#> Unweighted 429. 185
#> Weighted 241.72 185
#Note that we now have zeros in the control group
#Using made up data and as.weightit()
treat <- rbinom(500, 1, .3)
weights <- rchisq(500, df = 2)
W <- as.weightit(weights, treat = treat,
                 estimand = "ATE")
summary(W)
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> treated 0.0266 |----------------------| 10.6020
#> control 0.0069 |---------------------------| 12.5642
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 422 391 289 188 292
#> treated 7.4211 7.6362 7.6398 8.5277 10.602
#> 92 41 171 363 4
#> control 8.6373 9.2475 9.8308 11.2619 12.5642
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> treated 0.924 0.717 0.379 0
#> control 0.971 0.731 0.404 0
#>
#> - Effective Sample Sizes:
#>
#> Control Treated
#> Unweighted 343. 157.
#> Weighted 176.72 84.93
summary(trim(W, at = .95))
#> Trimming weights to 95%.
#> Summary of weights
#>
#> - Weight ranges:
#>
#> Min Max
#> treated 0.0266 |---------------------------| 6.2426
#> control 0.0069 |---------------------------| 6.2426
#>
#> - Units with the 5 most extreme weights by group:
#>
#> 206 202 188 105 9
#> treated 6.2426 6.2426 6.2426 6.2426 6.2426
#> 112 92 41 22 4
#> control 6.2426 6.2426 6.2426 6.2426 6.2426
#>
#> - Weight statistics:
#>
#> Coef of Var MAD Entropy # Zeros
#> treated 0.852 0.696 0.345 0
#> control 0.871 0.707 0.362 0
#>
#> - Effective Sample Sizes:
#>
#> Control Treated
#> Unweighted 343. 157.
#> Weighted 195.24 91.25