Trims (i.e., truncates) large weights by setting all weights above a given quantile equal to the weight at that quantile. This can be useful in controlling extreme weights, which can reduce effective sample size by enlarging the variability of the weights.

# S3 method for weightit
trim(w, at = 0, lower = FALSE, ...)

# S3 method for numeric
trim(w, at = 0, lower = FALSE, treat = NULL, ...)

## Arguments

w

A weightit object or a vector of weights.

at

numeric; either the quantile of the weights above which weights are to be trimmed (a single number between .5 and 1) or the number of weights to be trimmed (e.g., at = 3 for the top 3 weights to be set to the 4th largest weight).

lower

logical; whether also to trim at the lower quantile (e.g., for at = .9, trimming at both .1 and .9, or for at = 3, trimming the top and bottom 3 weights).

treat

A vector of treatment status for each unit. This should always be included when w is numeric, but you can get away with leaving it out if the treatment is continuous or the estimand is the ATE for binary or multinomial treatments.

...

Not used.

## Details

trim() takes in a weightit object (the output of a call to weightit() or weightitMSM()) or a numeric vector of weights and trims them to the specified quantile. All weights above that quantile are set to the weight at that quantile. If lower = TRUE, all weights below 1 minus the quantile are set to the weight at 1 minus the quantile. In general, trimming weights decreases balance but also decreases the variability of the weights, improving precision at the potential expense of unbiasedness (Cole & Hernán, 2008). See Lee, Lessler, and Stuart (2011) and Thoemmes and Ong (2016) for discussions and simulation results of trimming weights at various quantiles. Note that trimming weights can also change the target population and therefore the estimand.
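The underlying operation can be sketched in a few lines of plain R. This is an illustration of the idea, not WeightIt's internal code; in particular, the exact quantile definition trim() uses may differ (type = 3 is chosen here so the quantile is always an observed weight).

```r
# Sketch of quantile trimming (illustration only, not WeightIt internals)
set.seed(123)
w <- rchisq(100, df = 2)

# Trim at the 90th percentile: weights above it are set to the
# weight at that percentile
q_hi <- quantile(w, .9, type = 3)
w_trim <- pmin(w, q_hi)

# With lower = TRUE, also trim at 1 - .9 = .1
q_lo <- quantile(w, .1, type = 3)
w_trim_both <- pmax(pmin(w, q_hi), q_lo)

max(w_trim) == q_hi  # TRUE: no trimmed weight exceeds the 90th percentile
```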

When using trim() on a numeric vector of weights, it is helpful to include the treatment vector as well. This helps determine the type of treatment and estimand, which are used to specify how trimming is performed. In particular, if the estimand is determined to be the ATT or ATC, the weights of the target (i.e., focal) group are ignored, since they should all be equal to 1. Otherwise, if the estimand is the ATE or the treatment is continuous, all weights are considered for trimming. In general, weights for any group for which all the weights are the same will not be considered in the trimming.
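The ATT behavior described above can be illustrated with made-up data (plain R, not WeightIt internals): when the treated weights are all 1, only the control weights enter the quantile computation and are trimmed.

```r
# Illustration with hypothetical ATT-style weights: treated (focal)
# weights are all 1 and are left out of the trimming
set.seed(42)
treat <- rep(c(1, 0), times = c(50, 150))
w <- c(rep(1, 50),            # treated: all weights equal 1
       rchisq(150, df = 2))   # control: variable weights

# Quantile computed on control weights only
q_hi <- quantile(w[treat == 0], .9, type = 3)
w[treat == 0] <- pmin(w[treat == 0], q_hi)

all(w[treat == 1] == 1)  # TRUE: focal-group weights untouched
```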

## Value

If the input is a weightit object, the output will be a weightit object with the weights replaced by the trimmed weights and with an additional attribute, "trim", equal to the quantile of trimming.

If the input is a numeric vector of weights, the output will be a numeric vector of the trimmed weights, again with the aforementioned attribute.

## References

Cole, S. R., & Hernán, M. Á. (2008). Constructing Inverse Probability Weights for Marginal Structural Models. American Journal of Epidemiology, 168(6), 656–664.

Lee, B. K., Lessler, J., & Stuart, E. A. (2011). Weight Trimming and Propensity Score Weighting. PLoS ONE, 6(3), e18174.

Thoemmes, F., & Ong, A. D. (2016). A Primer on Inverse Probability of Treatment Weighting and Marginal Structural Models. Emerging Adulthood, 4(1), 40–59.

## Author

Noah Greifer

## See also

weightit(), weightitMSM()

## Examples

library("cobalt")
data("lalonde", package = "cobalt")

(W <- weightit(treat ~ age + educ + married +
                 nodegree + re74, data = lalonde,
               method = "ps", estimand = "ATT"))
#> A weightit object
#>  - method: "ps" (propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
summary(W)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                  Max
#> treated 1.0000               ||              1.0000
#> control 0.0222 |---------------------------| 2.0438
#>
#> - Units with 5 most extreme weights by group:
#>
#>              10      8      4      3      1
#>  treated      1      1      1      1      1
#>             411    595    269    409    296
#>  control 1.3303 1.4365 1.5005 1.6369 2.0438
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.000 0.000   -0.00       0
#> control       0.823 0.701    0.33       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  429.       185
#> Weighted    255.99     185

#Trimming the top and bottom 5 weights
trim(W, at = 5, lower = TRUE)
#> Trimming the top and bottom 5 weights where treat is not 1.
#> A weightit object
#>  - method: "ps" (propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
#>  - weights trimmed at the top and bottom 5

#Trimming at 90th percentile
(W.trim <- trim(W, at = .9))
#> Trimming weights where treat is not 1 to 90%.
#> A weightit object
#>  - method: "ps" (propensity score weighting)
#>  - number of obs.: 614
#>  - sampling weights: none
#>  - treatment: 2-category
#>  - estimand: ATT (focal: 1)
#>  - covariates: age, educ, married, nodegree, re74
#>  - weights trimmed at 90%

summary(W.trim)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                    Max
#> treated 1.0000                              || 1.0000
#> control 0.0222   |-------------------------|   0.9407
#>
#> - Units with 5 most extreme weights by group:
#>
#>              10      8      4      3      1
#>  treated      1      1      1      1      1
#>             303    296    285    269    264
#>  control 0.9407 0.9407 0.9407 0.9407 0.9407
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.000 0.000  -0.000       0
#> control       0.766 0.682   0.303       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  429.       185
#> Weighted    270.58     185
#Note that only the control weights were trimmed

#Trimming a numeric vector of weights
all.equal(trim(W$weights, at = .9, treat = lalonde$treat),
          W.trim$weights)
#> Trimming weights where treat is not 1 to 90%.
#> [1] TRUE

#Using made up data and as.weightit()
treat <- rbinom(500, 1, .3)
weights <- rchisq(500, df = 2)
W <- as.weightit(weights = weights, treat = treat,
                 estimand = "ATE")
summary(W)
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                   Max
#> treated 0.0782 |-----------------|            7.3680
#> control 0.0030 |---------------------------| 11.3178
#>
#> - Units with 5 most extreme weights by group:
#>
#>             203    333    436    335     103
#>  treated 5.5532 6.1825 6.2122 7.1462   7.368
#>             408    337    201    196      45
#>  control 8.8746 8.9234 9.7306 9.8349 11.3178
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.808 0.640   0.309       0
#> control       0.929 0.714   0.391       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  370.    130.
#> Weighted    198.76   78.93
summary(trim(W, at = .95))
#> Trimming weights to 95%.
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                  Max
#> treated 0.0782 |---------------------------| 6.1935
#> control 0.0030 |---------------------------| 6.1935
#>
#> - Units with 5 most extreme weights by group:
#>
#>             203    333    436    335    103
#>  treated 5.5532 6.1825 6.1935 6.1935 6.1935
#>             152    131    115    114     45
#>  control 6.1935 6.1935 6.1935 6.1935 6.1935
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.790 0.635   0.301       0
#> control       0.848 0.692   0.354       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  370.    130.
#> Weighted    215.45   80.28