weightitMSM() allows for the easy generation of balancing weights for marginal structural models for time-varying treatments using a variety of available methods for binary, continuous, and multinomial treatments. Many of these methods exist in other packages, which weightit() calls; these packages must be installed to use the desired method. Also included are print() and summary() methods for examining the output.

Currently only "wide" data sets, where each row corresponds to a unit's entire variable history, are supported. You can use reshape() or other functions to transform your data into this format; see example below.
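As a minimal sketch using only base R (the toy data set and variable names here are invented for illustration), a long data set with one row per unit-time combination can be put into the required wide format with reshape():

```r
#Hypothetical long data: one row per unit-time combination
long <- data.frame(ID   = rep(1:3, each = 2),
                   time = rep(1:2, times = 3),
                   tx   = c(0, 1, 1, 1, 0, 0),
                   x    = c(.1, .3, .2, .5, .4, .6))

#Wide format: one row per unit, with variables tx1, tx2, x1, x2
wide <- reshape(long, timevar = "time", v.names = c("tx", "x"),
                idvar = "ID", direction = "wide", sep = "")
```

With sep = "", the time-varying variables are renamed by appending the time value (tx1, tx2, etc.), matching the naming convention used in the examples below.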

## Usage

weightitMSM(formula.list,
            data = NULL,
            method = "glm",
            stabilize = FALSE,
            by = NULL,
            s.weights = NULL,
            num.formula = NULL,
            moments = NULL,
            int = FALSE,
            missing = NULL,
            verbose = FALSE,
            include.obj = FALSE,
            is.MSM.method,
            weightit.force = FALSE,
            ...)

# S3 method for weightitMSM
print(x, ...)

## Arguments

formula.list

a list of formulas corresponding to each time point with the time-specific treatment variable on the left hand side and pre-treatment covariates to be balanced on the right hand side. The formulas must be in temporal order, and must contain all covariates to be balanced at that time point (i.e., treatments and covariates featured in early formulas should appear in later ones). Interactions and functions of covariates are allowed.

data

an optional data set in the form of a data frame that contains the variables in the formulas in formula.list. This must be a wide data set with exactly one row per unit.

method

a string of length 1 containing the name of the method that will be used to estimate weights. See weightit() for allowable options. The default is "glm", which estimates the weights using generalized linear models.

stabilize

logical; whether or not to stabilize the weights. Stabilizing the weights involves fitting a model predicting treatment at each time point from treatment status at prior time points. If TRUE, a fully saturated model will be fit (i.e., all interactions between all treatments up to each time point), essentially using the observed treatment probabilities in the numerator (for binary and multinomial treatments). This may yield an error if some combinations are not observed. Default is FALSE. To manually specify stabilization model formulas, e.g., to specify non-saturated models, use num.formula. With many time points, saturated models may be time-consuming or impossible to fit.

num.formula

optional; a one-sided formula with the stabilization factors (other than the previous treatments) on the right hand side, which adds, for each time point, the stabilization factors to a model saturated with previous treatments. See Cole & Hernán (2008) for a discussion of how to specify this model; including stabilization factors can change the estimand without proper adjustment, and should be done with caution. Can also be a list of one-sided formulas, one for each time point. Unless you know what you are doing, we recommend setting stabilize = TRUE and ignoring num.formula.

by

a string containing the name of the variable in data for which weighting is to be done within categories or a one-sided formula with the stratifying variable on the right-hand side. For example, if by = "gender" or by = ~ gender, weights will be generated separately within each level of the variable "gender". The argument used to be called exact, which will still work but with a message. Only one by variable is allowed.

s.weights

a vector of sampling weights or the name of a variable in data that contains sampling weights. These are ignored for some methods.

moments

numeric; for some methods, the greatest power of each covariate to be balanced. For example, if moments = 3, for each non-categorical covariate, the covariate, its square, and its cube will be balanced. This argument is ignored for other methods; to balance powers of the covariates, appropriate functions must be entered in the formulas in formula.list. See the individual pages for each method for information on whether they accept moments.

int

logical; for some methods, whether first-order interactions of the covariates are to be balanced. This argument is ignored for other methods; to balance interactions between the variables, appropriate functions must be entered in the formulas in formula.list. See the individual pages for each method for information on whether they accept int.

missing

character; how missing data should be handled. The options and defaults depend on the method used. Ignored if no missing data are present. Note that multiple imputation outperforms all of the missingness methods available in weightit() and should probably be used instead. See the MatchThem package for using weightit() with multiply imputed data.

verbose

logical; whether to print additional information output by the fitting function.

include.obj

logical; whether to include in the output a list of the fit objects created in the process of estimating the weights at each time point. For example, with method = "glm", a list of the glm objects containing the propensity score models at each time point will be included. See the help pages for each method for information on what object will be included if TRUE.

is.MSM.method

logical; whether the method estimates weights for multiple time points all at once (TRUE) or by estimating weights at each time point and then multiplying them together (FALSE). This is only relevant for method = "optweight", which estimates weights for longitudinal treatments all at once, and for user-specified functions.

weightit.force

several methods are not valid for estimating weights with longitudinal treatments, and will produce an error message if attempted. Set to TRUE to bypass this error message.

...

other arguments for functions called by weightit() that control aspects of fitting that are not covered by the above arguments. See Details at weightit().

x

a weightitMSM object; the output of a call to weightitMSM().

## Value

A weightitMSM object with the following elements:

weights

The estimated weights, one for each unit.

treat.list

A list of the values of the time-varying treatment variables.

covs.list

A list of the covariates used in the fitting at each time point. Only includes the raw covariates, which may have been altered in the fitting process.

data

The data.frame originally entered to weightitMSM().

estimand

"ATE", currently the only estimand for MSMs with binary or multinomial treatments.

method

The weight estimation method specified.

ps.list

A list of the estimated propensity scores (if any) at each time point.

s.weights

The provided sampling weights.

by

A data.frame containing the by variable when specified.

stabilization

The stabilization factors, if any.

## Details

In general, weightitMSM() breaks the estimation of weights into a separate procedure for each time period based on the formulas provided. For each formula, weightitMSM() calls weightit() with that formula, collects the weights for each time period, and multiplies them together to arrive at longitudinal balancing weights.

Each formula should contain all the covariates to be balanced on. For example, the formula corresponding to the second time period should contain all the baseline covariates, the treatment variable at the first time period, and the time-varying covariates that took on values after the first treatment and before the second. Currently, only wide data sets are supported, where each unit is represented by exactly one row that contains the covariate and treatment history encoded in separate variables.
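To make the mechanics concrete, the procedure described above can be sketched by hand. This is a simplified illustration, not the actual implementation (which also handles stabilization, sampling weights, and other bookkeeping); it uses the same twang data set as the examples below and weightit()'s defaults (method = "glm", estimand "ATE"):

```r
library("WeightIt")
data("iptwExWide", package = "twang")

#One weightit() call per time point, each conditioning on the
#full treatment and covariate history up to that point
w1 <- weightit(tx1 ~ age + gender + use0, data = iptwExWide)
w2 <- weightit(tx2 ~ tx1 + use1 + age + gender + use0,
               data = iptwExWide)
w3 <- weightit(tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0,
               data = iptwExWide)

#Longitudinal balancing weights: the product across time points
msm_w <- w1$weights * w2$weights * w3$weights
```

These unstabilized weights correspond to what weightitMSM() returns with its defaults for these formulas.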

The "cbps" method, which calls CBPS() in CBPS, will yield different results from CBMSM() in CBPS because CBMSM() takes a different approach to generating weights than simply estimating several time-specific models.

## Author

Noah Greifer

## See Also

weightit() for information on the allowable methods.

## References

Cole, S. R., & Hernán, M. A. (2008). Constructing Inverse Probability Weights for Marginal Structural Models. American Journal of Epidemiology, 168(6), 656–664. doi:10.1093/aje/kwn164

## Examples

library("twang")
library("cobalt")

data("iptwExWide", package = "twang")
(W1 <- weightitMSM(list(tx1 ~ age + gender + use0,
                        tx2 ~ tx1 + use1 + age + gender + use0,
                        tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
                   data = iptwExWide,
                   method = "glm"))
#> A weightitMSM object
#>  - method: "glm" (propensity score weighting with GLM)
#>  - number of obs.: 1000
#>  - sampling weights: none
#>  - number of time points: 3 (tx1, tx2, tx3)
#>  - treatment:
#>     + time 1: 2-category
#>     + time 2: 2-category
#>     + time 3: 2-category
#>  - covariates:
#>     + baseline: age, gender, use0
#>     + after time 1: tx1, use1, age, gender, use0
#>     + after time 2: tx2, use2, tx1, use1, age, gender, use0
summary(W1)
#>                  Summary of weights
#>
#>                        Time 1
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                   Max
#> treated 1.6667 |---|                         15.5292
#> control 2.6123 |---------------------------| 82.2349
#>
#> - Units with the 5 most extreme weights by group:
#>
#>              477     442     307     409     906
#>  treated 13.4464 13.6233 13.7059  14.436 15.5292
#>              206     282     641     547     980
#>  control 44.8027 46.1355 58.4164 79.8919 82.2349
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.481 0.376   0.117       0
#> control       0.768 0.494   0.222       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  294.     706.
#> Weighted    185.18   573.6
#>
#>                        Time 2
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                   Max
#> treated 1.6667 |---------------------------| 82.2349
#> control 2.6123 |--------------------------|  79.8919
#>
#> - Units with the 5 most extreme weights by group:
#>
#>               34     561     553     641     980
#>  treated  33.371 34.7405 42.9442 58.4164 82.2349
#>              109      95     206     282     547
#>  control 39.7862 42.2256 44.8027 46.1355 79.8919
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.960 0.653   0.336       0
#> control       0.737 0.384   0.159       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted   492.   508.
#> Weighted     318.9  264.49
#>
#>                        Time 3
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                   Max
#> treated 1.6667 |-------------|               44.8027
#> control 2.6123 |---------------------------| 82.2349
#>
#> - Units with the 5 most extreme weights by group:
#>
#>              936     650     763     109     206
#>  treated  26.003  26.471 28.0582 39.7862 44.8027
#>              553     282     641     547     980
#>  control 42.9442 46.1355 58.4164 79.8919 82.2349
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.773 0.565   0.245       0
#> control       0.873 0.469   0.227       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  415.     585.
#> Weighted    235.67   366.4
#>
bal.tab(W1)
#> Call
#>  weightitMSM(formula.list = list(tx1 ~ age + gender + use0, tx2 ~
#>     tx1 + use1 + age + gender + use0, tx3 ~ tx2 + use2 + tx1 +
#>     use1 + age + gender + use0), data = iptwExWide, method = "glm")
#>
#> Balance summary across all time points
#>              Times     Type Max.Diff.Adj
#> prop.score 1, 2, 3 Distance       0.0251
#> age        1, 2, 3  Contin.       0.0703
#> gender     1, 2, 3   Binary       0.0263
#> use0       1, 2, 3  Contin.       0.0558
#> tx1           2, 3   Binary       0.0171
#> use1          2, 3  Contin.       0.0316
#> tx2              3   Binary       0.0085
#> use2             3  Contin.       0.0315
#>
#> Effective sample sizes
#>  - Time 1
#>            Control Treated
#>  - Time 2
#>            Control Treated
#>  - Time 3
#>            Control Treated

## Going from long to wide data
data("iptwExLong", package = "twang")
wide_data <- reshape(iptwExLong$covariates,    #long data
                     timevar = "time",         #time variable
                     v.names = c("use", "tx"), #time-varying
                     idvar = "ID",             #time-stable
                     direction = "wide",
                     sep = "")

(W2 <- weightitMSM(list(tx1 ~ age + gender + use1,
                        tx2 ~ tx1 + use2 + age + gender + use1,
                        tx3 ~ tx2 + use3 + tx1 + use2 + age + gender + use1),
                   data = wide_data,
                   method = "glm"))
#> A weightitMSM object
#>  - method: "glm" (propensity score weighting with GLM)
#>  - number of obs.: 1000
#>  - sampling weights: none
#>  - number of time points: 3 (tx1, tx2, tx3)
#>  - treatment:
#>     + time 1: 2-category
#>     + time 2: 2-category
#>     + time 3: 2-category
#>  - covariates:
#>     + baseline: age, gender, use1
#>     + after time 1: tx1, use2, age, gender, use1
#>     + after time 2: tx2, use3, tx1, use2, age, gender, use1
summary(W2)
#>                  Summary of weights
#>
#>                        Time 1
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                   Max
#> treated 1.6667 |---|                         15.5292
#> control 2.6123 |---------------------------| 82.2349
#>
#> - Units with the 5 most extreme weights by group:
#>
#>              477     442     307     409     906
#>  treated 13.4464 13.6233 13.7059  14.436 15.5292
#>              206     282     641     547     980
#>  control 44.8027 46.1355 58.4164 79.8919 82.2349
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.481 0.376   0.117       0
#> control       0.768 0.494   0.222       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  294.     706.
#> Weighted    185.18   573.6
#>
#>                        Time 2
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                   Max
#> treated 1.6667 |---------------------------| 82.2349
#> control 2.6123 |--------------------------|  79.8919
#>
#> - Units with the 5 most extreme weights by group:
#>
#>               34     561     553     641     980
#>  treated  33.371 34.7405 42.9442 58.4164 82.2349
#>              109      95     206     282     547
#>  control 39.7862 42.2256 44.8027 46.1355 79.8919
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.960 0.653   0.336       0
#> control       0.737 0.384   0.159       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted   492.   508.
#> Weighted     318.9  264.49
#>
#>                        Time 3
#>                  Summary of weights
#>
#> - Weight ranges:
#>
#>            Min                                   Max
#> treated 1.6667 |-------------|               44.8027
#> control 2.6123 |---------------------------| 82.2349
#>
#> - Units with the 5 most extreme weights by group:
#>
#>              936     650     763     109     206
#>  treated  26.003  26.471 28.0582 39.7862 44.8027
#>              553     282     641     547     980
#>  control 42.9442 46.1355 58.4164 79.8919 82.2349
#>
#> - Weight statistics:
#>
#>         Coef of Var   MAD Entropy # Zeros
#> treated       0.773 0.565   0.245       0
#> control       0.873 0.469   0.227       0
#>
#> - Effective Sample Sizes:
#>
#>            Control Treated
#> Unweighted  415.     585.
#> Weighted    235.67   366.4
#>
all.equal(W1$weights, W2$weights)
#> [1] TRUE

#Using stabilization factors
W3 <- weightitMSM(list(tx1 ~ age + gender + use0,
                       tx2 ~ tx1 + use1 + age + gender + use0,
                       tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
                  data = iptwExWide,
                  method = "glm",
                  stabilize = TRUE,
                  num.formula = list(~ 1,
                                     ~ tx1,
                                     ~ tx2 + tx1))

#Same as above but with fully saturated stabilization factors
#(i.e., making the last entry in 'num.formula' tx1*tx2)
W4 <- weightitMSM(list(tx1 ~ age + gender + use0,
                       tx2 ~ tx1 + use1 + age + gender + use0,
                       tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
                  data = iptwExWide,
                  method = "glm",
                  stabilize = TRUE)
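Once estimated, the weights can be supplied to an outcome model to fit the marginal structural model itself. The following is a sketch using the survey package (an assumption here, not part of this page's API; 'outcome' is assumed to be the outcome variable in iptwExWide):

```r
library("WeightIt")
library("survey")
data("iptwExWide", package = "twang")

W <- weightitMSM(list(tx1 ~ age + gender + use0,
                      tx2 ~ tx1 + use1 + age + gender + use0,
                      tx3 ~ tx2 + use2 + tx1 + use1 + age + gender + use0),
                 data = iptwExWide,
                 method = "glm")

#Survey design carrying the MSM weights
d <- svydesign(ids = ~ 1, weights = W$weights, data = iptwExWide)

#Marginal structural model regressing the outcome on the
#treatment history, with robust standard errors from svyglm()
fit <- svyglm(outcome ~ tx1 + tx2 + tx3, design = d)
```

svyglm() is used here because it provides standard errors that account for the weights; a plain weighted glm() would understate uncertainty.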