In this paper, we conducted simulations to investigate the performance of the marginal structural Cox model when the ETA assumption is practically violated. We first illustrated the impact of ETA violation on IPTW estimation with a detailed investigation of four purposely selected poorly performing samples, which showed that poor estimation can be due to a small number of very influential observations, sometimes even a single observation, especially if it is assigned an extremely high weight. We then assessed how weight truncation can affect both the accuracy and the precision of IPTW estimators, with a particular focus on the selection of an “optimal” truncation level based on an approximated MSE. This approximate MSE calculation was accomplished by either using a proxy for the true parameter value or via a cross-validation approach.

In our main simulations, we considered a continuous confounder, a moderate confounder-treatment association, and up to ten follow-up visits, so that there was a high probability of practical violation of the ETA assumption. As anticipated, although the untruncated IPTW estimators were unbiased for both the direct and the indirect treatment effects, they exhibited large variability, resulting in high mean-squared errors. Furthermore, the SEs of the untruncated IPTW estimates were seriously underestimated, with SD/SE ratios greater than 1.5, so that the coverage rates of the 95% CIs were much lower than the nominal 95%.
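The two summary measures used above, the SD/SE ratio and the CI coverage rate, can be computed from simulation replicates as in the following minimal sketch. The function name and arguments are illustrative assumptions, not part of our simulation code, and the CIs are assumed to be Wald-type intervals (estimate ± 1.96 × SE).

```python
import numpy as np

def sd_se_ratio_and_coverage(estimates, ses, true_value, z=1.96):
    """Summarize simulation replicates (hypothetical helper).

    Returns the ratio of the empirical SD of the estimates to the mean
    estimated SE, and the fraction of Wald-type CIs covering true_value.
    """
    estimates = np.asarray(estimates, dtype=float)
    ses = np.asarray(ses, dtype=float)
    empirical_sd = estimates.std(ddof=1)   # SD across replicates
    ratio = empirical_sd / ses.mean()      # > 1 means SEs are too small
    lower = estimates - z * ses
    upper = estimates + z * ses
    coverage = np.mean((lower <= true_value) & (true_value <= upper))
    return ratio, coverage
```

A ratio well above 1 indicates that the estimated SEs understate the true sampling variability, which in turn pushes coverage below the nominal level.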

Cole and Hernán proposed truncating the stabilized weights at the 1st and 99th percentiles of their distribution. Our simulations show that truncating the lower tail of the weight distribution has no impact on the IPTW estimates (data not shown), and thus we considered only truncation of the upper tail. As stabilized weights were progressively truncated at lower values, the SDs for both direct and indirect treatment effect estimates decreased while the bias increased.
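Upper-tail truncation of this kind can be sketched as follows. This is a minimal illustration, assuming the stabilized weights are already available as a numeric array; the function name `truncate_upper` is hypothetical, and the percentile is computed with NumPy's default linear interpolation.

```python
import numpy as np

def truncate_upper(weights, percentile=99.0):
    """Cap weights at the given upper percentile; the lower tail is untouched."""
    weights = np.asarray(weights, dtype=float)
    cap = np.percentile(weights, percentile)  # truncation threshold
    return np.minimum(weights, cap)           # values above the cap are set to it
```

Lowering `percentile` trades variance for bias: fewer extreme weights remain, but the weighted sample departs further from the intended pseudo-population.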

For the most part, the four proposed MSE-based truncated IPTW estimators as well as the two fixed-percentile truncated IPTW estimators significantly reduced the MSE of both treatment effect estimators but incurred some small bias. Weight truncation at the 99th percentile gave the most accurate estimation of SE with the SD/SE ratio almost equal to 1 and the highest coverage rates. The proposed estimator that selected the optimal truncation level by using the 99th percentile truncated weighted estimator as the proxy to approximate the expected MSE had the best performance as reflected by the lowest MSE. The performance of the two-fold CV approach was slightly worse than the methods based on percentiles; however, this approach has the advantage of being more objective in that it does not require arbitrary specification of a truncation level or a proxy. Repeating the cross-validation procedure several times significantly improved results, with MSEs approaching those of the percentile-based methods. It is interesting to note that although the overall performance of the CV approach is slightly worse than percentile-based proxy approaches, it identified the same “optimal” truncation level as the gold standard method based on the true data-generating parameters with higher frequency than all other approaches. In future research, we need to systematically investigate the impact of choice of the number of folds and the number of repetitions on the performance of the proposed cross-validation approach.
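The proxy-based selection of a truncation level can be illustrated with the sketch below. It assumes the approximate MSE at each candidate level is the squared difference between the truncated estimate and the proxy plus the estimated variance, i.e., the usual bias-variance decomposition; the function name, arguments, and numbers in the usage example are hypothetical.

```python
import numpy as np

def select_truncation_level(levels, estimates, variances, proxy):
    """Pick the candidate level with the smallest approximate MSE.

    levels    : candidate truncation percentiles
    estimates : IPTW estimate obtained at each level
    variances : estimated variance of each estimate
    proxy     : stand-in for the unknown true parameter (an assumption)
    """
    levels = np.asarray(levels)
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    approx_mse = (estimates - proxy) ** 2 + variances  # bias^2 + variance
    best = int(np.argmin(approx_mse))
    return levels[best], approx_mse

# Made-up values for four candidate levels, with the 99th percentile
# estimate used as the proxy.
level, mse = select_truncation_level(
    levels=[100, 99.5, 99, 95],
    estimates=[-0.50, -0.48, -0.45, -0.30],
    variances=[0.09, 0.04, 0.02, 0.01],
    proxy=-0.45,
)
```

Note that when the proxy is itself one of the candidate estimates, its squared-bias term is zero by construction, so the choice of proxy influences which level is selected.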

The consistent performance of the different approaches under different sample sizes and different levels of the ETA assumption violation suggests that our results and conclusions are robust with respect to sample size and the level of ETA violation.

The application of the methods to the MACS confirms the bias-variance trade-off of the weight truncation methods and indicates that weight truncation can be used to reduce the large variance of IPTW estimates. However, we note that excessive weight truncation may distort the estimation results. Unlike the methods that truncate the weights at a fixed level or the MSE-based methods using a user-specified proxy, the proposed two-fold cross-validation method does not require any arbitrary specification, and thus could be a more objective and more data-adaptive method.

Our simulations show that the large variability of the untruncated treatment effect estimator was mostly due to a small subset of samples with very unusual estimates. These estimates were, in turn, typically due to a few highly influential observations that had extreme weights resulting from unusual treatment patterns in the interval in which an event occurred. Thus, it is important to assess the influence of individual observations before reporting the IPTW estimate of a marginal structural model. Otherwise, such outliers may result in IPTW estimates that are far away from the true effect, and in extreme cases, can even reverse the direction of the effect.
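One simple way to screen for such influential observations is to inspect how concentrated the weights are. The sketch below is an assumption on our part rather than the diagnostic used in our simulations: it reports the largest weight, its share of the total weight, and Kish's effective sample size, (Σw)² / Σw².

```python
import numpy as np

def weight_diagnostics(weights):
    """Summarize weight concentration (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    return {
        "max_weight": w.max(),
        "max_share": w.max() / w.sum(),  # fraction of total weight held by one observation
        "effective_sample_size": w.sum() ** 2 / np.sum(w ** 2),
    }
```

A large `max_share` or an effective sample size far below the number of observations suggests that a few observations dominate the weighted analysis and merit closer inspection.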

In addition, our results indicate that although the average of the bootstrap-based SEs is typically very close to that of the robust SEs, the bootstrap was better able to reflect the instability of untruncated IPTW estimators in “bad” samples, in which observations with the most extreme weights also had an event in the interval in which the unusual treatment patterns were observed. Thus, the bootstrap is recommended for estimating the variance of IPTW estimators, although in many applications it may offer only a modest improvement over robust SE estimators in the estimation of variability for truncated IPTW estimators. In addition, we found in our simulations that the SEs of IPTW estimates were much lower than their empirical SDs. Future research should address this issue.
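A nonparametric bootstrap for this setting can be sketched as below, under the assumption that whole subjects (not individual person-intervals) are resampled and that the weights and the outcome model are re-estimated in each resample. The `estimator` callable is hypothetical: it receives the resampled subject identifiers, with repeats, and returns a point estimate.

```python
import numpy as np

def bootstrap_se(subject_ids, estimator, n_boot=200, seed=0):
    """Bootstrap SE by resampling subjects with replacement.

    estimator : callable taking an array of resampled subject ids and
                returning a scalar estimate (user-supplied, hypothetical).
    """
    rng = np.random.default_rng(seed)
    ids = np.unique(subject_ids)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choice(ids, size=ids.size, replace=True)
        estimates.append(estimator(sample))  # refit weights and model here
    return float(np.std(estimates, ddof=1))
```

Resampling at the subject level keeps each subject's full treatment and covariate history together, so the within-subject dependence is preserved in every resample.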

As in all simulation studies, we relied on some simplifying assumptions: while we attempted to mimic general features of a longitudinal study of HIV progression, the assumed causal structure of our data was relatively uncomplicated. We assumed that the hazard in each interval between two visits remained constant and depended only on the most recent values of the treatment and the time-varying covariate, measured at the beginning of the interval. In practice, both the treatment decision and the hazard are likely to also depend on cumulative effects of past treatments, past history of changes in the time-varying covariate, their response to past treatments, and other covariates (Sylvestre and Abrahamowicz, 2009; Vacek, 1997).
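The constant-hazard-per-interval assumption makes event times straightforward to generate, as in the following sketch. It is an illustration of the general technique, not our data-generating code: the function name is hypothetical, intervals are assumed to have equal length, and the per-interval hazards are taken as given.

```python
import numpy as np

def simulate_event_time(hazards, interval_length=1.0, rng=None):
    """Draw one event time under a piecewise-constant hazard.

    hazards : hazard in each successive interval between visits
    Returns (time, event_indicator); the time is administratively
    censored at the end of the last interval if no event occurs.
    """
    rng = np.random.default_rng() if rng is None else rng
    for k, h in enumerate(hazards):
        if h > 0:
            t = rng.exponential(1.0 / h)  # waiting time within interval k
            if t < interval_length:
                return k * interval_length + t, True
    return len(hazards) * interval_length, False
```

Because the exponential distribution is memoryless, drawing a fresh waiting time at the start of each interval yields the correct distribution. In a full simulation, each hazard would be computed from the treatment and covariate values measured at the start of the interval.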

Another assumption of our main simulations was that the total effect of treatment on the logarithm of the hazard may be decomposed into two additive components: direct effect of current treatment and the effect of treatment at the previous visit that was entirely mediated through the change in the time-dependent covariate. This critical assumption facilitated the generation of survival times conditional on the current values of the time-varying covariate and treatment, and the assessment of the accuracy of the estimates. A recently developed permutational algorithm for generating event times conditional on arbitrarily complex time-dependent covariates and/or effects (Sylvestre and Abrahamowicz, 2008) may be useful to simulate more complex data structures (Burton et al., 2006).

In addition, in the simulation studies, we assumed that both the treatment model and the marginal structural Cox model were correctly specified in the data analysis. However, in practice, any unsaturated model may be misspecified, and thus may lead to biased causal effect estimates (Robins and Hernán, 2008). Neugebauer and van der Laan (2007) developed a nonparametric MSM (NPMSM) approach, which does not require correct specification of a parametric model but instead relies on a working model that can be intentionally misspecified. The NPMSM may be appealing in applications where there is insufficient information to correctly specify the parametric model (Neugebauer and van der Laan, 2007).

Another limitation of all the methods considered in our simulations, and of other weight truncation or stabilization methods proposed in the literature, is that they will not remove bias in the case of a theoretical violation of the positivity assumption. In that case, no unusual patterns of treatment will be observed and, thus, no extreme weights will occur, so that the truncated estimates will have similar bias to untruncated ones. This limitation may also arise when the positivity assumption is *practically* violated, which is more likely in small samples. When all study subjects with a particular covariate vector receive the same treatment, no extreme weights will be assigned to this covariate pattern and, thus, weight truncation will not reduce bias in this situation.
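Because such covariate patterns do not reveal themselves through extreme weights, they have to be looked for directly. The sketch below is a hypothetical helper, assuming covariate patterns have been coded as discrete, hashable strata; it lists the strata in which only one treatment value was observed.

```python
from collections import defaultdict

def strata_without_variation(strata, treatments):
    """Return strata in which every subject received the same treatment."""
    seen = defaultdict(set)
    for s, a in zip(strata, treatments):
        seen[s].add(a)  # collect the distinct treatment values per stratum
    return sorted(s for s, values in seen.items() if len(values) == 1)
```

For continuous covariates, the strata would have to be defined by some categorization, and the result will depend on how coarse that categorization is.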

Petersen et al. (2012) systematically reviewed alternative approaches to deal with the positivity violation and pointed out that most of these approaches represent some trade-off between unbiasedness and proximity to the initial target of inference. Alternatives to truncation include the removal of covariates that induce the most extreme weights, defining realistic treatment rules based on observed patterns in the data, or redefining the population of interest. Many of these alternative approaches rely on changing the target parameter to one that is more easily identified, which may be the only feasible solution if positivity violations are severe or by design (i.e., theoretical). The parametric bootstrap, which was proposed and validated for a point-treatment study (Wang et al., 2006), has been advocated as a tool to assess the severity of positivity violation (Petersen et al., 2012). van der Laan and Gruber (2010) developed the collaborative targeted maximum likelihood estimator (C-TMLE), in which the treatment mechanism model was data-adaptively selected in order to optimize MSE for the target parameter. The C-TMLE estimator was extended to time-to-event data by Stitelman and van der Laan (2010) and its performance was compared with alternative approaches to estimating causal effects under practical violations of the positivity assumption (Stitelman and van der Laan, 2010).

It would be interesting to investigate, in future research, whether the approaches considered in our study would improve the performance (bias and/or variance) of the IPTW estimators in the situation where estimates using untruncated weights are themselves biased (Freedman and Berk, 2008), and in situations where models are incorrectly specified.

In conclusion, our results confirm that when the ETA assumption is violated, IPTW estimators of marginal structural Cox models may suffer from large variability. Simple weight truncation at high percentiles, such as the 99th or the 99.5th percentile of the distribution of weights, can be applied to improve the IPTW estimators under ETA violation in most scenarios we considered. Our newly proposed data-adaptive approaches to selecting the truncation level that minimizes the expected MSE, using either observed statistics as a proxy for the true parameter or a CV-like method, also exhibited good performance. In general, this MSE-based method for selecting a truncation level can be applied to any type of estimator that relies on truncation. However, the performance of the alternative approaches should be further evaluated in a wide range of settings, including model misspecification and theoretical violations of the ETA assumption.
