## Abstract

We performed an empirical study to evaluate the effect of mismeasured continuous confounders on the estimation of the causal parameter when using marginal structural models and inverse probability-of-treatment weighting. In an extensive simulation over 500 randomly generated parameter value combinations within a defined space, we observed the well-understood effects of attenuation and augmentation, and two unanticipated effects: null effects and sign reversals. We implemented a secondary empirical study to further investigate the sign reversal effect. We use these results to identify conceptual similarities between our empirical findings and the established analytic and empirical results for multivariable linear and logistic regression. Through this synthesis, we suggest feasible directions of research and outline the form of expected results.

## 1 Introduction

Marginal structural models (MSMs) estimated via inverse probability-of-treatment weighting (IPTW) have become a standard tool for epidemiologists and researchers studying longitudinal processes [1, 2]. In the context of causal modeling, there has been interest and debate regarding the interpretation of effects of “exposures” that are not directly measurable, such as adiposity as measured via body-mass index [3–5]. The more general problem of measurement error in the context of directly but poorly measured quantities remains an area of interest.

Cole et al. [6] consider the problem of consistently estimating the net effect of HAART treatment on the first diagnosis of clinical AIDS or death from any cause when treatment is imperfectly measured. They integrate regression calibration, using validated treatment data, with inverse probability-of-treatment and censoring weights, combined with a pooled logistic regression approach for marginal structural Cox models. The empirical results from Cole et al. [6] demonstrate attenuation of the estimated effect when treatment is mismeasured and the ability of regression calibration to produce an unbiased estimate.

Babanezhad et al. [7] investigate the problem of treatment measurement error, or more generally treatment mismeasurement. They study the impact of mismeasurement across several models which aim to quantify the same estimand. Focusing on the misclassification of a dichotomous treatment, they consider ordinary least squares, g-estimation, the naive IPTW, and doubly robust IPTW estimation. The asymptotic bias is discussed within the context of a linear model where it is shown, analytically and empirically, that the four estimators have the same bias when the probability of misclassification does not depend on covariates.

Recently, Ogburn and VanderWeele [8] explore the problem of nondifferential misclassification of a binary confounder. Following on work by Greenland [9], they show that if the effect of the confounder on the outcome is in the same direction for both the treated and the untreated, formalized as monotonicity [8, Result 1], then adjustment reduces the bias due to the confounder; the adjusted effect lies between the crude and the true measures. If the monotonicity assumption is violated, then the adjusted value may not fall between the crude and the true measures.

In this article, we study the problem of confounder mismeasurement and empirically investigate its impact on the estimation of the causal effect of treatment using MSMs with IPTW, focusing on point-treatment studies. In Section 2, we review MSMs, the stabilized inverse probability-of-treatment weight, and the measurement error model. Section 3 describes the simulation study, and Section 4 presents the results.

## 2 The models

### 2.1 Counterfactuals and the marginal structural model

Counterfactuals, or potential outcomes, are a useful tool for understanding MSMs and often, they are presented in the context of time-varying covariates and treatments [10–12]. With our focus on MSMs for point-treatment studies, we have a single interval setting where data are collected at the initiation of the study , and at a successive point in time, . We let *L* denote the vector of baseline covariates that were measured at before follow-up or final assessment at . We define the treatment decisions *A* to be the treatment assignment occurring between times and ; throughout, we shall take . Finally, we define *Y* as the continuous outcome measured at .

We want a model describing the mean of *Y* for each treatment history, *a*, so we let denote the potential response that would be observed if the subject followed treatment *a*. We begin with the model

where parameterizes the model and the function *g* is a link function. In general, the past treatments can be incorporated into the systematic component using some function of the treatment history, , thus . The observed data for the *i*th subject are . We note that the point-treatment study is generalizable to *k* intervals [13].
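As a concrete reading of the model described in this section, a minimal sketch for a binary point treatment is given here. The symbols $Y_a$, $\beta_0$, and $\beta_1$ are assumed notation chosen for illustration; the intercept-and-slope form matches the regression lines discussed in Section 4, but the general model admits other links $g$ and other functions of the treatment history.

```latex
\begin{align*}
g\{E(Y_a)\} &= \beta_0 + \beta_1 a, \qquad a \in \{0, 1\},\\
\text{identity link:}\quad E(Y_1) - E(Y_0) &= \beta_1 .
\end{align*}
```

Under the identity link, the slope is the marginal mean difference between the two treatment levels, which is the causal parameter whose bias the simulations track.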

Under the assumptions of consistency, exchangeability [14], positivity [15], and time ordering where treatment precedes the outcome [16], we can obtain an unbiased estimate of . We accomplish this by fitting a model to the observed data where, under the conditions of consistency and exchangeability, the marginal expectation of the counterfactual is equal to the conditional expectation of the response given the treatment, . Additionally, we assume that the *n* subjects are a random sample from some superpopulation of subjects, which is nearly infinite in size, about whom we wish to make inference [14].

### 2.2 The inverse probability-of-treatment weight

The assumption of no unmeasured confounders is an essential condition that permits causal inference from observational data [12]. Under this assumption, and in the absence of model misspecification, statistical exogeneity implies causal exogeneity [14]. A measure of statistical exogeneity for each subject can be used to weight each observation so that each subject represents that which would have been observed in the superpopulation with an ancillary treatment process [12].

The weight, called the stabilized weight, can be interpreted as the inverse of the incremental effect of the time-varying confounders on the current treatment beyond other treatment determinants including previous history and baseline covariates [1, 2]. When subject information is not observable, such as loss to follow-up, a censoring process runs concurrent with both the treatment and the covariate processes. Following Hernán et al. [17], we define the censoring process at time , as *C*. It occurs between times and and precedes treatment allocation; with defining data observation. The weight for point-treatment studies is specified as , where

Robins [12] shows that the weight corrects the regression estimator for the effect of confounding and censoring due to *L*. It is naturally expressed by considering the true joint density of the factual and counterfactual data, and it encodes information ensuring that the treatment history is causally exogenous, thus permitting the equivalence .
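For orientation, one common form of a stabilized weight for a point-treatment study with censoring is sketched here. The notation ($sw_i$, $a_i$, $l_i$, and the coding $C = 0$ for an uncensored subject) is assumed for illustration and may differ from the exact specification used in this article; the treatment probabilities are understood to be evaluated among uncensored subjects.

```latex
\begin{align*}
sw^{A}_i &= \frac{P(A = a_i)}{P(A = a_i \mid L = l_i)}, \\
sw^{C}_i &= \frac{P(C = 0)}{P(C = 0 \mid L = l_i)}, \\
sw_i &= sw^{A}_i \times sw^{C}_i .
\end{align*}
```

The numerators do not depend on the confounders, so the weights have mean close to one; the denominators are where the confounders, measured correctly or not, enter the analysis.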

### 2.3 Classical unbiased nondifferential measurement error

We apply the classical, unbiased, additive, nondifferential measurement error model to confounders. For the *i*th subject and *k*th confounder, we have where is the mismeasured version of , , and the errors are independent, for , and ; . Under the assumption that or equivalently and *Y* are independent given *L*, the model is nondifferential [18, 19]. Furthermore, we assume that if and are mismeasured, then the associated errors are independent, .
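In standard notation, the classical additive model and the nondifferential condition can be sketched as follows. The symbols $L^{*}_{ik}$, $\epsilon_{ik}$, and $\sigma^{2}_{k}$ are assumed here for illustration and are not necessarily those of the original display.

```latex
\begin{align*}
L^{*}_{ik} &= L_{ik} + \epsilon_{ik}, \qquad
E(\epsilon_{ik} \mid L_{ik}) = 0, \qquad
\operatorname{Var}(\epsilon_{ik}) = \sigma^{2}_{k},\\
f(y \mid l, l^{*}) &= f(y \mid l) \qquad \text{(nondifferential error)} .
\end{align*}
```

The first line says the observed value is unbiased for the true value; the second says the mismeasured value carries no information about the outcome beyond the true confounder.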

For this situation, the relevant point-treatment directed acyclic graph (DAG) for classical measurement error is shown in Figure 1, where denotes exogenous measurement error noise. Here, *L* d-separates from , thus by the probabilistic implications of d-separation [20, Theorem 1.2.4], is independent of conditional on *L* in every distribution compatible with the specified DAG (Figure 1). Furthermore, for all compatible distributions, this conditional independence implies surrogacy and nondifferential measurement error [18, 21].

When , rather than *L*, is observed, then the inverse probability-of-treatment weight with a censoring process is

Using rather than *L*, the causal estimator may be biased. We are solving and not , where is a function of the treatment history and response parameterized by .
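The bias induced by weighting on a mismeasured confounder can be illustrated with a small, self-contained sketch. It is not the simulation design of this article: it assumes a single standard Normal confounder, hypothetical coefficient values, a hand-written Newton-Raphson logistic fit, and a weighted difference in means as the estimator, and it is written in Python rather than R.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, a, max_iter=25, tol=1e-10):
    """Newton-Raphson fit of logit P(A = 1 | x) = g0 + g1 * x."""
    g0 = g1 = 0.0
    for _ in range(max_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for xi, ai in zip(x, a):
            p = expit(g0 + g1 * xi)
            w = p * (1.0 - p)
            s0 += ai - p            # score for the intercept
            s1 += (ai - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * s0 - h01 * s1) / det
        d1 = (h00 * s1 - h01 * s0) / det
        g0 += d0
        g1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return g0, g1


def iptw_effect(x, a, y):
    """Stabilized-weight estimate of the marginal effect of binary A on Y."""
    g0, g1 = fit_logistic(x, a)
    p_marg = sum(a) / len(a)
    num1 = den1 = num0 = den0 = 0.0
    for xi, ai, yi in zip(x, a, y):
        p = expit(g0 + g1 * xi)
        if ai == 1:
            sw = p_marg / p
            num1 += sw * yi
            den1 += sw
        else:
            sw = (1.0 - p_marg) / (1.0 - p)
            num0 += sw * yi
            den0 += sw
    return num1 / den1 - num0 / den0


def simulate(n=20000, sigma_err=1.0, beta1=1.0, seed=1):
    """Return the bias when weighting on the true and on the mismeasured confounder."""
    rng = random.Random(seed)
    true_l = [rng.gauss(0.0, 1.0) for _ in range(n)]
    star_l = [l + rng.gauss(0.0, sigma_err) for l in true_l]  # classical additive error
    a = [1 if rng.random() < expit(l) else 0 for l in true_l]  # treatment depends on true L
    y = [beta1 * ai + 2.0 * l + rng.gauss(0.0, 1.0) for ai, l in zip(a, true_l)]
    return iptw_effect(true_l, a, y) - beta1, iptw_effect(star_l, a, y) - beta1


bias_true, bias_star = simulate()
print(f"bias using L:  {bias_true:.3f}")
print(f"bias using L*: {bias_star:.3f}")
```

In this particular configuration the confounder and treatment effects share a sign, so the residual confounding inflates the estimate. Other coefficient choices change the direction and size of the bias, which is the behaviour the simulation study explores.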

## 3 Simulation study

Attenuation and augmentation of parameter estimates are well-documented effects of measurement error. Attenuation occurs when the estimator is smaller in magnitude than the true value of the parameter. For simple linear regression, this is expressed as the reliability ratio [19, section 3.2.1] or equivalently by the attenuation factor [21, section 2.1]. Gustafson [21] shows that additional correlated covariates worsen this effect.
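For reference, the simple linear regression result mentioned above can be written out as follows. The symbols $X$, $W$, $U$, and $\lambda$ are assumed notation for the true covariate, its mismeasured version, the error, and the reliability ratio.

```latex
\begin{align*}
W &= X + U, \qquad U \perp X, \qquad
\lambda = \frac{\sigma^{2}_{X}}{\sigma^{2}_{X} + \sigma^{2}_{U}},\\
\hat{\beta}_{\text{naive}} &\xrightarrow{\;p\;} \lambda\, \beta,
\qquad 0 < \lambda \le 1 .
\end{align*}
```

For example, with $\sigma^{2}_{X} = 1$ and $\sigma^{2}_{U} = 0.25$, $\lambda = 0.8$, so the naive slope converges to 80% of the true slope.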

Augmentation, also known as *reversal*, occurs when the estimator is larger in magnitude than the true value of the parameter. In the setting of multiple linear regression, both attenuation and augmentation can occur, under additive measurement error, if there are multiple mismeasured covariates [21]. Recognizing these well-known effects of measurement error in linear regression, we are motivated to investigate whether the same set of effects will be observed for MSMs using inverse probability weights.

### 3.1 Primary study

For our investigation, we focus on the point-treatment case. Figures 2(a) and 2(b) are the DAGs corresponding to the data generating mechanism without and with censoring, respectively. The dashed line in Figure 1 indicates that the censored data indicator was not included as a covariate in the data generating model, but affects what is observed.

Table 1 gives the six scenarios we considered. Scenarios S1–S3 had no censoring, using the data generating mechanism in Figure 2(a). Scenarios S4–S6 had censored data (Figure 2b). Scenarios S2 and S5 had only one mismeasured confounder, whereas scenarios S3 and S6 had both confounders mismeasured (Table 1).

| Scenario | Censoring | First confounder mismeasured | Second confounder mismeasured |
|---|---|---|---|
| S1 | No | No | No |
| S2 | No | Yes | No |
| S3 | No | Yes | Yes |
| S4 | Yes | No | No |
| S5 | Yes | Yes | No |
| S6 | Yes | Yes | Yes |

The vector of confounders, , , is assumed bivariate Normal, , where and

where and , the correlation between and . We chose uncorrelated confounders to eliminate, or at least minimize, any induced augmentation [21, section 2.4]. We let *A* be binary, with denoting the event of interest (e.g. new treatment) and indicating missing data.

Table 2 identifies the observed data vectors for the six scenarios. Logistic regression was used to estimate the probability models given in Table 2. For example, we used for scenario S1 and for scenario S4, we had the additional model for the censoring mechanism. These two models have the same basic structure in every scenario, except for the specification of mismeasured covariates (Table 2). Inverse probability-of-treatment weights were estimated by incorporating the probability models specified in Table 2 into eqs (1) and (2) for a point-treatment process; for example, scenario S4 used the stabilized weight of eq. (1). These weights were used to estimate the causal effect of *A* on *Y*.

Scenario | Observed data | Probability model |

S1 | ||

S2 | ||

S3 | ||

S4 | ||

S5 | ||

S6 |

The response is continuous and specified as , and . Given the distribution of the confounders, the marginal model parameters are

where , thus and . We let *L* be mismeasured such that , where , and , where , such that and .

We implemented an extensive simulation study, using 500 randomly generated parameter value combinations within a defined space, to investigate the effect of measurement error on the causal parameter. Parameters were selected from the space defined by . We set and randomly selected the remaining parameters from the uniform distribution, , rounded to three decimal places. By design, the product space associated with the seven free parameters permitted independent sampling within each dimension. For each of the 500 locations in the parameter space, a simulation study was implemented using 5,000 simulated data sets with each data set having a sample size of 10,000.

We estimate the standard error of the bias using the empirical sampling distribution of biases, and the bootstrap. The estimated bias is where and . The 95% simulation based confidence intervals for and , across the six scenarios and four cases, were derived using the respective empirical distributions. The simulation study was conducted using R [22] with 4-GB RAM at 1,067 MHz DDR3 and a 2.8-GHz Intel Core 2 Duo processor.
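The summaries described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure actually used: the Monte Carlo standard error is taken as the standard deviation of the biases divided by the square root of the number of simulations, the interval is a simple percentile interval with linear interpolation, and the function names are hypothetical. The bootstrap step is not shown.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of pre-sorted values."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def summarize_bias(estimates, truth):
    """Mean bias, its Monte Carlo standard error, and a 95% percentile interval."""
    biases = [e - truth for e in estimates]
    mean_bias = statistics.fmean(biases)
    mc_se = statistics.stdev(biases) / math.sqrt(len(biases))
    ordered = sorted(estimates)
    interval = (percentile(ordered, 0.025), percentile(ordered, 0.975))
    return {"bias": mean_bias, "se": mc_se, "interval": interval}


# Toy input: five estimates of a parameter whose true value is 1.0.
summary = summarize_bias([1.0, 1.2, 1.4, 1.6, 1.8], truth=1.0)
print(summary)
```

In the study itself, each call would receive the 5,000 estimates produced at one location in the parameter space for one scenario.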

### 3.2 Secondary study

We conducted additional simulations to investigate further the results of the primary study. We wanted to see if the findings of Gustafson [21] applied to the estimation of an MSM causal parameter. Specifically, we investigated the effects of confounder measurement error variability and confounder correlation. We chose the sign reversal scenario, as discussed in Section 4, where = (0, 1.037, 0.893, 0, 1.86, −0.683, 0, 2.924, 2.661, −0.455). This parameterization indicates that and . We varied the parameters , , and , permitting and , the standard deviations associated with the measurement error models, to take values in the set and specified correlations between and , .

## 4 Results

Tables 3 and 4 describe four representatives from the 500 locations in the parameter space, with mean values taken over the 5,000 simulations, that illustrate the four types of effects for four specific sets of . When the covariates are accurately and precisely measured, the regression coefficients are estimated without bias (Tables 3 and 4; scenarios S1 and S4, left-most panels of Figures 3–6). In these figures, the solid line (—) is the regression line . The dashed line is the regression line , where and denote the intercept and slope obtained using the weights . The dark gray lines denote simulation regressions that were attenuated, , and the light gray lines denote augmented regressions, , 5,000.

| Scenario | Attenuation | Augmentation | Null effect | Sign reversal |
|---|---|---|---|---|
| True intercept | 0 | 0 | 0 | 0 |
| S1 | 0.045 (0.007) | 0.000 (0.001) | 0.000 (0.001) | 0.001 (0.002) |
| S2 | 0.390 (0.004) | 0.048 (0.001) | 0.006 (0.001) | −0.443 (0.001) |
| S3 | 0.879 (0.001) | 0.037 (0.001) | 0.004 (0.001) | −0.289 (0.001) |
| S4 | 0.099 (0.009) | 0.000 (0.001) | 0.000 (0.001) | 0.000 (0.002) |
| S5 | 0.349 (0.006) | 0.050 (0.001) | −0.004 (0.001) | −0.448 (0.001) |
| S6 | 0.908 (0.002) | 0.040 (0.001) | −0.003 (0.001) | −0.329 (0.001) |
| True causal parameter | 2.56 | −0.33 | −2.31 | −0.45 |
| S1 | −0.134 (0.010) | 0.000 (0.001) | 0.000 (0.001) | 0.001 (0.003) |
| S2 | −0.896 (0.007) | −0.118 (0.001) | −0.015 (0.001) | 1.007 (0.001) |
| S3 | −1.980 (0.003) | −0.091 (0.001) | −0.011 (0.001) | 0.678 (0.001) |
| S4 | −0.197 (0.012) | 0.000 (0.001) | 0.000 (0.001) | 0.002 (0.003) |
| S5 | −0.955 (0.008) | −0.117 (0.001) | −0.014 (0.001) | 0.992 (0.001) |
| S6 | −1.959 (0.003) | −0.091 (0.001) | −0.009 (0.001) | 0.663 (0.001) |

| Scenario | Attenuation | Augmentation | Null effect | Sign reversal |
|---|---|---|---|---|
| True intercept | 0 | 0 | 0 | 0 |
| S1 | 0.05 (−1.04, 0.52) | 0.00 (−0.03, 0.04) | 0.00 (−0.03, 0.03) | 0.00 (−0.17, 0.25) |
| S2 | 0.40 (−0.29, 0.72) | 0.05 (0.01, 0.08) | 0.00 (−0.02, 0.03) | −0.43 (−0.56, −0.30) |
| S3 | 0.88 (0.64, 1.04) | 0.04 (0.00, 0.07) | 0.01 (−0.02, 0.03) | −0.29 (−0.42, −0.15) |
| S4 | 0.10 (−1.25, 0.66) | 0.00 (−0.03, 0.04) | 0.00 (−0.03, 0.03) | 0.00 (−0.20, 0.31) |
| S5 | 0.35 (−0.62, 0.75) | 0.05 (0.02, 0.09) | 0.00 (−0.03, 0.03) | −0.45 (−0.58, −0.29) |
| S6 | 0.91 (0.55, 1.12) | 0.04 (0.01, 0.08) | 0.00 (−0.03, 0.03) | −0.33 (−0.46, −0.17) |
| True causal parameter | 2.56 | −0.33 | −2.31 | −0.45 |
| S1 | 2.43 (1.62, 4.27) | −0.33 (−0.38, −0.29) | −2.31 (−2.35, −2.27) | −0.44 (−0.84, −0.17) |
| S2 | 1.66 (1.12, 2.77) | −0.45 (−0.50, −0.40) | −2.32 (−2.36, −2.28) | 0.56 (0.37, 0.73) |
| S3 | 0.58 (0.33, 0.98) | −0.43 (−0.47, −0.38) | −2.32 (−2.36, −2.28) | 0.23 (0.03, 0.40) |
| S4 | 2.36 (1.43, 4.42) | −0.33 (−0.38, −0.29) | −2.31 (−2.37, −2.25) | −0.44 (−0.88, −0.16) |
| S5 | 1.61 (0.96, 2.95) | −0.45 (−0.50, −0.40) | −2.32 (−2.37, −2.27) | 0.55 (0.34, 0.72) |
| S6 | 0.60 (0.30, 1.09) | −0.43 (−0.47, −0.38) | −2.32 (−2.37, −2.27) | 0.22 (0.00, 0.39) |

Both attenuation and augmentation of the treatment effect were observed. Two unexpected effects were also noted. We call the first a *null effect*; the use of or the set had no clear effect on the expected bias. The second, which we are calling *sign reversal* to distinguish it from Gustafson’s definition of *reversal*, results in a different sign for the parameter of interest, , where denotes the estimated coefficient using the mismeasured confounder(s).

### 4.1 Attenuation and augmentation

Attenuation and augmentation are known effects of measurement error [18, 21]. As expected with attenuation, there is an increase in the magnitude of the downward bias as confounder measurement error is introduced (scenarios S2, S3, S5, and S6 of Figure 3). As suggested by Gustafson [21], we observed greater attenuation as the number of mismeasured confounders increased (Table 3). Although bias is noted for scenarios S2 and S5, Table 4 suggests that the induced bias may not be meaningful with respect to the variability of the distribution of . The opposite is true for scenarios S3 and S6 (Table 4). We found that the standard error of the bias of was relatively small, suggesting a meaningful bias as well as the potential for extreme values to unduly influence the construction of simulation based confidence intervals when confounder measurement error occurs (Figure 3).

As with attenuation, the effect of augmentation was anticipated. The augmentation in Figure 4, scenarios S2 and S5, is slightly less pronounced than that of attenuation, with much less variability in the empirical distribution of (Table 4). The augmentation effect is similar with and without censoring (Table 4). In scenarios S3 and S6 (Figure 4 and Table 4), the addition of an extra mismeasured confounder had little effect on the estimate of and on the percentile interval. Table 4 shows that in all four scenarios, the intervals fail to capture the true value of . Tables 3 and 4 reveal that attenuation and augmentation of the causal parameter, , do not translate to the same measurement error effect on the intercept, .

### 4.2 Null effect and sign reversal

Two unanticipated effects were observed: a null effect and sign reversal. A null effect occurs when the use of mismeasured confounders has little or no impact on the estimator (Figure 5). There are two distinguishing features: unbiased parameter estimation and small variation in the empirical distribution of . The case shown in Figure 5 has little variation in the estimates across the 5,000 simulations (Table 3), which, from our set of simulations, does not appear to be atypical of the underlying distribution of for this effect. We observe no added attenuation or augmentation with more than one confounder measured with error.

The second is sign reversal (Figure 6). Although sign reversal (trend reversal) is discussed in the context of misclassification [3, 23, 24], it is not well documented for classical measurement error models and perhaps never observed due to the strong analytic and empirical results that support attenuation and augmentation [18, 19, 21, 25]. In scenarios S2 and S5 of Figure 6, we observe that is very close in magnitude to , but the sign is reversed, being positive rather than negative (Table 4). Scenarios S3 and S6 exhibit an increased effect of attenuation with the use of a second mismeasured confounder, suggesting that the anticipated effects of attenuation and augmentation are functioning in an intuitive manner despite the reversed sign. The empirical distribution of is tighter for scenarios S2, S3, S5, and S6 than for scenarios S1 and S4 (Table 3).

We used the 500 studies to estimate the occurrence of the four measurement error effects and defined the effects to be mutually exclusive. Null events are locations where the confidence interval covered the truth. The sign reversal events are defined as all those locations where the confidence interval did not capture the truth and there was a sign reversal. The augmentation and attenuation events are mutually exclusive by definition, but here they are defined conditional on being neither a null nor a sign reversal event.
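The mutually exclusive classification just described can be sketched as a small decision rule. The function name and argument names are hypothetical; the example inputs are the scenario S3 rows for the causal parameter in Table 4.

```python
def classify_effect(truth, estimate, ci_low, ci_high):
    """Assign one of four mutually exclusive measurement error effects.

    Order matters: a location is first checked for a null event, then for
    sign reversal, and only then split into augmentation or attenuation.
    """
    if ci_low <= truth <= ci_high:
        return "null"
    if truth * estimate < 0:
        return "sign reversal"
    if abs(estimate) > abs(truth):
        return "augmentation"
    return "attenuation"


# Scenario S3, causal parameter (Table 4): truth, estimate, interval bounds.
examples = {
    "attenuation case": (2.56, 0.58, 0.33, 0.98),
    "augmentation case": (-0.33, -0.43, -0.47, -0.38),
    "null effect case": (-2.31, -2.32, -2.36, -2.28),
    "sign reversal case": (-0.45, 0.23, 0.03, 0.40),
}
for label, args in examples.items():
    print(label, "->", classify_effect(*args))
```

Because the null check comes first, a location with a visibly shifted estimate is still counted as a null event whenever its interval covers the truth.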

We observe that augmentation events are the most frequent across all the scenarios and are followed closely by attenuation events (Table 5). When going from a model with only one mismeasured confounder to a model with two mismeasured confounders, there is a reduction in the proportion of null events and an increase in the other three event types. Attenuation events have the largest increase.

| Scenario | Null (%) | Sign reversal (%) | Augmentation (%) | Attenuation (%) |
|---|---|---|---|---|
| S2 | 16.0 | 7.8 | 42.8 | 33.4 |
| S3 | 7.8 | 10.2 | 44.4 | 37.6 |
| S5 | 21.2 | 7.2 | 40.4 | 31.2 |
| S6 | 9.8 | 9.8 | 43.2 | 37.2 |

### 4.3 Secondary study

Using the sign reversal example, scenario S6, we investigate the observed effects of confounder measurement error on the bias of as a function of , , and , (Figure 7). To orient ourselves within the contour plots, scenario S6 corresponds to Figure 7 where (top left), and ; scenario S6 specified a location between the 0.6 and 0.8 contours. This location corresponds to the region where we expect to see sign reversal combined with attenuation. Furthermore, we make the general observation that as increases, the distance between the contour lines widens.

We are able to demarcate four effect areas with three boundary lines. The first boundary occurs along the contour line where the bias is 0. To the left of this line is the area for which combinations of and , given the specified regression coefficients, will result in augmentation. To the right is the area that yields attenuation. This area, in our example, is bounded on the right by the contour line where the bias is 0.45. To the right of this boundary line is the area where we would see attenuated sign reversal, that is the magnitude of the estimated regression parameter is less than that of the true parameter and the sign is opposite, and . This region is bounded on the right by the contour defined at 0.9. Here, we transition from attenuated sign reversal to augmented sign reversal, and , with equality holding on the boundary.
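The four areas follow directly from comparing the estimate (truth plus bias) with the truth. A minimal sketch, with a hypothetical function name and the true value −0.45 of the sign reversal example, is:

```python
def effect_region(truth, bias):
    """Map a bias value to one of the four areas of the contour plot."""
    estimate = truth + bias
    larger = abs(estimate) > abs(truth)
    if truth * estimate > 0:
        # Same sign as the truth: ordinary augmentation or attenuation.
        return "augmentation" if larger else "attenuation"
    # Estimate is zero or has the opposite sign.
    return "augmented sign reversal" if larger else "attenuated sign reversal"


truth = -0.45
for bias in (-0.10, 0.20, 0.67, 1.00):
    print(f"bias {bias:+.2f} -> {effect_region(truth, bias)}")
```

For this true value the boundaries fall at biases of 0, 0.45, and 0.9, matching the three contour lines named above. The bias of 0.67 corresponds to the scenario S6 estimate of 0.22 in Table 4 and lands in the attenuated sign reversal area.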

In our secondary study, changes in appear to be more influential than changes in , across all levels of , for determining the type of effect seen. This may be related to the chosen parameterization of the censoring model. For the sign reversal example, we had , thus changes in the imprecision of measurement for the more important variable have more of an impact on the type of effect observed. This is analogous to the effect of the measurement error for multiple mismeasured predictors using linear regression when the more important variable is measured with more error than others in the model [21, section 2.4].

The secondary study afforded the opportunity to explore the stabilized weight profile of each of the four types of measurement error effects within the sign reversal scenario (Figure 6). With this approach, we are able to fix and investigate how the stabilized weights change across the four types of measurement error. We obtained the 50th, 90th, 95th, and 99th percentile distribution of the weights using the 5,000 simulations; these were summarized using the median. The results for the 90th and 95th percentiles do not deviate much from those of the 50th percentile, so we only focus on the 50th and 99th.

The profile of the 50th percentile of the stabilized weights, , shows an ordering for the weight profile as a function of (Figure 8). All four effects exhibit a positive trend as increases. We observe an ordering of the minimum median stabilized weight: null, augmentation, attenuation, and sign reversal. This ordering is maintained for the maximum median stabilized weight. Despite these trends, the weights are similar in magnitude across all four effects.

The differences between the four measurement error effects become more pronounced when we consider the 99th percentile, the tail of the stabilized weight distributions (Figure 9). For all four effects, the stabilized weights decrease as increases, which is opposite to the trend observed with the 50th percentile (Figure 8). We observe the reverse ordering, from sign reversal to null effect, with sign reversal having the smallest median at each correlation. Furthermore, we observe a larger difference in the magnitude of the median of the 99th percentile at each level of . The unexpected result was that the sign reversal had the smallest median weights (Figure 9), which suggests that the profile of the tail of the weight distribution may be important in determining the observed measurement error effect. These empirical results suggest that the effect of confounder measurement error is a function of the model parameters, covariance structure of the confounders, and the noise of the measurement error models, which conceptually agrees with Gustafson [21, Result 2.3, eq. (2.8)] and with Buonaccorsi [18, eq. (6.38)].

## 5 Conclusions

We have observed counter-intuitive effects of confounder measurement error on the estimation of the causal parameter. The measurement error effects of attenuation and augmentation (*reversal*) are now joined by null effects and sign reversals. From our empirical study, we are able to identify some conceptual similarities between our results and those for multivariable linear and logistic regression. The secondary study suggests that effects are a function of the model parameters , thus we anticipate that future work may yield a relationship between the true and mismeasured parameters that is analogous to Gustafson [21, eq. (2.8) or (2.14)].

Given that IPTW uses functions of the propensity score to compute weights, it is natural to consider how models other than logistic regression would perform in this situation. Statistical learners (e.g. logistic regression, neural networks, regression trees, classification trees, and support vector machines) have been explored with respect to their ability to reduce bias, minimize standard errors, and balance covariates when model misspecification is of concern [26–29]. Empirical investigations suggest that boosting methods in conjunction with classification and regression trees (CART) [26] and random forests [26, 27] are promising approaches for balancing the bias-variance trade-off and handling covariate balance. More generally, boosting and bagging approaches are understood to provide better empirical results [26, 27, 29].

It is reasonable to translate these findings to IPTW models, such as MSMs, due to the similarities between IPTW and propensity scores, but the empirical propensity score investigations consider model misspecification of the systematic component and not measurement error. McCaffrey et al. [27] comment on this and clearly state that covariates may be mismeasured. It is suggested that propensity score estimation would not be affected if treatment allocation was a function of the observed, mismeasured covariate. On the other hand, if treatment allocation is dependent on the precisely measured, but unobserved covariates, then biases would result; this observation is relevant to our investigation. McCaffrey et al. [27] suggest that if there is good covariate balance for the mismeasured covariates, then propensity score methods may reduce the confounding effects in the error-free measures. McCaffrey et al. [30] propose an inverse probability weighting correction for error-prone covariates for propensity scores that builds on the work of Pearl [31], but does not consider the MSM context. As a counterpoint to these approaches, it has been shown that ideal performance of effect estimators is observed when treatment weights (propensity scores) exclude covariates that serve only to predict the treatment (exposure) but not the outcome [32–34]; further investigations are therefore warranted, as such data relationships are typically not known. The propensity score literature provides direction for the use of statistical learners when the data are error-free, but there are unanswered questions about the ability of statistical learners to obtain unbiased estimates or bias-reduced estimates when confounders are mismeasured.

The simulation study is limited in that it estimates the prevalence of four effects, but does not analytically characterize the conditions leading to them as done by Gustafson [21], Carroll et al. [19], and Buonaccorsi [18]. Without analytic results, we are unable to identify the minimal set of influential parameters. Given the highly non-linear structure of this problem, a closed analytic solution may be elusive, as noted by Gustafson [21] in the context of logistic regression under additive measurement error. He provides sobering evidence, in the context of logistic regression, that we may be able to characterize the relationship between the parameters and the actual conditional distribution of the regression [21, section 2.6], but we should not expect to have all the nice features of previous measurement error results.

Our secondary study provided insights into the parameter settings that might lead to sign reversal effects, the most troubling of the effects of measurement error. Further work is necessary; however, our observations suggest that translating prior understanding of how measurement error affects estimators to the effect of confounder measurement error is a plausible course of research for developing correction methodology. Another important direction is the extension of this work to the longitudinal setting, where it is plausible that the impact of measurement error in time-varying covariates will compound the bias observed in the cross-sectional setting. With this perspective, future work will require a combination of analytic characterization and numerical exploration in order to understand and develop the intuition necessary for both analytic and applied work with mismeasured confounders.

## References

1. RobinsJM, HernánMÁ, BrumbackB. Marginal structural models and causal inference in epidemiology. Epidemiology2000;11:550–60.10.1097/00001648-200009000-00011Search in Google Scholar PubMed

2. HernánMÁ, BrumbackB, RobinsJM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology2000;11:561–70.10.1097/00001648-200009000-00012Search in Google Scholar PubMed

3. HernánMÁ, ColeSR. Invited commentary: causal diagrams and measurement bias. Am J Epidemiol2009;170:959–62.10.1093/aje/kwp293Search in Google Scholar PubMed PubMed Central

4. ShaharE. The association of body mass index with health outcomes: causal, inconsistent or confounded?Am J Epidemiol2009;170:957–8.10.1093/aje/kwp292Search in Google Scholar PubMed

5. ShaharE. Shahar responds to causal diagrams and measurement bias. Am J Epidemiol2009;170:963–4.10.1093/aje/kwp289Search in Google Scholar

6. ColeSR, JacobsonLP, TienPC, KingsleyL, ChmielJS, AnastosK. Using marginal structural measurement-error models to estimate the long-term effect of antiretroviral therapy on incident AIDS or death. Am J Epidemiol2009;171:113–22.Search in Google Scholar

7. BabanezhadM, VansteelandtS, GoetghebeurE. Comparison of causal effect estimators under exposure misclassification. J Stat Plann Inference2010;140:1306–19.10.1016/j.jspi.2009.11.015Search in Google Scholar

8. OgburnEL, VanderWeeleTJ. On the nondifferential misclassification of a binary confounder. Epidemiology2012;23:433–439.10.1097/EDE.0b013e31824d1f63Search in Google Scholar PubMed PubMed Central

9. GreenlandS. The effect of misclassification in the presence of covariates. Am J Epidemiol1980;112:564–9.10.1093/oxfordjournals.aje.a113025Search in Google Scholar PubMed

10. Bryan J, Yu Z, van der Laan MJ. Analysis of longitudinal marginal structural models. Biostatistics 2004;5:361–80. doi:10.1093/biostatistics/kxg041

11. Moodie EEM, Stephens DA. Marginal structural models: unbiased estimation for longitudinal studies. Int J Public Health 2011;56:117–19. doi:10.1007/s00038-010-0198-4

12. Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran ME, Berry D, editors. Statistical models in epidemiology: the environment and clinical trials, the IMA volumes in mathematics and its applications, Vol. 116. New York: Springer-Verlag, 1999:95–134.

13. Robins JM. Marginal structural models. In 1997 Proceedings of the American Statistical Association, Section on Bayesian Statistical Science, 1998.

14. Robins JM. Association, causation, and marginal structural models. Synthese 1999;121:151–79.

15. Hernán MÁ. Estimating causal effects from epidemiological data. J Epidemiol Community Health 2006;60:578–86. doi:10.1136/jech.2004.029496

16. Mortimer KM. An application of model-fitting procedures for marginal structural models. Am J Epidemiol 2005;162:382–8. doi:10.1093/aje/kwi208

17. Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. J Am Stat Assoc 2001;96:440–8. doi:10.1198/016214501753168154

18. Buonaccorsi JP. Measurement error: models, methods, and applications. Boca Raton, FL: Chapman & Hall and CRC, 2010. doi:10.1201/9781420066586

19. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: a modern perspective, 2nd ed. Boca Raton, FL: Chapman & Hall and CRC, 2006. doi:10.1201/9781420010138

20. Pearl J. Causality: models, reasoning, and inference, 2nd ed. New York, NY: Cambridge University Press, 2009. doi:10.1017/CBO9780511803161

21. Gustafson P. Measurement error and misclassification in statistics and epidemiology. Boca Raton, FL: Chapman & Hall and CRC, 2004.

22. R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. Available at: http://www.R-project.org, ISBN 3-900051-07-0.

23. Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol 1990;132:746–8. doi:10.1093/oxfordjournals.aje.a115716

24. Weinberg CR, Umbach DM, Greenland S. When will nondifferential misclassification of an exposure preserve the direction of a trend? Am J Epidemiol 1994;140:565–71. doi:10.1093/oxfordjournals.aje.a117283

25. Fuller WA. Measurement error models. Hoboken, NJ: John Wiley & Sons, 1987.

26. Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med 2009;29:337–46. doi:10.1002/sim.3782

27. McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods 2004;9:403–25. doi:10.1037/1082-989X.9.4.403

28. Setoguchi S, Schneeweiss S, Brookhart MA, Glynn RJ, Cook EF. Evaluating uses of data mining techniques in propensity score estimation: a simulation study. Pharmacoepidemiol Drug Saf 2008;17:546–55. doi:10.1002/pds.1555

29. Westreich D, Lessler J, Funk MJ. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J Clin Epidemiol 2010;63:826–33. doi:10.1016/j.jclinepi.2009.11.020

30. McCaffrey DF, Lockwood JR, Setodji CM. Inverse probability weighting with error-prone covariates. Biometrika 2013;100:671–80. doi:10.1093/biomet/ast022

31. Pearl J. On measurement error in causal inference. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010), 2012.

32. Austin PC. The performance of different propensity score methods for estimating marginal hazard ratios. Stat Med 2013;32:2837–49. doi:10.1002/sim.5705

33. Lefebvre G, Delaney JA, Platt RW. Impact of mis-specification on the treatment model on estimates from a marginal structural model. Stat Med 2008;27:3629–42. doi:10.1002/sim.3200

34. Moodie EEM. Risk factor adjustment in marginal structural model estimation of optimal treatment regimes. Biom J 2009;51:774–88. doi:10.1002/bimj.200800182

**Published Online:** 2014-1-18

**Published in Print:** 2014-5-1

© 2014 by Walter de Gruyter Berlin / Boston