# The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.

IMPACT FACTOR 2018: 1.309

CiteScore 2018: 1.11

SCImago Journal Rank (SJR) 2018: 1.325
Source Normalized Impact per Paper (SNIP) 2018: 0.715

Mathematical Citation Quotient (MCQ) 2018: 0.03

Online ISSN: 1557-4679
Volume 12, Issue 2

# Effect Estimation in Point-Exposure Studies with Binary Outcomes and High-Dimensional Covariate Data – A Comparison of Targeted Maximum Likelihood Estimation and Inverse Probability of Treatment Weighting

Menglan Pang
• Centre For Clinical Epidemiology, Lady Davis Research Institute, Jewish General Hospital, Montreal, Quebec, Canada
• Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
/ Tibor Schuster
/ Kristian B. Filion
• Centre For Clinical Epidemiology, Lady Davis Research Institute, Jewish General Hospital, Montreal, Quebec, Canada
• Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
• Division of Clinical Epidemiology, Department of Medicine, McGill University, Montreal, Quebec, Canada
/ Mireille E. Schnitzer
/ Maria Eberg
• Centre For Clinical Epidemiology, Lady Davis Research Institute, Jewish General Hospital, Montreal, Quebec, Canada
/ Robert W. Platt
• Corresponding author
• Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
• Department of Pediatrics, McGill University, Montreal, Quebec, Canada
• The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
• orcid.org/0000-0002-5981-8443
Published Online: 2016-12-04 | DOI: https://doi.org/10.1515/ijb-2015-0034

## Abstract

Inverse probability of treatment weighting (IPW) and targeted maximum likelihood estimation (TMLE) are relatively new methods proposed for estimating marginal causal effects. TMLE is doubly robust, yielding consistent estimators even under misspecification of either the treatment or the outcome model. While IPW methods are known to be sensitive to near violations of the practical positivity assumption (e. g., in the case of data sparsity), the consequences of this violation in the TMLE framework for binary outcomes have been less widely investigated. As near practical positivity violations are particularly likely in high-dimensional covariate settings, a better understanding of the performance of TMLE is of particular interest for pharmacoepidemiological studies using large databases. Using plasmode and Monte-Carlo simulation studies, we evaluated the performance of TMLE compared to that of IPW estimators based on a point-exposure cohort study of the marginal causal effect of post-myocardial infarction statin use on the 1-year risk of all-cause mortality from the Clinical Practice Research Datalink. A variety of treatment model specifications were considered, inducing different degrees of near practical non-positivity. Our simulation study showed that the performance of the TMLE and IPW estimators was comparable when the dimension of the fitted treatment model was small to moderate; however, they differed when a large number of covariates was considered. When a rich outcome model was included in the TMLE, estimators were unbiased. In some cases, we found irregular bias and large standard errors with both methods even with a correctly specified high-dimensional treatment model. The IPW estimator showed a slightly better root MSE with high-dimensional treatment model specifications in our simulation setting. 
In conclusion, for estimation of the marginal expectation of the outcome under a fixed treatment, TMLE and IPW estimators employing the same treatment model specification may perform differently due to differential sensitivity to practical positivity violations; however, TMLE, being doubly robust, shows improved performance with richer specifications of the outcome model. Although TMLE is appealing for its double robustness property, such violations in a high-dimensional covariate setting are problematic for both methods.

## 1 Introduction

Population-based treatment effects, i. e. the marginal contrast of outcomes from the same population under different counterfactual treatments, are of particular interest in epidemiological studies being conducted for health policy evaluation. Several methods have been developed for estimation of such marginal effects, in particular marginal structural models (MSM) using inverse probability weighting (IPW) and computationally intensive approaches such as G-computation [1, 2, 3]. Recently, a class of doubly robust estimators has been developed that beneficially employs efficient influence functions for the parameter of interest [4]. An interesting characteristic of both IPW and the doubly robust methods is their use of the inverse of the propensity score [4]. One such procedure for estimating marginal causal effects, named targeted maximum likelihood estimation (TMLE), was proposed by van der Laan and Rubin [5, 6, 7, 8, 9]. For estimation of the mean outcome under a fixed treatment regime, TMLE requires specification of both the treatment model and a conditional model for the outcome. TMLE is doubly robust meaning that only one of these two models has to be correctly specified, in the sense that it must correctly model the outcome or treatment as a function of a sufficient set of covariates, in order to obtain unbiased effect estimates [8]. Furthermore, TMLE has been shown to be locally efficient, meaning that the resulting estimator has minimum large-sample standard error when both models are correctly specified. For these reasons, TMLE is an appealing method to reduce bias due to model misspecification and to possibly increase precision of the effect estimate.

Simulation studies have been used to evaluate the performance of TMLE in certain study designs. For example, simulation has been applied in the context of time-to-event analysis considering different types of censoring [10, 11]. Simulation-based investigations have also been conducted in longitudinal data settings where the performance of TMLE was compared to that of other methods in the presence of time-varying confounding [12, 13, 14, 15]. Porter [16] conducted simulations in the common setting of a point-exposure study with binary outcome; we were interested in extending these results to administrative data with a large (e. g., >200) number of covariates, which we will refer to as “high-dimensional”. We were also interested in determining whether the inclusion of “pure” treatment predictors that are independent of the outcome in the treatment model would be harmful in this context, and whether the inclusion of “pure” predictors of the outcome would be beneficial [17, 18, 19].

Pharmacoepidemiologic studies are often conducted using administrative claims and clinical data that are routinely collected for non-research purposes by insurers and governments. Since these data are not research data, control for confounding may require a large number of covariates. In this setting, TMLE has not yet been widely implemented (although software for TMLE has been developed [20, 21]). This lack of application may be due to the method’s novelty and theoretical complexity. However, before applying the TMLE method in such studies, its performance should be fully assessed. One key assumption that may be particularly problematic for this type of data is the positivity assumption [22, 23]. The theoretical positivity assumption requires that the probability of receiving any level of treatment conditional on the covariates must be positive for each individual in the population. Near-, or practical, violations of the positivity assumption can occur when a specific combination of covariates and treatment is rare. Near violations of the positivity assumption and TMLE have also been investigated, in particular for study settings with a continuous outcome [24, 25] and TMLE has been implemented in several settings with binary outcomes [9, 26, 27]. However, for high-dimensional administrative data with binary outcomes, the consequences of a near positivity violation and TMLE have been less widely investigated and discussed. Collaborative TMLE [25, 28, 29] has been proposed to address positivity violations by variable selection for the treatment model; however, we limit this paper to the more widely used standard TMLE.

In this article, we review the conceptual idea and implementation of TMLE in a general point-exposure study with binary exposure and outcome. A comprehensive Monte-Carlo simulation study is conducted to evaluate the performance of TMLE compared to the IPW estimator and to verify its double robustness property. Moreover, a plasmode simulation study [30] incorporating real administrative data is used to assess the performance of TMLE in a high dimensional covariate setting emulating a typical pharmacoepidemiologic study.

## 2 Notation and assumptions

Assume that X is a (perhaps high-dimensional) vector of covariates and A and Y are binary indicator variables for the treatment status and the observed outcome, respectively. Let $Y_{A=a}$ indicate a patient’s potential outcome under treatment A=a. Valid causal inference using IPW or TMLE requires several assumptions. First, the so-called time ordering assumption is required: the covariates X precede treatment A and A precedes Y in time, and Y depends on A and X while A depends only on X. Taking only pre-treatment variables in X guarantees that one does not condition on collider variables (common effects of treatment and other covariates) which would potentially induce selection bias [31]. Second, the consistency assumption requires that an individual’s potential outcome under the treatment he or she actually received is equal to his or her observed outcome; that is, $Y_{A=a}=Y$ whenever A=a. Third, the conditional exchangeability assumption (also known as no unmeasured confounding) has to be made. Confounding occurs when Y and A share a common cause. The conditional exchangeability assumption states that the potential outcomes are independent of the observed treatment given measured covariates, $Y_{A=a} \perp\!\!\!\perp A \mid \mathbf{X}$; that is, X includes sufficient variables to account for common causes [32]. Last, as noted earlier, the positivity assumption is also necessary for all treatment levels a and observed covariate realizations x. Violations of this assumption can be either theoretical, when $P\left(A=a|\mathbf{X}=\mathbf{x}\right)=0$, or practical, when $\hat{P}\left(A=a|\mathbf{X}=\mathbf{x}\right)=0$ for some observed combination of a and x.

## 3 Inverse probability weighting

In point-exposure studies, the IPW estimator can be used to estimate marginal treatment effects in the presence of confounding. Each subject is assigned a weight equal to the inverse of the estimated probability of having received his or her own treatment, i. e. the inverse of the propensity score for a treated subject and the inverse of one minus the propensity score for an untreated subject. By weighting, a pseudo-population is created where the distribution of covariates is comparable across treatment groups [1]. Therefore, contrasts in marginal outcome between the treatment groups in this pseudo-population produce an unbiased estimate of the marginal treatment effect. A weighted unadjusted logistic regression for the outcome will produce an estimate of the marginal treatment effect. Including additional covariates (such as confounders or other risk factors) in the logistic model will produce an estimate of the conditional treatment effect. Due to the non-collapsibility of the odds ratio, these marginal and conditional effects are different quantities [33].
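The weighting scheme described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names (`ipw_marginal_or`, `make_records`) and the toy counts are hypothetical, there is a single binary covariate, and the propensity score is estimated by stratum proportions (equivalent to a saturated logistic model) rather than by the regression models used in the paper.

```python
from collections import Counter


def ipw_marginal_or(records):
    """Marginal odds ratio by IPW with stabilized weights.

    records: list of (c, a, y) tuples with a categorical covariate c,
    a binary treatment a and a binary outcome y.
    """
    n = len(records)
    n_c = Counter(c for c, _, _ in records)        # stratum sizes
    n_ac = Counter((a, c) for c, a, _ in records)  # treatment-by-stratum counts
    n_a = Counter(a for _, a, _ in records)        # marginal treatment counts

    weights = []
    for c, a, _ in records:
        p_a_given_c = n_ac[(a, c)] / n_c[c]  # P(A = a | C = c), saturated model
        p_a = n_a[a] / n                     # P(A = a), stabilizing numerator
        weights.append(p_a / p_a_given_c)

    def weighted_risk(arm):
        # Weighted outcome mean within one treatment arm of the pseudo-population.
        num = sum(w * y for w, (_, a, y) in zip(weights, records) if a == arm)
        den = sum(w for w, (_, a, _) in zip(weights, records) if a == arm)
        return num / den

    r1, r0 = weighted_risk(1), weighted_risk(0)
    return (r1 / (1 - r1)) / (r0 / (1 - r0)), weights


def make_records(counts):
    """Expand {(c, a): (n_subjects, n_events)} into individual records."""
    records = []
    for (c, a), (size, events) in counts.items():
        records += [(c, a, 1)] * events + [(c, a, 0)] * (size - events)
    return records


# Hypothetical toy cohort: treatment is more common when c = 1.
toy = make_records({(0, 1): (2, 1), (0, 0): (8, 2), (1, 1): (8, 6), (1, 0): (2, 1)})
ipw_or, sw = ipw_marginal_or(toy)
print(round(ipw_or, 4), round(sum(sw) / len(sw), 4))
```

On these toy counts the crude odds ratio is 49/9, whereas the weighted contrast is 25/9, and the stabilized weights average to one. In practice the propensity score would come from a fitted regression model, in which case the weights need not average exactly to one.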

## 4 Targeted maximum likelihood estimation

In brief, in a point-exposure study with binary exposure and outcome, implementing TMLE for the estimation of the marginal odds ratio consists of three steps [9, 20]. To estimate this odds ratio, we simultaneously target $P(Y_{A=a}=1)$ for a=0,1. Consistent estimation of the odds ratio requires that both of these parameters be consistently estimated.

Step 1: Two probability models are specified and estimated from the observed data: i) a model to predict the outcome given the treatment A and the covariates X (outcome model) and ii) a model to predict receipt of treatment A=a conditional on covariates X (treatment or propensity score model). Standard logistic regression is one possible and commonly used approach for fitting these two models; machine learning tools such as Super Learner may also be used [34].

Step 2: A parametric update model is fit using an intercept-free logistic regression including “clever covariates” and an offset equal to the fitted linear prediction from the initial outcome model. The clever covariate is derived from the efficient influence function for the parameter of interest, and is defined as $\frac{I\left(A=a\right)}{P\left(A=a|\mathbf{X}=\mathbf{x}\right)}$ in the binary case under consideration. This covariate can be estimated from the treatment model in step 1. The coefficient of the clever covariate is referred to as a fluctuation parameter. The probabilities of the counterfactual outcomes are then estimated with the predictions from the update model by setting everyone’s exposure to A=a. This step aims to find a better model parameterization that targets an optimal bias-variance tradeoff for the parameters of interest $P(Y_{A=a}=1)$.

Step 3: The empirical mean of the predicted probabilities of the counterfactual outcome (computed in the previous step) is taken. The standard error of TMLE can again be estimated with the efficient influence function. The Wald-type 95 % confidence interval and the statistics for hypothesis testing can be calculated correspondingly. Once estimates of $P(Y_{A=a}=1)$ are calculated for both treatment levels, one can combine them to obtain an estimate of the marginal odds ratio: $\frac{P\left(Y_{A=1}=1\right)/P\left(Y_{A=1}=0\right)}{P\left(Y_{A=0}=1\right)/P\left(Y_{A=0}=0\right)}$.

A more detailed description of the theory of TMLE can be found in the literature [7, 9, 20].
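The three steps can be sketched as follows. This is a minimal illustration, not the implementation used in the paper or in the TMLE software cited above: the function names (`fit_epsilon`, `tmle_risk`) and the toy counts are hypothetical, the initial outcome model is deliberately unadjusted, the treatment model is a set of stratum proportions for one binary covariate, and the fluctuation parameter is fitted separately for each treatment level by Newton-Raphson. Standard errors from the efficient influence function are omitted.

```python
import math
from collections import Counter


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def fit_epsilon(h, y, offset, n_iter=100, tol=1e-12):
    """Newton-Raphson for an intercept-free logistic regression of y on the
    clever covariate h with a fixed offset; returns the fluctuation parameter."""
    eps = 0.0
    for _ in range(n_iter):
        p = [expit(o + eps * hi) for o, hi in zip(offset, h)]
        score = sum(hi * (yi - pi) for hi, yi, pi in zip(h, y, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, p))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    return eps


def tmle_risk(records, arm):
    """TMLE of P(Y_{A=arm} = 1) from (c, a, y) records."""
    n_c = Counter(c for c, _, _ in records)
    n_ac = Counter((a, c) for c, a, _ in records)
    # Step 1a: initial outcome model, here unadjusted (treatment only).
    y_arm = [y for _, a, y in records if a == arm]
    q_init = sum(y_arm) / len(y_arm)
    # Step 1b: treatment model, here stratum proportions P(A = arm | C = c).
    g = {c: n_ac[(arm, c)] / n_c[c] for c in n_c}
    # Step 2: fluctuation. The clever covariate I(A = arm) / g is zero outside
    # the arm, so only subjects in the arm contribute to the fit.
    h = [1.0 / g[c] for c, a, _ in records if a == arm]
    eps = fit_epsilon(h, y_arm, [logit(q_init)] * len(y_arm))
    # Step 3: updated predictions with everyone set to A = arm, then average.
    q_star = [expit(logit(q_init) + eps / g[c]) for c, _, _ in records]
    return sum(q_star) / len(q_star)


# Hypothetical toy cohort given as {(c, a): (n_subjects, n_events)}.
toy = []
for (c, a), (size, events) in {(0, 1): (2, 1), (0, 0): (8, 2),
                               (1, 1): (8, 6), (1, 0): (2, 1)}.items():
    toy += [(c, a, 1)] * events + [(c, a, 0)] * (size - events)

r1, r0 = tmle_risk(toy, 1), tmle_risk(toy, 0)
tmle_or = (r1 / (1 - r1)) / (r0 / (1 - r0))
print(round(r1, 4), round(r0, 4), round(tmle_or, 4))
```

With a saturated treatment model, the score equation solved in step 2 forces the averaged updated predictions to equal the covariate-standardized risks (0.625 and 0.375 on these toy counts), so the odds ratio is 25/9 even though the initial outcome model ignores the covariate. With a richer outcome model, only `q_init` and the offset would change.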

## 5.1 Simulated data generation

In order to evaluate the performance of TMLE compared to that of the IPW estimator as well as to demonstrate the double robustness of TMLE, different model specifications were employed for both approaches. In all cases, we were interested in estimating the marginal odds ratio. We generated four types of binary covariates: a confounding variable (C) related to both the treatment and the outcome, an outcome predictor not related to the treatment (baseline risk predictor, BR), a treatment predictor not related to the outcome (instrumental variable, IV), and a noise variable (NV) related neither to the outcome nor to the treatment. These covariates were generated from Bernoulli distributions with probabilities equal to 0.5. The treatment (A) was generated based on C and IV, with sampling probability $P\left(A=1|C,IV\right)=\frac{\exp\left(\beta_{0_A}+\beta_{C_A}C+\beta_{IV_A}IV\right)}{1+\exp\left(\beta_{0_A}+\beta_{C_A}C+\beta_{IV_A}IV\right)}$. The intercept $\beta_{0_A}$ was chosen to make the prevalence of treatment approximately 0.5. Analogously, we generated the outcome (Y) as a function of A, C and BR with probability $P\left(Y=1|A,C,BR\right)=\frac{\exp\left(\beta_{0_Y}+\beta_{A_Y}A+\beta_{C_Y}C+\beta_{BR_Y}BR\right)}{1+\exp\left(\beta_{0_Y}+\beta_{A_Y}A+\beta_{C_Y}C+\beta_{BR_Y}BR\right)}$. 
The intercept $\beta_{0_Y}$ was chosen to obtain outcome prevalence of approximately 0.25. To consider two different scenarios, the coefficients ($\beta_{C_A},\beta_{C_Y},\beta_{IV_A},\beta_{BR_Y}$) for the respective covariates and treatment were set to $\ln\left(2\right)$ and subsequently (in a second scenario) to $\ln\left(5\right)$. The target of our analysis was the marginal odds ratio (OR), i. e., $\frac{P\left(Y_{A=1}=1\right)/P\left(Y_{A=1}=0\right)}{P\left(Y_{A=0}=1\right)/P\left(Y_{A=0}=0\right)}$. The true values of the marginal odds ratios were 1.94 and 3.80 respectively in our two simulated scenarios, derived from simulations where both counterfactual outcomes were generated for every subject. A total of 100,000 subjects were considered and corresponding parameter settings were used for the two scenarios. The average estimates of the marginal odds ratios were then determined by taking the mean contrasts of the counterfactual outcomes on the logit scale over 1,000 simulations.
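A data-generating process of this kind can be sketched as below. Several things are assumptions rather than details from the paper: the intercepts are found by bisection so that the prevalences are exactly 0.5 and 0.25, the treatment coefficient $\beta_{A_Y}$ is set to the same value as the covariate coefficients, and the true marginal odds ratio is obtained by exact enumeration over the eight (C, IV, BR) cells instead of by simulating counterfactual outcomes. All function names are hypothetical.

```python
import math
import random
from itertools import product


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def solve_intercept(mean_prob, target, lo=-20.0, hi=20.0):
    """Bisection for the intercept at which mean_prob(intercept) == target.
    Assumes mean_prob is increasing in the intercept."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean_prob(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def design(beta):
    """Intercepts and exact marginal OR when every coefficient equals beta."""
    grid = list(product((0, 1), repeat=3))  # (C, IV, BR); each cell has probability 1/8

    def p_treat(b0, c, iv):
        return expit(b0 + beta * c + beta * iv)

    def p_out(b0, a, c, br):
        return expit(b0 + beta * a + beta * c + beta * br)

    b0_a = solve_intercept(
        lambda b: sum(p_treat(b, c, iv) for c, iv, _ in grid) / 8, 0.5)

    def prevalence(b0_y):
        total = 0.0
        for c, iv, br in grid:
            pa = p_treat(b0_a, c, iv)
            total += pa * p_out(b0_y, 1, c, br) + (1 - pa) * p_out(b0_y, 0, c, br)
        return total / 8

    b0_y = solve_intercept(prevalence, 0.25)
    # Counterfactual risks: set A = a for everyone and average over (C, BR).
    r1 = sum(p_out(b0_y, 1, c, br) for c, _, br in grid) / 8
    r0 = sum(p_out(b0_y, 0, c, br) for c, _, br in grid) / 8
    return b0_a, b0_y, (r1 / (1 - r1)) / (r0 / (1 - r0))


def simulate(n, beta, b0_a, b0_y, rng):
    """Draw n records (c, br, iv, nv, a, y) from the data-generating process."""
    rows = []
    for _ in range(n):
        c, br, iv, nv = (int(rng.random() < 0.5) for _ in range(4))
        a = int(rng.random() < expit(b0_a + beta * c + beta * iv))
        y = int(rng.random() < expit(b0_y + beta * a + beta * c + beta * br))
        rows.append((c, br, iv, nv, a, y))
    return rows


b0_a, b0_y, true_or = design(math.log(2))
data = simulate(1000, math.log(2), b0_a, b0_y, random.Random(1))
```

Because the odds ratio is non-collapsible, the marginal value returned by `design` is smaller than the conditional odds ratio $e^{\beta}$; under the assumptions above it should fall close to the values reported for the two scenarios.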

Sample size was set at $n\in \left\{500;\ 1{,}000;\ 10{,}000\right\}$. To investigate the impact of omitting different types of variables in either the treatment or the outcome model, we assessed the performance of TMLE in all $16\times 16=256$ possible combinations of treatment and outcome model specifications, excluding interactions (Table 1). We compared the performance of IPW estimation with that of the TMLE with an unadjusted outcome model for each of the 16 possible treatment model specifications. Stabilized weights for IPW were estimated using weighted logistic regression. Simulation of each data scenario was repeated 1,000 times and analyzed accordingly: we reported the empirical bias, standard error, and mean squared error of the estimates achieved by the TMLE and IPW approaches.
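The performance measures reported across replications can be computed as in the sketch below. The function name `performance` and the five example estimates are hypothetical; the empirical standard error is taken as the sample standard deviation of the estimates, which is one common convention and an assumption here.

```python
import math


def performance(estimates, truth):
    """Empirical bias, standard error and root mean squared error."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    bias = mean_est - truth
    # Empirical SE: standard deviation of the estimates across replications.
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (m - 1))
    # RMSE: root of the average squared distance from the true value.
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / m)
    return bias, se, rmse


log_or_hats = [1.30, 1.45, 1.28, 1.41, 1.36]  # hypothetical ln(OR) estimates
bias, se, rmse = performance(log_or_hats, truth=math.log(3.80))
```

The three quantities are linked by $\text{RMSE}^2 = \text{bias}^2 + \frac{m-1}{m}\,\text{SE}^2$, so an estimator can trade a small bias for a lower standard error and still improve its RMSE.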

Table 1:

Model specifications in simulation study A.

We will further refer to this simulation as “simulation study A”.

## 5.2 Simulation results

Table 2 shows the bias and the root mean squared error (RMSE) for the estimates (ln OR) of the treatment effect for selected TMLE approaches, for the setting with sample size n=10,000 and model coefficients $\beta_{C_A}=\beta_{C_Y}=\beta_{IV_A}=\beta_{BR_Y}=\beta_{A_Y}=\ln\left(5\right)$.

Table 2:

Bias and RMSE of treatment effect estimates using various outcome models and treatment models with sample size n=10,000, $\beta_{C_A}=\beta_{C_Y}=\beta_{IV_A}=\beta_{BR_Y}=\beta_{A_Y}=\ln\left(5\right)$.

With the confounding variable C included in the outcome models (M2, M6, M7, M8, M12, M13, M14, and M16), TMLE always yielded unbiased treatment effect estimates regardless of the treatment model. When neither the confounder (C) nor the instrumental variable (IV) was included in the outcome models (M1, M3, M5, and M10), TMLE yielded unbiased estimates as long as the confounder (C) was included in the treatment models (T2, T6, T7, T8, T12, T13, T14, and T16). The estimates were biased with the other treatment models. In particular, inclusion of the IV in models without C (T4, T9, T11, T15) produced the largest bias. When the outcome models included IV but not C (M4, M9, M11 and M15), TMLE produced a small amount of finite-sample bias even when controlling for C in the treatment model (T2, T6, T8, T13). Only the joint inclusion of C and IV in the treatment model (T7, T12, T14 and T16), i. e. modeling the treatment generating process correctly, produced an unbiased estimate of the effect in this finite sample. In other words, when the outcome model was overfit to the instrumental variable, TMLE appeared to take longer to converge if the treatment model was estimated conditional only on confounders rather than on the true set of generating variables. The left panel of Figure 3 displays the bias for TMLE with outcome model M4.

The results for RMSE are presented for the TMLE with outcome models M2, M1 and M4 in the corresponding right panel of the figures. The RMSE is minimal for TMLE provided that C is included in the outcome model. Among the TMLEs with the outcome models M1, M3, M5 and M10 (in which both C and IV were excluded), estimators with treatment models that include both C and IV (T7, T12, T14 and T16) tend to yield slightly larger RMSE compared to those that include C but not IV. The RMSE is maximal when the treatment models include IV but not C (treatment models T4, T9, T11, T15). When the outcome models include IV but not C (M4, M9, M11 and M15), the RMSE is minimal with the joint inclusion of C and IV in the treatment model (T7, T12, T14 and T16) within TMLE.

In this simulation study, the IPW estimator had the same bias and RMSE when compared to TMLE with the same propensity score model specification and a marginal outcome model with only treatment included as a covariate (using a marginal, or otherwise limited, outcome model is not recommended, as it eliminates the opportunity for correct specification of the outcome model; however, we include these results for completeness). Moreover, adding the baseline risk (BR) variable in either the treatment or the outcome model did not result in obvious improvement for TMLE.

The results are similar for the other parameter settings ($\beta_{C_A}=\beta_{C_Y}=\beta_{IV_A}=\beta_{BR_Y}=\beta_{A_Y}=\ln\left(2\right)$, n={500; 1,000}), although the contrasts between different models were less distinct for the smaller coefficients (results not shown).

## 6.1 Data sources

To further evaluate the performance of TMLE and IPW in high dimensional covariate settings with respect to estimation bias and precision, simulation studies were conducted to mimic a typical pharmacoepidemiologic study with a known data generating process. Given the challenge of generating realistic high dimensional data, we used a plasmode simulation study [30]. A plasmode study starts with an existing cohort, so that associations between covariates reflect real-world patients. We then injected known signals (effects) into it. The true simulated effect was 0.72, a value based on a real-world study of the effect of statin use post-myocardial infarction (MI) on one-year risk of all-cause mortality. We will further refer to this simulation as “simulation study B”.

This is a retrospective population-based cohort study using the data from the Clinical Practice Research Datalink (CPRD) and Hospital Episode Statistics (HES). A total of 32,792 patients aged 18 and older and diagnosed with MI were drawn from the databases between April 1st, 1998 and March 31st, 2012. This cohort consists of 19,122 patients treated with a statin and 13,671 patients not treated with a statin within 30 days after the diagnosis of MI. All-cause mortality was evaluated as any death recorded in the databases during the one-year follow-up period. A range of known potential confounders can be predefined from the study, including demographic characteristics (e. g., age, sex), time variables (e. g., year of cohort entry), clinical characteristics (e. g., smoking, alcohol use, obesity), comorbidities (e. g., diabetes mellitus, atrial fibrillation, coronary artery disease (recorded >30 days before the index MI), acute coronary syndrome, cerebrovascular disease, congestive heart failure, chronic obstructive pulmonary disease, hypertension, hypercholesterolemia, peripheral vascular disease, previous coronary revascularization, previous stroke, previous MI (recorded >30 days before the index MI)), and previous medications prescribed (e. g., aspirin, angiotensin-converting enzyme (ACE) inhibitors, angiotensin receptor blockers (ARBs), beta-blockers, calcium-channel blockers, diuretics, fibrates, non-steroidal anti-inflammatory drugs (NSAIDs)). We also constructed variables for the number of prescriptions issued and the number of hospitalizations in the previous year, which are two proxies for overall health. Age, number of hospitalizations, and prescription count were categorized into groups and were entered as dummy variables along with year of cohort entry. Additionally, 400 empirical covariates were identified as proxies for unmeasured confounding via the high-dimensional propensity score (hdPS) algorithm [36, 37, 38, 39, 40, 41]. 
This study (protocol number: 14_018) was approved by the Independent Scientific Advisory Committee of the CPRD and the Research Ethics Board of the Jewish General Hospital (Montreal, Quebec).

## 6.2 Exposure and outcome generation

The covariates from the real data example were directly applied in the simulations and the coefficients estimated from the real data were used as our parameter setting. The treatment and outcome were stochastically generated based on the estimated coefficients as retrieved from the real study data analysis. We let all the measured confounding covariates and the 400 empirical hdPS-selected variables be the complete set of confounding covariates (denoted as C). The treatment (denoted as A) was generated, similarly to the first simulation study, conditional on C. The parameter values for the intercept ($\beta_{0_A}$) and the covariate coefficients ($\beta_{C_A}$) were set as the corresponding estimated coefficients from the real data fitting a binary logistic propensity score model with response variable A and explanatory variables C. Therefore, the treatment variable (A) was generated from a Bernoulli distribution with $P\left(A=1|C\right)=\frac{\exp\left(\beta_{0_A}+\beta_{C_A}^{\prime}C\right)}{1+\exp\left(\beta_{0_A}+\beta_{C_A}^{\prime}C\right)}$. Likewise, in the outcome model, the intercept $\beta_{0_Y}$, the coefficient of A ($\beta_{A_Y}$, the conditional log odds ratio), and the coefficients of C ($\beta_{C_Y}$) were fixed at the corresponding estimates of the coefficients from the real study. 
Finally, we generated the outcome variable based on the previously generated A and the covariates C, with the probability $P\left(Y=1|A,C\right)=\frac{\exp\left(\beta_{0_Y}+\beta_{A_Y}A+\beta_{C_Y}^{\prime}C\right)}{1+\exp\left(\beta_{0_Y}+\beta_{A_Y}A+\beta_{C_Y}^{\prime}C\right)}$. The true marginal effect (OR) was derived from a contrast of the two marginal potential outcome probabilities, which were computed by using the true values of the parameters $\beta_{0_Y}$, $\beta_{A_Y}$ and $\beta_{C_Y}$ and simulating exposed and unexposed counterfactual outcomes for all subjects in the population. A varying number of covariates were defined by four nested covariate sets (Table 3) inducing different levels of confounding adjustment as well as different degrees of non-positivity. The null covariate set included no variables. The simple covariate set (C1) included several predefined important confounders, including age, sex, obesity, smoking and history of diabetes. The intermediate covariate set (C2) included C1 and a variety of additional potential confounders defined in the last section. Finally, the full covariate set (C3) included all pre-specified potential confounders (C2) as well as 400 empirical variables selected by the hdPS algorithm. Sixteen TMLE and four IPW analyses were performed in this simulation, considering different model specifications (Table 3) for the treatment model in both approaches and also for the outcome model in the TMLE. In addition, to assess the methods under a more practical strategy with respect to extreme propensity scores, all the analyses were repeated with truncation of the propensity score at 0.025 and 0.975. We set the sample size to the number of subjects in the real study (n=32,792). 
Due to computational intensity, the simulation and the analyses were limited to 500 generated datasets.
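The plasmode construction described above keeps the observed covariate rows fixed and regenerates only the treatment and the outcome. A minimal sketch follows. The function names, the four-row covariate matrix and all coefficient values are hypothetical stand-ins for the fitted real-data quantities, and the true marginal log odds ratio is computed by averaging model-based risks under A=1 and A=0, which is the expectation of the counterfactual simulation described in the text rather than the simulation itself.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(coefs, row):
    return sum(b * x for b, x in zip(coefs, row))


def plasmode_draw(covariates, b0_a, beta_a, b0_y, beta_ay, beta_y, rng):
    """Keep the observed covariate rows; regenerate treatment and outcome."""
    treatment, outcome = [], []
    for row in covariates:
        a = int(rng.random() < expit(b0_a + dot(beta_a, row)))
        y = int(rng.random() < expit(b0_y + beta_ay * a + dot(beta_y, row)))
        treatment.append(a)
        outcome.append(y)
    return treatment, outcome


def true_marginal_log_or(covariates, b0_y, beta_ay, beta_y):
    """Average the model-based risks under A=1 and A=0 over the cohort."""
    n = len(covariates)
    r1 = sum(expit(b0_y + beta_ay + dot(beta_y, row)) for row in covariates) / n
    r0 = sum(expit(b0_y + dot(beta_y, row)) for row in covariates) / n
    return math.log(r1 / (1 - r1)) - math.log(r0 / (1 - r0))


# Hypothetical stand-ins for the real cohort and the fitted coefficients.
cohort = [(0, 0), (1, 0), (0, 1), (1, 1)]
a_sim, y_sim = plasmode_draw(cohort, b0_a=-0.5, beta_a=(0.8, 0.3),
                             b0_y=-2.0, beta_ay=-0.4, beta_y=(1.0, 0.5),
                             rng=random.Random(2016))
truth = true_marginal_log_or(cohort, b0_y=-2.0, beta_ay=-0.4, beta_y=(1.0, 0.5))
```

Because only A and Y are redrawn, the correlation structure among the covariates is exactly that of the source cohort, while the treatment effect is known by construction. The marginal log odds ratio returned here is closer to zero than the conditional coefficient `beta_ay`, again reflecting non-collapsibility.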

Table 3:

Nested covariates for defining model specifications for TMLE and IPW estimator.

## 6.3 Simulation results

The true marginal effect (ln OR) of the treatment in this simulation study is equal to −0.333. Based on the 443 simulation runs where all the TMLE and IPW estimators converged (out of a total of 500), the bias, the empirical standard error and the root mean squared error (RMSE) for the estimates of the marginal effects without truncation of the propensity score are presented in Table 4.

Table 4:

Results for various IPW and TMLE approaches in the plasmode simulation study.

In the treatment generating process, 445 coefficients were directly applied from the real data example. The intercept was set to −4.55 and most of the coefficients were very small. The maximum coefficient was 2.1, and the minimum and maximum of the probability of treatment were 0.0006 and 0.9998 respectively. Therefore, full positivity violations did not occur. However, in the analyses of the simulated data, extreme estimated propensity scores were present when all the confounders (C3) were included in the treatment model. Over the 500 simulation runs, 5.8 % of propensity scores were smaller than 0.025, while 14.1 % of propensity scores were larger than 0.975. Near violations of positivity often occurred in the IPW estimator and TMLE when treatment models included the full covariate set. The distributions of the estimated propensity scores and weights from all the simulations are presented in Table 5 and Table 6.

Table 5:

Distribution of propensity score estimated from four different treatment models.

Table 6:

Distributions of the stabilized weights from four different propensity score models.

Figure 1:

(a)-(b) Distributions of the estimates from IPW estimator and TMLE in the plasmode simulation study. Figure 1a is for models with non-truncated weights; Figure 1b uses weights truncated at the 2.5th and 97.5th percentiles.

Results with truncation of the propensity score are presented in Table 4 and the corresponding boxplots for the distributions of the estimates are presented in Figure 1b. Compared to the analyses without truncation, slightly more bias but better precision for IPW estimators that adjust for a large number of covariates (IPW3 and IPW4) was attained by truncation. On the other hand, the standard error as well as the bias were both largely reduced for TMLE that includes the full covariate set C3 in the treatment model. Moreover, as was expected, all TMLE models with truncation of the propensity score avoided the numerical problems previously observed in the non-truncated case. In this simulation with truncation, TMLE with a marginal outcome model again yielded slightly larger bias and standard error compared to the IPW estimator when high dimensional covariates were considered.
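The truncation rule and the accompanying propensity score diagnostics can be sketched as follows. The function names and the seven example scores are hypothetical; the bounds 0.025 and 0.975 are the ones stated for this simulation, and unstabilized weights are used only to keep the illustration short.

```python
def truncate(ps, lower=0.025, upper=0.975):
    """Clip estimated propensity scores to [lower, upper]."""
    return [min(max(p, lower), upper) for p in ps]


def extreme_share(ps, lower=0.025, upper=0.975):
    """Fractions of propensity scores below `lower` and above `upper`."""
    n = len(ps)
    return sum(p < lower for p in ps) / n, sum(p > upper for p in ps) / n


def inverse_weights(ps, treatment):
    """Unstabilized weights 1 / P(A = a_i | X_i)."""
    return [1.0 / p if a == 1 else 1.0 / (1.0 - p)
            for p, a in zip(ps, treatment)]


ps_hat = [0.0006, 0.02, 0.30, 0.55, 0.80, 0.99, 0.9998]  # hypothetical estimates
a_obs = [1, 0, 1, 0, 1, 0, 0]                            # hypothetical treatments
low, high = extreme_share(ps_hat)
w_raw = inverse_weights(ps_hat, a_obs)
w_trunc = inverse_weights(truncate(ps_hat), a_obs)
print(max(w_raw) > 1000, max(w_trunc) <= 40.0)
```

Truncation caps every weight at 1/0.025 = 40, which is the source of the precision gain, at the price of using a propensity score that no longer matches the fitted model for the clipped subjects.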

## 7 Discussion

TMLE produces semi-parametric efficient estimators of marginal causal effects in a population. In this paper we have focused on the application of TMLE and its comparison to the IPW estimator in both low and high dimensional covariate settings. We used parametric models for the initial estimation of the probabilities of the outcome and treatment respectively. This was done to simplify the fitting process in a high-dimensional setting, but alternative (non-parametric) methods are also possible. For example, cross-validation based machine learning approaches, such as the Deletion/Substitution/Addition (DSA) algorithm, can search over a large space of polynomial generalized linear models [42, 43]. Alternatively, the Super Learner can be used to combine predictions over a set of candidate algorithms [34, 44, 45].

Our comprehensive simulation study investigated several combinations of the treatment and outcome model specifications conditional on the baseline variables. We demonstrated the double robustness property of TMLE in basic finite sample settings. Double robustness enables consistent treatment effect estimation when either the propensity score or the outcome model is correctly specified on the full set of confounders. The IPW estimator and TMLE using an unadjusted outcome model performed equivalently in the absence of a near violation of positivity when using the same propensity score model. This result is perhaps not surprising, given that the two methods use the same information (the treatment model). When TMLE included a rich outcome model, bias was reduced.
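The comparison between IPW and TMLE with an unadjusted outcome model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the study's code or data: the data-generating model, the variable names, the use of the true propensity score instead of an estimated one, and the choice of the marginal risk difference as the target parameter.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def ipw_risk_difference(y, a, g):
    """Normalized IPW estimate of E[Y(1)] - E[Y(0)]."""
    w1 = a / g              # inverse weights for treated subjects
    w0 = (1 - a) / (1 - g)  # inverse weights for untreated subjects
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def tmle_risk_difference(y, a, g, q_a, q_1, q_0, n_iter=25):
    """One-step TMLE update of initial predictions q_a, q_1, q_0."""
    h = a / g - (1 - a) / (1 - g)  # clever covariate at observed treatment
    offset = logit(q_a)
    eps = 0.0
    # Newton-Raphson for the fluctuation parameter of a logistic
    # regression of y on h with a fixed offset and no intercept.
    for _ in range(n_iter):
        p = expit(offset + eps * h)
        score = np.sum(h * (y - p))
        info = np.sum(h ** 2 * p * (1 - p))
        eps += score / info
    # Updated predictions under treatment and under no treatment.
    q1_star = expit(logit(q_1) + eps / g)
    q0_star = expit(logit(q_0) - eps / (1 - g))
    return np.mean(q1_star - q0_star)

# Hypothetical simulated cohort with one confounder w.
rng = np.random.default_rng(0)
n = 50_000
w = rng.normal(size=n)
g = expit(0.5 * w)                        # true propensity score
a = rng.binomial(1, g)
y = rng.binomial(1, expit(-1.0 + a + w))
truth = np.mean(expit(w) - expit(-1.0 + w))

# Unadjusted (marginal) outcome model: treatment-specific outcome means.
q_1 = np.full(n, y[a == 1].mean())
q_0 = np.full(n, y[a == 0].mean())
q_a = np.where(a == 1, q_1, q_0)

ipw_est = ipw_risk_difference(y, a, g)
tmle_est = tmle_risk_difference(y, a, g, q_a, q_1, q_0)
print(f"truth={truth:.3f}  IPW={ipw_est:.3f}  TMLE={tmle_est:.3f}")
```

In this sketch the propensity scores are moderate, so both estimators draw on the same treatment-model information and should land close to each other. It says nothing about behavior under near positivity violations.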

Indeed, the positivity assumption was satisfied in simulation study A: the estimated propensity scores were bounded between 0.07 and 0.93 in all the simulations. However, both methods experienced instability in a high-dimensional setting when a large number of covariates were included in the propensity score model. Near practical violations of the positivity assumption caused problems when the propensity score model was over-adjusted. Diagnostics for the propensity score should therefore be performed in high-dimensional settings. When near positivity violations are detected, TMLE’s performance can be improved by truncation of the propensity score or by including the high-dimensional covariates in the outcome model while keeping only a few covariates in the treatment model.
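A propensity score diagnostic of the kind recommended here could look like the following sketch. The function name, the bounds of 0.025 and 0.975, and the simulated scores are illustrative assumptions; in practice `g_hat` would come from the fitted treatment model.

```python
import numpy as np

def positivity_diagnostics(g_hat, a, bounds=(0.025, 0.975)):
    """Summarize estimated propensity scores and stabilized weights."""
    g_hat = np.asarray(g_hat, dtype=float)
    a = np.asarray(a)
    p_treated = a.mean()
    # Stabilized weight: marginal treatment probability divided by the
    # estimated probability of the treatment actually received.
    sw = np.where(a == 1, p_treated / g_hat, (1 - p_treated) / (1 - g_hat))
    lower, upper = bounds
    return {
        "ps_min": g_hat.min(),
        "ps_max": g_hat.max(),
        "n_outside_bounds": int(np.sum((g_hat < lower) | (g_hat > upper))),
        "sw_mean": sw.mean(),  # expected to be close to 1
        "sw_max": sw.max(),    # very large values flag near violations
    }

# Hypothetical scores bounded away from 0 and 1.
rng = np.random.default_rng(1)
g_hat = rng.uniform(0.07, 0.93, size=5000)
a = rng.binomial(1, g_hat)
diag = positivity_diagnostics(g_hat, a)
print(diag)
```

A mean stabilized weight far from 1, a very large maximum weight, or many scores outside the chosen bounds would each suggest a closer look at the treatment model.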

In the plasmode simulation studies (simulation study B), we showed that, as the result of a near violation of the positivity assumption, extreme propensity scores, even when correctly modeled, can lead to poor precision and biased estimates for both IPW and TMLE. They can even produce numerical instability for TMLE (specifically, prediction of infinite values). Positivity is an essential assumption and should always be verified in any analysis. Examination of the distributions of the propensity score and the weights is therefore recommended for both IPW and TMLE. The bias induced by such violations can be further assessed by using the parametric bootstrap and simulation [46]. Furthermore, TMLE and the IPW estimator perform differently in this setting, even with the same modeling approach, because of their different sensitivity to practical positivity violations. Although inverse weighting by the propensity score is used in both approaches, the estimation process differs. In the IPW approach, the propensity score is used to define the weights that create the pseudo-population. Large weights can result if an extreme predicted probability (close to 0 or 1) does not correspond to the observed treatment status of an individual. Such inflated weights can have a large impact on the results, leading to poor precision and/or biased effect estimates. In the TMLE procedure, the same inverse weights are embedded in the clever covariate, which enables estimation of the parameter in the updating procedure and the subsequent computation of individual counterfactual outcome predictions. However, since the estimating equation is evaluated for each subject under both the treatment and no-treatment conditions, any extreme propensity score (close to 0 or 1) inflates the value of the clever covariate and consequently causes unstable parameter estimation. Moreover, an extreme propensity score also influences the estimate of the parameter in the logistic parametric update model; numerical non-convergence can therefore occur because of a very unstable fluctuation parameter estimate.
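The contrast can be written out in standard notation. The symbols are assumptions introduced here for illustration and do not come from the text above: $A$ is the binary treatment, $W$ the covariates, $g(W) = \Pr(A = 1 \mid W)$ the propensity score, and $Q^{0}(A, W)$ the initial estimate of $\Pr(Y = 1 \mid A, W)$. The stabilized IPW weight and the clever covariate for subject $i$ are

$$
sw_i = \frac{A_i \Pr(A = 1)}{g(W_i)} + \frac{(1 - A_i)\Pr(A = 0)}{1 - g(W_i)},
\qquad
H(A_i, W_i) = \frac{A_i}{g(W_i)} - \frac{1 - A_i}{1 - g(W_i)},
$$

and the logistic update step estimates the fluctuation parameter $\varepsilon$ in

$$
\operatorname{logit} Q^{*}(A, W) = \operatorname{logit} Q^{0}(A, W) + \varepsilon\, H(A, W).
$$

The weight $sw_i$ is large only when the observed treatment disagrees with an extreme score, for example $A_i = 1$ with $g(W_i)$ near 0. The updated predictions, by contrast, are computed for every subject at both treatment levels, using

$$
H(1, W_i) = \frac{1}{g(W_i)}, \qquad H(0, W_i) = -\frac{1}{1 - g(W_i)},
$$

so a score near 0 or 1 inflates one of these two terms regardless of the treatment actually received.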

Truncation of the propensity score may prevent poor precision at the price of minor bias [47], and data-adaptive approaches have been proposed to select the truncation level that minimizes the expected mean squared error [48, 49]. The TMLE function in R has an option to specify an amount of truncation to avoid extreme estimated probabilities of receiving treatment [20, 21]. We provided results from additional analyses in which the propensity score was truncated at 0.025 and 0.975. The results indicate that the influence of extreme propensity scores can be controlled by truncation. Truncation should therefore be carried out in practice, especially for a high-dimensional propensity score model. In our truncation analysis, IPW and TMLE (with a marginal outcome model) again performed differently, which again reflects the difference in how the propensity score is used in the two estimating procedures. Collaborative TMLE [28] is an extension of TMLE that allows for the data-adaptive selection of covariates into the propensity score, potentially avoiding the non-convergence and variance inflation that we encountered. This is another potential solution for high-dimensional data analysis with near-positivity violations [8].
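The effect of truncating the propensity score at 0.025 and 0.975 can be sketched as follows. The function names and the six example scores are hypothetical; the inverse weights computed here are the quantities that enter both the IPW weights and the magnitude of the TMLE clever covariate.

```python
import numpy as np

def bound_propensity(g_hat, lower=0.025, upper=0.975):
    """Truncate estimated propensity scores to [lower, upper]."""
    return np.clip(g_hat, lower, upper)

def inverse_weights(a, g_hat):
    """Inverse probability of the treatment actually received."""
    return np.where(a == 1, 1.0 / g_hat, 1.0 / (1.0 - g_hat))

# Hypothetical scores, including values very close to 0 and 1.
g_hat = np.array([0.001, 0.02, 0.30, 0.70, 0.98, 0.999])
a = np.array([1, 0, 1, 0, 1, 0])

raw = inverse_weights(a, g_hat)
truncated = inverse_weights(a, bound_propensity(g_hat))
print("largest weight without truncation:", raw.max())
print("largest weight with truncation:   ", truncated.max())
```

With these bounds no inverse weight can exceed 1 / 0.025 = 40, which limits the influence of any single subject at the cost of some bias.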

Generalizability of the results and conclusions is a potential limitation of our plasmode simulation study. We did not assign a broad range of covariate distributions or parameter values. The covariates were instead derived directly from a real-world setting. The parameter values and the sample size were derived from the real-world study and then fixed for the simulation. Therefore, our simulation results are not necessarily representative of other settings, and we cannot draw definitive conclusions about the relative performance of TMLE and IPW estimators. However, this specific data setting is typical of a pharmacoepidemiologic study and has already provided useful insights into the performance of TMLE in the high-dimensional scenario.

In conclusion, TMLE and IPW estimators both use inverse propensity score weighting, but can perform differently. Both TMLE and IPW can be sensitive to violations of the positivity assumption; near-violations of this assumption are possible in high-dimensional covariate settings, and inference should be interpreted with caution. In such settings, TMLE with a high-dimensional outcome model and a reduced treatment model may be a better alternative. Collaborative TMLE may also be a useful alternative in these settings; further exploration of these methods is warranted.

## Acknowledgements

This work was supported by the Canadian Network for Observational Drug Effect Studies (CNODES). CNODES, a collaborating centre of the Drug Safety and Effectiveness Network (DSEN), is funded by the Canadian Institutes of Health Research (CIHR). MP holds a studentship from the Fonds de Recherche du Québec – Santé (FQR-S). KBF and MES hold New Investigator awards from CIHR. RWP is a Chercheur-national (National Scholar) of the FQR-S and holds the Albert Boehringer I Chair in Pharmacoepidemiology.

## References

• 1. Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11(5):550–560.

• 2. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect. Math Model 1986;7(9):1393–1512.

• 3. Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol 2011 Apr 1;173(7):731–738.

• 4. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Stat Assoc 1999;94(448):1096–1120.

• 5. Van der Laan MJ. Targeted maximum likelihood based causal inference: Part I. Int J Biostat 2010;6(2):1557–4679.

• 6. Van der Laan MJ. Targeted maximum likelihood based causal inference: Part II. Int J Biostat 2010;6(2):1557–4679.

• 7. Rosenblum M, van der Laan MJ. Targeted maximum likelihood estimation of the parameter of a marginal structural model. Int J Biostat 2010;6(2):1557–4679.

• 8. Schnitzer ME, Lok JJ, Gruber S. Variable selection for confounder control, flexible modeling and collaborative targeted minimum loss-based estimation in causal inference. Int J Biostat 2016;12(1):97–115.

• 9. Moore KL, van der Laan MJ. Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Stat Med 2009;28(1):39–64.

• 10. Stitelman OM, De Gruttola V, van der Laan MJ. A general implementation of tmle for longitudinal data applied to causal inference in survival analysis. Int J Biostat 2010;8(1):1557–4679.

• 11. Moore KL, van der Laan MJ. Application of time-to-event methods in the assessment of safety in clinical trials 2009;U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 248.

• 12. Schnitzer ME, Moodie EE, Platt RW. Targeted maximum likelihood estimation for marginal time-dependent treatment effects under density misspecification. Biostatistics 2013;14(1):1–14.

• 13. van der Laan MJ, Gruber S. Targeted minimum loss based estimation of causal effects of multiple time point interventions. Int J Biostat 2012;8(1):1557–4679.

• 14. Petersen M, Schwab J, Gruber S, Blaser N, Schomaker M, van der Laan M. Targeted maximum likelihood estimation for dynamic and static longitudinal marginal structural working models. J Causal Inference 2014 Sep 1;2(2):147–185.

• 15. Schnitzer ME, Laan MJVD, Moodie EEM, Platt RW. Effect of breastfeeding on gastrointestinal infection in infants: a targeted maximum likelihood approach for clustered longitudinal data. Ann Appl Stat 2014 Jun;8(2):703–725.

• 16. Porter KE. The relative performance of targeted maximum likelihood estimators under violations of the positivity assumption 2011;Available at http://escholarship.org/uc/item/3hp4r33n.pdf.

• 17. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol 2006 Jun 15;163(12):1149–1156.

• 18. Austin PC, Mamdani MM. A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat Med 2006 Jun 30;25(12):2084–2106.

• 19. Lefebvre G, Delaney JAC, Platt RW. Impact of mis-specification of the treatment model on estimates from a marginal structural model. Stat Med 2008 Aug 15;27(18):3629–3642.

• 20. Gruber S, van der Laan MJ. tmle: an R Package for Targeted Maximum Likelihood Estimation. J Stat Softw 2011;51(13):1–35.

• 21. Schwab JL, Lendle S, Petersen M, van der Laan MJ, Gruber S. LTMLE: longitudinal targeted maximum likelihood estimation, 2013 2014;Available at http://cran.r-project.org/web/packages/ltmle/index.html.

• 22. Neugebauer R, van der Laan M. Why prefer double robust estimators in causal inference? J Stat Plan Inference 2005;129(1):405–426.

• 23. Ertefaie A, Stephens DA. Comparing approaches to causal inference for longitudinal data: Inverse probability weighting versus propensity scores. Int J Biostat 2010;6(2):1557–4679.

• 24. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat 2010;6(1):1557–4679.

• 25. Porter KE, Gruber S, van der Laan MJ, Sekhon JS. The relative performance of targeted maximum likelihood estimators. Int J Biostat 2011;7(1):1–34.

• 26. Lendle SD, Fireman B, Laan MJVD. Targeted maximum likelihood estimation in safety analysis. J Clin Epidemiol 2013 Aug 1;66(8):S91–98.

• 27. Brown DM, Petersen M, Costello S. Occupational exposure to PM2.5 and incidence of ischemic heart disease: longitudinal targeted minimum loss-based estimation. Epidemiology 2015;26(6):806–814.

• 28. Van Der Laan MJ, Gruber S. Collaborative double robust targeted maximum likelihood estimation. Int J Biostat 2010;6(1):1557–4679.

• 29. Gruber S, van der Laan MJ. An application of collaborative targeted maximum likelihood estimation in causal inference and genomics. Int J Biostat 2010;6(1):1557–4679.

• 30. Franklin JM, Schneeweiss S, Polinski JM, Rassen JA. Plasmode simulation for the evaluation of pharmacoepidemiologic methods in complex healthcare databases. Comput Stat Data Anal 2014 Apr;72:219–226.

• 31. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10(1):37–48.

• 32. Hernán MA, Robins JM. Causal Inference. Boca Raton: Chapman & Hall/CRC, 2016, forthcoming.

• 33. Pang M, Kaufman JS, Platt RW. Studying noncollapsibility of the odds ratio with marginal structural and logistic regression models. Stat Methods Med Res 2013;0962280213505804.

• 34. Van Der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol 2007;6(1):1544–6115.

• 35. Pang M, Schuster T, Filion KB, Eberg M, Platt RW. Targeted Maximum Likelihood Estimation for Pharmacoepidemiologic Research. Epidemiology 2016 Jul;27(4):570–577.

• 36. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 2009;20(4):512–522.

• 37. Stürmer T, Schneeweiss S, Brookhart MA, Rothman KJ, Avorn J, Glynn RJ. Analytic strategies to adjust confounding using exposure propensity scores and disease risk scores: nonsteroidal antiinflammatory drugs and short-term mortality in the elderly. Am J Epidemiol 2005;161(9):891–898.

• 38. Rassen JA, Schneeweiss S. Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system. Pharmacoepidemiol Drug Saf 2012;21(S1):41–49.

• 39. Rassen JA, Glynn RJ, Brookhart MA, Schneeweiss S. Covariate selection in high-dimensional propensity score analyses of treatment effects in small samples. Am J Epidemiol 2011;173(12):1404–1413.

• 40. Rassen JA, Avorn J, Schneeweiss S. Multivariate-adjusted pharmacoepidemiologic analyses of confidential information pooled from multiple health care utilization databases. Pharmacoepidemiol Drug Saf 2010;19(8):848–857.

• 41. Rassen JA, Doherty M, Huang W, Schneeweiss S. Pharmacoepidemiology toolbox. Boston, MA. Available at: http://www.hdpharmacoepi.org.

• 42. Sinisi SE, van der Laan MJ. Loss-based cross-validated deletion/substitution/addition algorithms in estimation 2004;Available at: http://biostats.bepress.com/ucbbiostat/paper103/.

• 43. Sinisi SE, van der Laan MJ. Deletion/substitution/addition algorithm in learning with applications in genomics. Stat Appl Genet Mol Biol 2004;3(1):1069.

• 44. Sinisi SE, Polley EC, Petersen ML, Rhee S-Y, van der Laan MJ. Super learning: an application to the prediction of HIV-1 drug resistance. Stat Appl Genet Mol Biol 2007;6(1):7.

• 45. Zheng W, Laan MVD. Asymptotic theory for cross-validated targeted maximum likelihood estimation 2010;Available at: http://works.bepress.com/wenjing-zheng/22/.

• 46. Petersen ML, Porter KE, Gruber S, Wang Y, van der Laan MJ. Diagnosing and responding to violations in the positivity assumption. Stat Methods Med Res 2010;0962280210386207.

• 47. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol 2008 Sep 15;168(6):656–664.

• 48. Xiao Y, Moodie EEM, Abrahamowicz M. Comparison of approaches to weight truncation for marginal structural cox models. Epidemiol Methods 2013 Jan 8;2(1):1–20.

• 49. Bembom O, Laan MVD. Data-adaptive selection of the truncation level for Inverse-Probability-of-Treatment-Weighted estimators 2008;U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 230. Available at: http://biostats.bepress.com/ucbbiostat/paper230.

Published Online: 2016-12-04

Published in Print: 2016-11-01

Canadian Institutes of Health Research, (Grant / Award Number: ‘DSE-111845’)

Citation Information: The International Journal of Biostatistics, Volume 12, Issue 2, 20150034, ISSN (Online) 1557-4679, ISSN (Print) 2194-573X,


© 2016 Walter de Gruyter GmbH, Berlin/Boston.