# Abstract

In order to obtain concrete results, we focus on estimation of the treatment-specific mean, controlling for all measured baseline covariates, based on observing independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment-specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically unbiased at any data distribution in the statistical model, it is essential to use data-adaptive estimators of these nuisance parameters, such as ensemble learning and, specifically, super-learning. Because such estimators involve an optimal trade-off of bias and variance w.r.t. the infinite-dimensional nuisance parameter itself, they result in a sub-optimal bias/variance trade-off for the resulting real-valued estimator of the estimand. We demonstrate that additional targeting of the estimators of these nuisance parameters guarantees that this bias for the estimand is second order, and thereby allows us to prove theorems that establish asymptotic linearity of the estimator of the treatment-specific mean under regularity conditions. These insights result in novel targeted minimum loss-based estimators (TMLEs) that use ensemble learning with additional targeted bias reduction to construct estimators of the nuisance parameters. In particular, we construct collaborative TMLEs (C-TMLEs) with known influence curve, allowing for statistical inference, even though these C-TMLEs involve variable selection for the propensity score based on a criterion that measures how effective the resulting fit of the propensity score is in removing bias for the estimand.
As a particular special case, we also demonstrate the required targeting of the propensity score for the inverse probability of treatment weighted estimator using super-learning to fit the propensity score.

## 1 Introduction and overview

This introduction provides an atlas for the contents of this article. It starts by formulating the role of nuisance parameter estimation in obtaining asymptotically linear estimators of a target parameter of interest. This demonstrates the need to target the estimator of the nuisance parameter in order to make the estimator of the target parameter asymptotically linear when the model for the nuisance parameter is large. The general approach for obtaining such a targeted estimator of the nuisance parameter is described. Subsequently, we present the concrete example to which we will apply this general method for targeted estimation of the nuisance parameter, and for which we establish a number of formal theorems. Finally, we discuss the link to previous articles that concerned some kind of targeting of the estimator of the nuisance parameter, and we outline the organization of the remainder of the article.

### 1.1 The role of nuisance parameter estimation

Suppose we observe *n* independent and identically distributed copies of a random variable *O* with probability distribution P₀.

The empirical mean of the influence curve

Suppose that the target parameter Ψ(P) depends on P only through a parameter Q(P).

The latter is shown as follows. By the property of the canonical gradient (in fact, any gradient) we have

The first term is an empirical process term that, under empirical process conditions (mentioned below), equals

To obtain the desired asymptotic linearity of

### 1.2 Targeting the fit of the nuisance parameter: general approach

In this article, we demonstrate that if

The current article concerns the construction of such targeted IPTW and TMLE that are asymptotically linear under regularity conditions, even when only one of the nuisance parameters is consistent and the estimators of the nuisance parameters are highly data adaptive. In order to be concrete in this article, we will focus on a particular example. In such an example we can concretely present the second-order term

The same approach for construction of such TMLE can be carried out in much greater generality, but that is beyond the scope of this article. Nonetheless, it is helpful for the reader to know that the general approach is the following (considering the case that *Q* can be misspecified): (1) approximate

### 1.3 Concrete example covered in this article

Let us now formulate the concrete example we will cover in this article. Let *W* be a vector of baseline covariates, *A* a binary treatment, and *Y* a final outcome, so that the observed data unit is O = (W, A, Y). The statistical model possibly restricts the conditional distribution of *A*, given *W* (the treatment mechanism, i.e. the propensity score), but leaves the marginal distribution of *W* and the conditional distribution of *Y*, given (A, W), unspecified. The target parameter of the distribution P of O is the treatment-specific mean, controlling for *W*, and note that this parameter depends on P only through the marginal distribution of *W* and the conditional mean of *Y*, given (A, W).
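Consistent with the abstract's description of this data structure, the estimand and its efficient influence curve in this well-studied setting can be displayed as follows (a standard display in this literature; see, e.g., van der Laan and Robins [13]):

```latex
\psi_0 \;=\; \Psi(P_0) \;=\; E_0\bigl[\bar{Q}_0(1,W)\bigr],
\qquad \bar{Q}_0(a,w) \;=\; E_0(Y \mid A = a, W = w),
% efficient influence curve, with propensity score g_0(1 \mid W) = P_0(A = 1 \mid W):
D^{*}(P_0)(O) \;=\; \frac{A}{g_0(1 \mid W)}\bigl(Y - \bar{Q}_0(A,W)\bigr)
\;+\; \bar{Q}_0(1,W) \;-\; \psi_0 .
```

The two nuisance parameters appearing here, the outcome regression and the propensity score, are exactly the objects whose targeted estimation this article studies.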

For this particular example, such TMLE are presented in Scharfstein et al. [17]; van der Laan and Rubin [7]; Bembom et al. [18–21]; Rosenblum and van der Laan [22]; Sekhon et al. [23]; van der Laan and Rose [6, 24]. Since

The first term equals

However, if only one of these nuisance parameter estimators is consistent, then the second term is still a first-order term, and it remains to establish that it is also asymptotically linear with a second-order remainder. For the sake of discussion, suppose that

In this article, we present TMLE that targets

### 1.4 Relation to current literature on targeted nuisance parameter estimators

The construction of TMLE that utilizes targeting of the nuisance parameter

The TMLEs presented in this article are always iterative and thereby rely on convergence of the iterative updating algorithm. Since the empirical risk decreases at each updating step, such convergence is typically guaranteed by the existence of the MLE at each updating step (e.g. an MLE of a coefficient in a logistic regression). Either way, in this article, we assume this convergence to hold. Since the assumptions of our theorems require

### 1.5 Organization

The organization of this paper is as follows. In Section 2, we introduce a targeted IPTW-estimator that relies on an adaptive consistent estimator of

In Section 5, we extend the TMLE of Section 3 (that relies on *g* but one that suffices for consistent estimation of

### 1.6 Notation

In the following sections, we will use the following notation. We have *A*, given *W*. Let *W* under

## 2 Statistical inference for IPTW-estimator when using super-learning to fit treatment mechanism

We first describe an IPTW-estimator that uses super-learning to fit the treatment mechanism

### 2.1 An IPTW-estimator using super-learning to fit the treatment mechanism

We consider a simple IPTW-estimator *np*, and let

as the choice of estimator that minimizes cross-validated risk. The super-learner of
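A minimal sketch of such an IPTW-estimator of the treatment-specific mean E₀[E₀(Y | A = 1, W)], with the fitted propensity score supplied externally (in practice by a super-learner; the function names here are illustrative, not from the article):

```python
import numpy as np

def iptw_estimate(A, Y, g_hat, bound=0.01):
    """IPTW-estimator of the treatment-specific mean E[E(Y | A=1, W)].

    g_hat: estimated propensity scores P(A=1 | W), e.g. a super-learner fit.
    The scores are bounded away from 0, reflecting the positivity condition.
    """
    g_hat = np.clip(g_hat, bound, 1.0)
    return np.mean(A * Y / g_hat)

# toy example with a known propensity score of 0.5
rng = np.random.default_rng(0)
n = 100_000
W = rng.normal(size=n)
A = rng.binomial(1, 0.5, size=n)
Y = W + A + rng.normal(size=n)
psi_n = iptw_estimate(A, Y, np.full(n, 0.5))  # true value is 1.0 here
```

With a data-adaptive fit of the propensity score substituted for the constant 0.5, the results of this section describe when such an estimator, after the additional targeting step below, remains asymptotically linear.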

### 2.2 Asymptotic linearity of a targeted data-adaptive IPTW-estimator

The next theorem presents an IPTW-estimator that uses a targeted fit

**Theorem 1** *We consider a targeted IPTW-estimator**where**and**is an update of an initial estimator**of**defined below*.

**Definition of targeted estimator**: *Let**be obtained by non-parametric estimation of the regression function**treating**as a fixed covariate (i.e. function of W). This yields an estimator**of**where**. Consider the submodel**and fit**with the MLE*

*We define**as the corresponding targeted update of**. This TMLE**satisfies*

**Empirical process condition**: *Assume that**fall in a**-Donsker class with probability tending to 1*.

**Negligibility of second-order terms**: *Define**. Assume**with probability tending to 1 and assume*

*Then*,

*where*

So under the conditions of this theorem, we can construct an asymptotic 0.95-confidence interval

and

Regarding the displayed second-order term conditions, we note that these are satisfied if

Regarding the empirical process condition, we note that an example of a Donsker class is the class of multivariate real-valued functions with uniform sectional variation norm bounded by a universal constant [44]. It is important to note that if each estimator in the library falls in such a class, then the convex combinations also fall in that same class [4]. So this Donsker condition will hold if it holds for each of the candidate estimators in the library of the super-learner.
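The influence-curve-based confidence interval used throughout this article has a generic Wald-type form; a small sketch (the vector `ic` would be the estimated influence curve from the relevant theorem, evaluated at each observation; all names are illustrative):

```python
import numpy as np

def wald_ci(psi_n, ic, z=1.96):
    """Asymptotic 0.95-confidence interval psi_n +/- 1.96 * sigma_n / sqrt(n),
    where sigma_n^2 is the empirical variance of the estimated influence curve."""
    n = len(ic)
    se = np.sqrt(np.var(ic, ddof=1) / n)
    return psi_n - z * se, psi_n + z * se

# toy usage: for a sample mean, the influence curve is O - psi
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, size=5000)
psi_n = x.mean()
lo, hi = wald_ci(psi_n, x - psi_n)
```

The point of the theorems is precisely to justify this simple recipe for the data-adaptive estimators considered here, by establishing asymptotic linearity with a known influence curve.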

### 2.3 Comparison of targeted data-adaptive IPTW and an IPTW using parametric model

Consider an IPTW-estimator using an MLE *q*. As a consequence, all the consistency and second-order term conditions for the IPTW-estimator using a targeted *K* main terms that themselves have a uniform sectional variation norm, but also penalized least-squares estimators (e.g. Lasso) using basis functions with bounded uniform sectional variation norm, and one could map any estimator into this space of functions with universally bounded uniform sectional variation norm through a smoothing operation. Thus, under this restriction on the library, the IPTW-estimator using the super-learner is asymptotically linear with influence curve

The parametric IPTW-estimator is asymptotically linear with influence curve *f* onto

For example, if the parametric model happens to have a score equal to

If, on the other hand, the parametric model is misspecified, then the IPTW-estimator using

## 3 Statistical inference for TMLE when using super-learning to consistently fit treatment mechanism

In the next subsection, we present a TMLE that targets the fit of the treatment mechanism, analogous to the targeted IPTW-estimator presented above. In addition, this subsection presents a formal asymptotic linearity theorem demonstrating that this TMLE will be asymptotically linear even when

### 3.1 Asymptotic linearity of a TMLE using a targeted estimator of the treatment mechanism

The following theorem presents a novel TMLE and corresponding asymptotic linearity with specified influence curve, where we rely on consistent estimation of

**Theorem 2**

**Iterative targeted MLE of**

**Definitions**: *Given**let**be a consistent estimator of the regression**of**on**and**. Let**be an initial estimator of*

**Initialization**: *Let**and**. Let*

**Updating step for***Consider the submodel**and fit**with the MLE*

*We define**as the corresponding update of**. This**satisfies*

**Updating step for***Let**be the quasi-log-likelihood loss function for**(allowing that Y is continuous in**). Consider the submodel**and let**. Define**as the resulting update. Define*

**Iterating till convergence**: *Now, set**and iterate this updating process mapping a**into**till convergence or till large enough K so that the estimating equations (2) below are solved up till an**-term. Denote the limit of this iterative procedure with*

**Plug-in estimator**: *Let**where**is the empirical distribution estimator of**. The TMLE of**is defined as*

**Estimating equations solved by TMLE**: *This TMLE**solves*

**Empirical process condition**: *Assume that**falls in a**-Donsker class with probability tending to 1 as*

**Negligibility of second-order terms**: *Define*

*where**is treated as a fixed covariate (i.e. function of W) in the conditional expectation**. Assume that there exists a**so that**with probability tending to 1, and*

*Then*,

*where*

Thus, under the assumptions of this theorem, an asymptotic 0.95-confidence interval is given by
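To fix ideas, the basic TMLE fluctuation step for this estimand (the standard step that the additionally targeted versions in this article build on) can be sketched as follows: regress the outcome on the "clever covariate" H(A, W) = A/gₙ(1 | W) through a logistic submodel with offset logit Q̄ₙ, and plug the update into the parameter mapping. This is only a sketch under simplifying assumptions: a simple gradient ascent stands in for the one-dimensional logistic MLE, and all names are illustrative.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def tmle_update(Qbar_AW, Qbar_1W, g1W, A, Y, n_iter=100, lr=0.1):
    """One TMLE targeting step for psi = E[Qbar(1, W)], with Y in [0, 1].

    Fits epsilon in the logistic submodel
        logit Qbar_eps(A, W) = logit Qbar(A, W) + eps * H(A, W),
    H(A, W) = A / g(1 | W), by maximum likelihood (here via gradient ascent).
    """
    g1W = np.clip(g1W, 0.01, 1.0)          # positivity bound
    H = A / g1W
    off = logit(np.clip(Qbar_AW, 1e-6, 1 - 1e-6))
    eps = 0.0
    for _ in range(n_iter):                # ascend the log-likelihood in eps
        resid = Y - expit(off + eps * H)   # score of the 1-dim submodel
        eps += lr * np.mean(H * resid)
    # evaluate the updated regression at A = 1 and plug in
    off1 = logit(np.clip(Qbar_1W, 1e-6, 1 - 1e-6))
    return np.mean(expit(off1 + eps / g1W))
```

By construction the update solves the efficient-influence-curve estimating equation in its residual component, which is the starting point of the extra targeting steps added in Theorems 2 and 3.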

### 3.2 Using a δ-specific submodel for targeting *g* that guarantees the positivity condition

The following is an application of the constrained logistic regression approach of the type presented in Gruber and van der Laan [19] for the purpose of estimation of *W*, which implies an estimator *W*, through a given estimator

The MLE is simply obtained with logistic regression of *W* (see, e.g., Gruber and van der Laan [19]) based on the quasi-log-likelihood loss function:

where

is the quasi-log-likelihood loss. The update *k* in the iterative TMLE algorithm, and thereby that
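For reference, the quasi-log-likelihood loss invoked here is the standard one from Gruber and van der Laan [19], valid for an outcome Y (rescaled, if necessary, to) taking values in [0, 1]:

```latex
-L(\bar{Q})(O) \;=\; Y \log \bar{Q}(A,W) \;+\; (1-Y)\,\log\bigl(1-\bar{Q}(A,W)\bigr),
```

whose minimizer over Q̄ is the true conditional mean even when Y is continuous, which is what makes the logistic submodels above valid for continuous bounded outcomes.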

## 4 Double robust statistical inference for TMLE when using super-learning to fit outcome regression and treatment mechanism

In this section, our aim is to present a TMLE that is asymptotically linear with known influence curve if either

**Theorem 3**

**Definitions**: *For any given**let**and**be consistent estimators of**and**respectively (e.g. using a super-learner or other non-parametric adaptive regression algorithm). Let**and**denote these estimators applied to the TMLEs**defined below*.

**Iterative targeted MLE of**

**Initialization**: *Let**be an initial estimator of**. Let**and let**. Let**be obtained by non-parametrically regressing A on**. Let**be obtained by non-parametrically regressing**on*

**Updating step**: *Consider the submodel**and fit**with the MLE*

*Define the submodel**where*

*Let**be the MLE, where**is the quasi-log-likelihood loss*.

*We define**as the corresponding targeted update of**and**as the corresponding update of**. Let**and*

**Iterate till convergence**: *Now, set**and iterate this updating process mapping a**into**till convergence or till large enough K so that the following three estimating equations are solved up till an**-term*:

*where*

**Final substitution estimator**: *Denote the limits of this iterative procedure with**. Let**where**is the empirical distribution estimator of**. The TMLE of**is defined as*

**Equations solved by TMLE**:

**Empirical process condition**: *Assume that**fall in a**-Donsker class with probability tending to 1 as*

**Negligibility of second-order terms**: *Define**and**. Assume that there exists a**so that**with probability tending to 1, that**are consistent for**w.r.t*. *-norm, where either**or**and assume that the following second-order terms are*

*Then*,

*where*

Note that consistent estimation of the influence curve

If *A*, given some function *W* for which

As shown in the final remark of the Appendix, the condition of Theorem 3 that either

## 5 Collaborative double robust inference for C-TMLE when using super-learning to fit outcome regression and reduced treatment mechanism

We first review the theoretical underpinning for collaborative estimation of nuisance parameters, in this case, the outcome regression and treatment mechanism. Subsequently, we explain that the desired collaborative estimation can be achieved by applying the previously established template for construction of a C-TMLE to a TMLE that solves certain estimating equations when given an initial estimator of *k*, and (2) using cross-validation to select the *k* for which

### 5.1 Motivation and theoretical underpinning of collaborative double robust estimation of nuisance parameters

We note that

Let *A*, given *W*, and let *A* given *W*. We define the set

**Lemma 1** *(van der Laan and Gruber [**33**]) If**and**then**. More generally*,

We note that *A*, given *W* through *g* for which

### 5.2 C-TMLE

The general C-TMLE introduced in van der Laan and Gruber [33] provides a template for construction of a TMLE *W* that is an element of

The general C-TMLE has been implemented and applied to point treatment and longitudinal data [20, 29–33, 35]. A C-TMLE algorithm relies on a TMLE algorithm that maps an initial *k*, and finally uses cross-validation to select the best TMLE among these candidate estimators of

### 5.3 A TMLE that allows for collaborative double robust inference

Our next theorem presents a TMLE algorithm and a corresponding influence curve under the assumption that the propensity score correctly adjusts for the possibly misspecified

**Theorem 4**

**Definitions**: *For any given**let**and**be consistent estimators of**and**respectively (e.g. using a super-learner or other non-parametric adaptive regression algorithm). Let**and**denote these estimators applied to the TMLE**defined below*.

**“Score” equations the TMLE should solve**: *Below, we describe an iterative TMLE algorithm that results in estimators**that solve the following equations*:

**Iterative targeted MLE of**

**Initialization**: *Let**and**(e.g. aiming to adjust for**) be initial estimators*.

*Let**and*

**Updating step**: *Consider the submodel**and fit**with the MLE*

*Define the submodel**and let**be the quasi-log-likelihood loss function for**. Let**be the MLE. Let**and*

**Iterating till convergence**: *Now, set**and iterate this updating process mapping a**into**till convergence or till large enough K so that the following estimating equations are solved up till an**-term*:

**Final substitution estimator**: *Denote these limits (in k) of this iterative procedure with**. Let**where**is the empirical distribution estimator of**. The TMLE of**is defined as*

**Assumption on limits****of***Assume that**is consistent for**w.r.t*. *-norm, where**for some function**of W for which**only depends on W through**and assume that**where the latter holds, in particular, if**only depends on W through**(e.g*. *involves non-parametric adjustment by**). As a consequence, we have*

**Empirical process condition**: *Assume that**fall in a**-Donsker class with probability tending to 1 as*

**Negligibility of second-order terms**: *Define*

*Assume that the following conditions hold for each of the following possible definitions of**. Note that**is the limit of each of these choices for*

*We assume**are bounded away from**with probability tending to one, and*

*Then*,

*where*

Thus, consistency of this TMLE relies upon the consistency of *A*, given *W* through

It is also interesting to note that the algebraic form of the influence curve of this TMLE is identical to the influence curve of the TMLE of Theorem 2 that relied on

### 5.4 A C-TMLE algorithm

The TMLE algorithm presented in Theorem 4 maps an initial estimator *g* was that it should non-parametrically adjust not only for *g* in response to

First, we compute a set of *K* univariate covariates *W*, which we will refer to as main terms, even though a term could be an interaction term or a super-learning fit of the regression of *A* on a subset of the components of *W*. Let

The general template of a C-TMLE algorithm is the following: given a TMLE algorithm that maps any initial *k* main terms, where each set *k*. Here increasingly non-parametric means that the empirical mean of the loss function of the fit is decreasing in *k*. This sequence *g*-fit implied by *k* among these candidate TMLEs

In order to present a precise C-TMLE algorithm we will first introduce some notation. For a given subset of main terms *g* does not measure how well *g* fits *g* (and as initial estimator

Given a set

where we remind the reader of the definition

The C-TMLE algorithm defined below generates a sequence *k*-specific TMLE, increases in *k*, one unit at a time: *k*, and thereby to select the TMLE *k*-specific TMLEs

**Initiate algorithm: Set initial TMLE**. Let

**Determine next TMLE**. Determine the next best main term to add:

If

then

[In words: If the next best main term added to the fit of

**Iterate**. Run this from *K* at which point

This sequence of candidate TMLEs *k* and *k*, *k*. For that purpose we use *V*-fold cross-validation. That is, for each of the *V* splits of the sample in a training and validation sample, we apply the above algorithm for generating a sequence of candidate estimates *k* we take the average over the *V* splits of the *k*-specific performance measure over the validation sample, which is called the cross-validated risk of the *k*-specific TMLE. We select the *k* that has the best cross-validated risk, which we denote with
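The greedy, cross-validation-driven construction described above can be illustrated schematically. In this toy sketch, an ordinary least-squares fit with squared-error loss stands in for the k-specific TMLEs and their targeted loss; all names are illustrative and not from the article:

```python
import numpy as np

def cv_risk(X, y, cols, n_folds=5):
    """Cross-validated squared-error risk of OLS using the given columns."""
    idx = np.arange(len(y))
    risk = 0.0
    for v in np.array_split(idx, n_folds):
        t = np.setdiff1d(idx, v)
        Xt = np.column_stack([np.ones(len(t))] + [X[t, j] for j in cols])
        Xv = np.column_stack([np.ones(len(v))] + [X[v, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Xt, y[t], rcond=None)
        risk += np.sum((y[v] - Xv @ beta) ** 2)
    return risk / len(y)

def forward_select(X, y, max_terms):
    """Greedy forward selection of main terms, mimicking the C-TMLE template:
    build an increasingly non-parametric sequence of candidate fits, then
    select the k with best cross-validated risk along the path."""
    selected, remaining = [], list(range(X.shape[1]))
    path = [(list(selected), cv_risk(X, y, selected))]
    for _ in range(max_terms):
        scores = {j: cv_risk(X, y, selected + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), scores[best]))
    return min(path, key=lambda kr: kr[1])[0]

# toy usage: only column 0 is predictive
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 5))
y = 2 * X[:, 0] + 0.05 * rng.normal(size=500)
selected = forward_select(X, y, max_terms=3)
```

The actual C-TMLE replaces the squared-error criterion by the loss of the k-specific TMLE, so that main terms are selected for how much they help remove bias for the estimand rather than for how well they predict treatment.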

**Fast version of the above C-TMLE**: We could carry out the above C-TMLE algorithm, but replacing the TMLE that maps an initial

**Statistical inference for C-TMLE**: Let

The asymptotic variance of

## 6 Discussion

Targeted minimum loss-based estimation allows us to construct plug-in estimators

However, we noted that this level of targeting is insufficient if one only relies on consistency of

In this article we also pushed this additional level of targeting to a new level by demonstrating how it allows for double robust statistical inference, and that even if we estimate the nuisance parameter in a complicated manner that is based on a criterion that cares about how it helps the estimator to fit

It remains to evaluate the practical benefit of the modifications of IPTW, TMLE, and C-TMLE as presented in this article for both estimation and assessment of uncertainty. We plan to address this in future research.

Even though we focused in this article on a particular concrete estimation problem, TMLE is a general tool, and our TMLEs and theorems can be generalized to general statistical models and path-wise differentiable statistical target parameters.

We note that this targeting of nuisance parameter estimators in the TMLE is not only necessary to obtain a known influence curve but also to make the TMLE asymptotically linear in the first place. So it does not simply suffice to run a bootstrap as an alternative to influence-curve-based inference, since the bootstrap can only work if the estimator is asymptotically linear, so that it has a limit distribution. In addition, the established asymptotic linearity with known influence curve has the important by-product that one obtains statistical inference at no extra computational cost. This is particularly important in these large semi-parametric models, which require aggressive machine learning methods to cover the model space, making the estimators by necessity computationally intensive, so that a (disputable) bootstrap method might simply be too computationally expensive.

# Acknowledgments

This research was supported by an NIH grant R01 AI074345-06. The author is grateful for the excellent, helpful, and insightful comments of the reviewers.

## Appendix

### Proof of Theorem 1

To start with we note:

The first term of this decomposition yields the first component

By our assumptions, the last term

So it remains to study:

Note that this equals *g*. Our strategy is to first approximate this parameter by an easier (still unknown) parameter

**Lemma 2** *Define**and**where**is treated as a fixed function of W when calculating the conditional expectation. Assume*

*Then*,

**Proof of Lemma 2**: Note that

Since we assumed *Y* on the initial estimator

The next step of the proof is the following series of equalities

where, by assumption,

Thus, we have

from which we deduce that, by Lemma 2 and

where we defined

By our assumptions,

### Proof of Theorem 2

One easily checks that

because *Q* of

The first term *A* equals

where

where

By our assumptions, the second term above is

The estimator

We have

where

where we defined

We have that

### Proof of Theorem 3

As outlined in Section 1, we have

if

It suffices to analyze the second term. Initially, we note that

where

By assumption,

Now, we note

By our assumptions, the first term

So it suffices to analyze the second and third terms of this last expression. In order to represent the second and third terms we define

The sum of the second and third terms can now be represented as:

For notational convenience, we will suppress the dependence of these mappings on the unknown quantities, and thus use

**Analysis of****if**

By our assumptions,

so that it remains to analyze

where, by our assumptions,

In addition,

where

**Analysis of****if**

Here we used that *A* allowing us to first replace *A* and then retake the conditional expectation but now only conditioning on what is needed to fix all other terms within expectation w.r.t. *W* by the easier *Y*. The last term is a second-order term involving square differences

where we assumed that

in probability. This proves

### Proof of Theorem 4

As in the proof of previous theorem, we start with

where we use that

As in the proof of previous theorem, we decompose this second term as follows:

resulting in four terms, which we will denote with Terms 1–4. We will now analyze these four terms.

**Term 1**: The first term

**Term 4**: Due to our assumption that

where, by assumption,

We proceed as follows:

The first term is asymptotically equivalent to minus Term 3, which shows that Term 3 is canceled out by a component of Term 4 up till a second-order term that is

where

By assumption, *W*. Therefore,

This term is analyzed below and it is shown that this term equals

To conclude, we have then shown that the fourth term equals the latter expression minus the third term.

We now analyze (4) which can be represented as

We now proceed as follows:

For the second term

by noting that

By assumption, both terms are

Since, by construction of

where

**Term 3**: Our analysis of Term 4 showed that Term 3 cancels out and thus that the sum of the third and fourth terms equals

**Analysis of Term 2**: Up till a second-order term that can be bounded by

where

We have

Recall that, by our assumption,

This proves that

**Remark: Proof of additional result** In this analysis of Term 2, we assumed

where *A* on

where

### References

1. Bickel PJ, Klaassen CA, Ritov Y, Wellner J. Efficient and adaptive estimation for semiparametric models. Springer-Verlag, 1997.

2. Gill RD. Non- and semiparametric maximum likelihood estimators and the von Mises method (part 1). Scand J Stat 1989;16:97–128.

3. Gill RD, van der Laan MJ, Wellner JA. Inefficient estimators of the bivariate survival function for three models. Ann Inst Henri Poincaré 1995;31:545–97.

4. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. New York: Springer-Verlag, 1996.

5. van der Laan MJ. Estimation based on case-control designs with known prevalence probability. Int J Biostat 2008. Available at: http://www.bepress.com/ijb/vol4/iss1/17/.

6. van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. New York: Springer, 2012.

7. van der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int J Biostat 2006;2.

8. van der Laan MJ, Dudoit S. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: finite sample oracle inequalities and examples. Technical report, Division of Biostatistics, University of California, Berkeley, CA, November 2003.

9. van der Laan MJ, Polley E, Hubbard A. Super learner. Stat Appl Genet Mol Biol 2007;6:Article 25.

10. van der Vaart AW, Dudoit S, van der Laan MJ. Oracle inequalities for multi-fold cross-validation. Stat Decis 2006;24:351–71.

11. Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: AIDS epidemiology. Methodological issues. Basel: Birkhäuser, 1992:297–331.

12. Robins JM, Rotnitzky A. Semiparametric efficiency in multivariate regression models with missing data. J Am Stat Assoc 1995;90:122–9.

13. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer-Verlag, 2003.

14. Robins JM, Rotnitzky A, van der Laan MJ. Comment on "On profile likelihood" by S.A. Murphy and A.W. van der Vaart. J Am Stat Assoc – Theory Methods 2000;95:431–5.

15. Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American Statistical Association, 2000.

16. Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, "Inference for semiparametric models: some questions and an answer". Stat Sin 2001;11:920–36.

17. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semiparametric nonresponse models (with discussion and rejoinder). J Am Stat Assoc 1999;94:1096–120 (1121–46).

18. Bembom O, Petersen ML, Rhee S-Y, Fessel WJ, Sinisi SE, Shafer RW, et al. Biomarker discovery using targeted maximum likelihood estimation: application to the treatment of antiretroviral resistant HIV infection. Stat Med 2009;28:152–72.

19. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat 2010;6:Article 26. Available at: www.bepress.com/ijb/vol6/iss1/26.

20. Gruber S, van der Laan MJ. An application of collaborative targeted maximum likelihood estimation in causal inference and genomics. Int J Biostat 2010;6.

21. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Technical Report 265, UC Berkeley, CA, 2010.

22. Rosenblum M, van der Laan MJ. Targeted maximum likelihood estimation of the parameter of a marginal structural model. Int J Biostat 2010;6.

23. Sekhon JS, Gruber S, Porter K, van der Laan MJ. Propensity-score-based estimators and C-TMLE. In: van der Laan MJ, Rose S, editors. Targeted learning: prediction and causal inference for observational and experimental data, chapter 21. New York: Springer, 2011.

24. Gruber S, van der Laan MJ. Targeted minimum loss based estimation of a causal effect on an outcome with known conditional bounds. Int J Biostat 2012;8.

25. Zheng W, van der Laan MJ. Asymptotic theory for cross-validated targeted maximum likelihood estimation. Technical Report 273, Division of Biostatistics, University of California, Berkeley, CA, 2010.

26. Zheng W, van der Laan MJ. Cross-validated targeted minimum loss based estimation. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 21. New York: Springer, 2011:459–74.

27. van der Vaart AW. Asymptotic statistics. New York: Cambridge University Press, 1998.

28. Rotnitzky A, Lei Q, Sued M, Robins J. Improved double-robust estimation in missing data and causal inference models. Biometrika 2012;99:439–56.

29. Gruber S, van der Laan MJ. Targeted minimum loss based estimator that outperforms a given estimator. Int J Biostat 2012;8:Article 11. DOI: 10.1515/1557-4679.1332.

30. Gruber S, van der Laan MJ. C-TMLE of an additive point treatment effect. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 19. New York: Springer, 2011.

31. Porter KE, Gruber S, van der Laan MJ, Sekhon JS. The relative performance of targeted maximum likelihood estimators. Int J Biostat 2011;7:1–34.

32. Stitelman OM, van der Laan MJ. Collaborative targeted maximum likelihood for time to event data. Int J Biostat 2010:Article 21.

33. van der Laan MJ, Gruber S. Collaborative double robust penalized targeted maximum likelihood estimation. Int J Biostat 2010;6.

34. van der Laan MJ, Rose S. Targeted learning: prediction and causal inference for observational and experimental data. New York: Springer, 2011.

35. Wang H, Rose S, van der Laan MJ. Finding quantitative trait loci genes. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 23. New York: Springer, 2011.

36. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000;11:561–70.

37. Györfi L, Kohler M, Krzyżak A, Walk H. A distribution-free theory of nonparametric regression. New York: Springer-Verlag, 2002.

38. van der Laan MJ, Dudoit S, van der Vaart AW. The cross-validated adaptive epsilon-net estimator. Stat Decis 2006;24:373–95.

39. van der Laan MJ, Dudoit S, Keles S. Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol 2004;3:Article 4.

40. Dudoit S, van der Laan MJ. Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Stat Methodol 2005;2:131–54.

41. Polley EC, Rose S, van der Laan MJ. Super learning. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 3. New York: Springer, 2011.

42. Polley EC, van der Laan MJ. Super learner in prediction. Technical report 200, Division of Biostatistics, UC Berkeley, Working Paper Series, 2010.

43. van der Laan MJ, Petersen ML. Targeted learning. In: Zhang C, Ma Y, editors. Ensemble machine learning. New York: Springer, 2012:117–56. ISBN 978-1-4419-9326-7.

44. van der Laan MJ. Efficient and inefficient estimation in semiparametric models. Center for Mathematics and Computer Science, CWI-tract 114, 1996.

45. Lee BK, Lessler J, Stuart EA. Improved propensity score weighting using machine learning. Stat Med 2009;29:337–46.

46. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 2009;20:512–22. DOI: 10.1097/EDE.0b013e3181a663cc.

47. Vansteelandt S, Bekaert M, Claeskens G. On model selection and model misspecification in causal inference. Stat Methods Med Res 2010;21:7–30. DOI: 10.1177/0962280210387717.

48. Westreich D, Cole SR, Funk MJ, Brookhart MA, Sturmer T. The role of the c-statistic in variable selection for propensity scores. Pharmacoepidemiol Drug Saf 2011;20:317–20.

49. van der Laan MJ, Gruber S. Collaborative double robust penalized targeted maximum likelihood estimation. Int J Biostat 2009;6.

**Published Online:** 2014-2-11

**Published in Print:** 2014-5-1

© 2014 by Walter de Gruyter Berlin / Boston