1 Introduction and overview
This introduction provides an atlas for the contents of this article. It begins by formulating the role that estimation of nuisance parameters plays in obtaining asymptotically linear estimators of a target parameter of interest. This demonstrates the need to target the estimator of the nuisance parameter in order to make the estimator of the target parameter asymptotically linear when the model for the nuisance parameter is large. The general approach for obtaining such a targeted estimator of the nuisance parameter is described. Subsequently, we present the concrete example to which we will apply this general method for targeted estimation of the nuisance parameter, and for which we establish a number of formal theorems. Finally, we discuss the link to previous articles that concerned some kind of targeting of the estimator of the nuisance parameter, and we outline the organization of the remainder of the article.
1.1 The role of nuisance parameter estimation
Suppose we observe n independent and identically distributed copies of a random variable O with probability distribution
The empirical mean of the influence curve
Suppose that
The latter is shown as follows. By the property of the canonical gradient (in fact, any gradient) we have
The first term is an empirical process term that, under empirical process conditions (mentioned below), equals
To obtain the desired asymptotic linearity of
1.2 Targeting the fit of the nuisance parameter: general approach
In this article, we demonstrate that if
The current article concerns the construction of such targeted IPTW and TMLE that are asymptotically linear under regularity conditions, even when only one of the nuisance parameters is consistent and the estimators of the nuisance parameters are highly data adaptive. In order to be concrete in this article, we will focus on a particular example. In such an example we can concretely present the second-order term
The same approach for construction of such TMLE can be carried out in much greater generality, but that is beyond the scope of this article. Nonetheless, it is helpful for the reader to know that the general approach is the following (considering the case that
1.3 Concrete example covered in this article
Let us now formulate the concrete example we will cover in this article. Let
For this particular example, such TMLE are presented in Scharfstein et al. [17]; van der Laan and Rubin [7]; Bembom et al. [18–21]; Rosenblum and van der Laan [22]; Sekhon et al. [23]; van der Laan and Rose [6, 24]. Since
The first term equals
However, if only one of these nuisance parameter estimators is consistent, then the second term is still a first-order term, and it remains to establish that it is also asymptotically linear with a second-order remainder. For the sake of discussion, suppose that
In this article, we present TMLE that targets
1.4 Relation to current literature on targeted nuisance parameter estimators
The construction of TMLE that utilizes targeting of the nuisance parameter
The TMLEs presented in this article are always iterative and thereby rely on convergence of the iterative updating algorithm. Since the empirical risk decreases at each updating step, such convergence is typically guaranteed by the existence of the MLE at each updating step (e.g. an MLE of a coefficient in a logistic regression). Either way, in this article, we assume this convergence to hold. Since the assumptions of our theorems require
1.5 Organization
The organization of this paper is as follows. In Section 2, we introduce a targeted IPTW-estimator that relies on an adaptive consistent estimator of
In Section 5, we extend the TMLE of Section 3 (that relies on
1.6 Notation
In the following sections, we will use the following notation. We have
2 Statistical inference for IPTW-estimator when using super-learning to fit treatment mechanism
We first describe an IPTW-estimator that uses super-learning to fit the treatment mechanism
2.1 An IPTW-estimator using super-learning to fit the treatment mechanism
We consider a simple IPTW-estimator
as the choice of estimator that minimizes cross-validated risk. The super-learner of
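To make the estimator concrete, the following is a minimal sketch (all names and the simulation are ours, not the article's notation) of an IPTW-estimator of a treatment-specific mean that weights observed outcomes by the inverse of an estimated treatment probability; in practice the weights would come from a super-learner fit of the treatment mechanism rather than the known randomization probability used here:

```python
import numpy as np

def iptw_mean(y, a, g1):
    """Simple IPTW estimate of E[Y_1]: average of A * Y / g_n(1|W).

    y  : observed outcomes
    a  : binary treatment indicator
    g1 : estimated treatment probabilities P(A=1|W), assumed bounded away from 0
    """
    return np.mean(a * y / g1)

# Toy example: randomized treatment with known g(1|W) = 0.5,
# outcome Y = W + A + noise, so the true value of E[Y_1] is 1.
rng = np.random.default_rng(0)
n = 10_000
w = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)
y = w + a + rng.normal(size=n)
psi = iptw_mean(y, a, np.full(n, 0.5))
```

With the treatment mechanism correctly specified, the weighted average is consistent for the counterfactual mean; the targeting discussed in the next subsection concerns making the estimator asymptotically linear when `g1` is instead fitted data-adaptively.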
2.2 Asymptotic linearity of a targeted data-adaptive IPTW-estimator
The next theorem presents an IPTW-estimator that uses a targeted fit
Theorem 1. We consider a targeted IPTW-estimator
Definition of targeted estimator
We define
Empirical process condition: Assume that
Negligibility of second-order terms: Define
Then,
where
So under the conditions of this theorem, we can construct an asymptotic 0.95-confidence interval
and
Regarding the displayed second-order term conditions, we note that these are satisfied if
Regarding the empirical process condition, we note that an example of a Donsker class is the class of multivariate real-valued functions with uniform sectional variation norm bounded by a universal constant [44]. It is important to note that if each estimator in the library falls in such a class, then the convex combinations also fall in that same class [4]. So this Donsker condition will hold if it holds for each of the candidate estimators in the library of the super-learner.
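As a generic illustration of how such an influence-curve-based interval is computed in practice (function and variable names are ours, and the influence curve values are assumed to have already been estimated by plugging the fitted nuisance parameters into its known form):

```python
import numpy as np

def wald_ci_95(psi_n, ic_n):
    """Asymptotic 0.95 Wald interval psi_n +/- 1.96 * sigma_n / sqrt(n),
    where sigma_n^2 is the sample variance of the estimated influence
    curve evaluated at the n observations."""
    n = len(ic_n)
    se = np.std(ic_n, ddof=1) / np.sqrt(n)
    return psi_n - 1.96 * se, psi_n + 1.96 * se

# Tiny deterministic example with estimate 0 and influence curve values +/-1.
lo, hi = wald_ci_95(0.0, np.array([-1.0, 1.0, -1.0, 1.0]))
```

The interval's validity rests entirely on the asymptotic linearity established by the theorem; its width shrinks at the usual root-n rate, and no resampling is needed.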
2.3 Comparison of targeted data-adaptive IPTW and an IPTW using parametric model
Consider an IPTW-estimator using an MLE
The parametric IPTW-estimator is asymptotically linear with influence curve
For example, if the parametric model happens to have a score equal to
If, on the other hand, the parametric model is misspecified, then the IPTW-estimator using
3 Statistical inference for TMLE when using super-learning to consistently fit treatment mechanism
In the next subsection, we present a TMLE that targets the fit of the treatment mechanism, analogous to the targeted IPTW-estimator presented above. In addition, this subsection presents a formal asymptotic linearity theorem demonstrating that this TMLE will be asymptotically linear even when
3.1 Asymptotic linearity of a TMLE using a targeted estimator of the treatment mechanism
The following theorem presents a novel TMLE and corresponding asymptotic linearity with specified influence curve, where we rely on consistent estimation of
Theorem 2
Iterative targeted MLE of
Definitions: Given
Initialization: Let
Updating step for
We define
Updating step for
Iterating till convergence: Now, set
Plug-in estimator: Let
Estimating equations solved by TMLE: This TMLE
Empirical process condition: Assume that
Negligibility of second-order terms: Define
where
Then,
where
Thus, under the assumptions of this theorem, an asymptotic 0.95-confidence interval is given by
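To convey the flavor of the updating step without the article's notation, here is a self-contained sketch (all names are ours, and the setup is the standard binary-outcome case) of a one-dimensional logistic fluctuation of an initial outcome regression along the inverse-probability "clever covariate", followed by the plug-in evaluation. The actual TMLE of the theorem additionally updates the treatment mechanism fit and iterates the two updating steps until convergence:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def tmle_mean_y1(y, a, qbar1, g1, n_iter=50, tol=1e-10):
    """One-dimensional TMLE fluctuation for E[Y_1] with binary Y.

    Fluctuates logit(qbar1) along the clever covariate h = a / g1 by
    fitting epsilon with Newton steps on the log-likelihood, then
    returns the plug-in mean of the updated regression at a = 1.
    qbar1 and g1 are assumed bounded away from 0 and 1.
    """
    eps = 0.0
    h = a / g1
    offset = np.log(qbar1 / (1.0 - qbar1))
    for _ in range(n_iter):
        q = expit(offset + eps * h)
        score = np.sum(h * (y - q))            # d/d(eps) of the log-likelihood
        hess = -np.sum(h * h * q * (1.0 - q))
        step = score / hess
        eps -= step
        if abs(step) < tol:
            break
    q_star = expit(offset + eps / g1)          # updated regression at a = 1
    return np.mean(q_star)

# Toy check: with a = 1 and g1 = 1 for everyone, the fluctuation simply
# recalibrates the (misspecified) initial fit so its mean matches the data.
psi = tmle_mean_y1(
    y=np.array([1.0, 0.0, 1.0, 1.0]),
    a=np.ones(4),
    qbar1=np.full(4, 0.2),
    g1=np.ones(4),
)
```

When the initial regression already solves the score equation, epsilon is fitted as zero and the plug-in estimate is unchanged; otherwise the fluctuation moves the fit so that the relevant score equation holds exactly in the sample, which is what makes the resulting estimator a substitution estimator solving the targeted estimating equations.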
3.2 Using a -specific submodel for targeting g that guarantees the positivity condition
The following is an application of the constrained logistic regression approach of the type presented in Gruber and van der Laan [19] for the purpose of estimation of
The MLE is simply obtained with logistic regression of
where
is the quasi-log-likelihood loss. The update
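The mechanics can be sketched as follows (our own illustrative parameterization, not necessarily the article's submodel): force the fitted treatment probability into [delta, 1 - delta] by construction, and evaluate the Bernoulli quasi-log-likelihood loss at that constrained probability, so that minimizing the loss can never violate the positivity bound:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def constrained_prob(beta, x, delta):
    """Treatment probability forced into [delta, 1 - delta] by construction."""
    return delta + (1.0 - 2.0 * delta) * expit(x @ beta)

def quasi_loglik_loss(beta, x, a, delta):
    """Negative Bernoulli log-likelihood evaluated at the constrained
    probability; minimizing it keeps the fitted g_n away from 0 and 1."""
    g = constrained_prob(beta, x, delta)
    return -np.mean(a * np.log(g) + (1.0 - a) * np.log(1.0 - g))

# Even with an extreme linear predictor, the fitted probability stays in bounds.
p_hi = constrained_prob(np.array([100.0]), np.array([[1.0]]), delta=0.05)
p_lo = constrained_prob(np.array([-100.0]), np.array([[1.0]]), delta=0.05)

# At beta = 0 the constrained probability is 0.5, giving loss log(2).
loss0 = quasi_loglik_loss(
    np.zeros(1), np.array([[1.0], [1.0]]), np.array([1.0, 0.0]), delta=0.05
)
```

The design choice is that positivity is enforced by the parameterization itself rather than by truncating the fit afterwards, so the loss minimization and the bound never conflict.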
4 Double robust statistical inference for TMLE when using super-learning to fit outcome regression and treatment mechanism
In this section, our aim is to present a TMLE that is asymptotically linear with known influence curve if either
Theorem 3
Definitions: For any given
Iterative targeted MLE of
Initialization: Let
Updating step: Consider the submodel
Define the submodel
Let
We define
Iterating till convergence: Now, set
where
Final substitution estimator: Denote the limits of this iterative procedure with
Equations solved by TMLE:
Empirical process condition: Assume that
Negligibility of second-order terms: Define
Then,
where
Note that consistent estimation of the influence curve
If
As shown in the final remark of the Appendix, the condition of Theorem 3 that either
5 Collaborative double robust inference for C-TMLE when using super-learning to fit outcome regression and reduced treatment mechanism
We first review the theoretical underpinning for collaborative estimation of nuisance parameters, in this case, the outcome regression and treatment mechanism. Subsequently, we explain that the desired collaborative estimation can be achieved by applying the previously established template for construction of a C-TMLE to a TMLE that solves certain estimating equations when given an initial estimator of
5.1 Motivation and theoretical underpinning of collaborative double robust estimation of nuisance parameters
We note that
Let
Lemma 1 (van der Laan and Gruber [33]). If
We note that
5.2 C-TMLE
The general C-TMLE introduced in van der Laan and Gruber [33] provides a template for construction of a TMLE
The general C-TMLE has been implemented and applied to point treatment and longitudinal data [20, 29–33, 35]. A C-TMLE algorithm relies on a TMLE algorithm that maps an initial
5.3 A TMLE that allows for collaborative double robust inference
Our next theorem presents a TMLE algorithm and a corresponding influence curve under the assumption that the propensity score correctly adjusts for the possibly misspecified
Theorem 4
Definitions: For any given
“Score” equations the TMLE should solve: Below, we describe an iterative TMLE algorithm that results in estimators
Iterative targeted MLE of
Initialization: Let
Let
Updating step: Consider the submodel
Define the submodel
Iterating till convergence: Now, set
Final substitution estimator: Denote these limits (in k) of this iterative procedure with
Assumption on limits
Empirical process condition: Assume that
Negligibility of second-order terms: Define
Assume that the following conditions hold for each of the following possible definitions of
We assume
Then,
where
Thus, consistency of this TMLE relies upon the consistency of
It is also interesting to note that the algebraic form of the influence curve of this TMLE is identical to the influence curve of the TMLE of Theorem 2 that relied on
5.4 A C-TMLE algorithm
The TMLE algorithm presented in Theorem 4 maps an initial estimator
First, we compute a set of K univariate covariates
The general template of a C-TMLE algorithm is the following: given a TMLE algorithm that maps any initial
In order to present a precise C-TMLE algorithm we will first introduce some notation. For a given subset of main terms
Given a set
where we remind the reader of the definition
The C-TMLE algorithm defined below generates a sequence
Initiate algorithm: Set initial TMLE. Let
Determine next TMLE. Determine the next best main term to add:
If
then
[In words: If the next best main term added to the fit of
Iterate. Run this from
This sequence of candidate TMLEs
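Stripped of the TMLE-specific details, the candidate-generation step above follows a standard greedy forward-selection pattern. The sketch below (illustrative names and a toy loss of our own; the full C-TMLE interleaves these additions with targeted updating of the initial estimator and uses cross-validated loss to select among the resulting candidate TMLEs) shows the skeleton:

```python
def forward_select(candidates, loss_of_subset, max_terms=None):
    """Greedy forward selection: repeatedly add the candidate main term
    that yields the largest decrease in loss; stop when no addition helps.

    candidates     : list of main-term labels
    loss_of_subset : callable mapping a tuple of selected labels to a loss
    """
    selected = []
    best_loss = loss_of_subset(())
    remaining = list(candidates)
    while remaining and (max_terms is None or len(selected) < max_terms):
        trials = [(loss_of_subset(tuple(selected + [c])), c) for c in remaining]
        trial_loss, best_c = min(trials)
        if trial_loss >= best_loss:        # no candidate improves the fit
            break
        selected.append(best_c)
        remaining.remove(best_c)
        best_loss = trial_loss
    return selected, best_loss

# Toy loss: starting at 10, adding 'w2' removes 5, 'w1' removes 1, 'w3' nothing.
def toy_loss(s):
    return 10.0 - 5.0 * ('w2' in s) - 1.0 * ('w1' in s)

selected, final_loss = forward_select(['w1', 'w2', 'w3'], toy_loss)
```

In the toy example the procedure picks 'w2' first, then 'w1', and stops because 'w3' no longer decreases the loss; the C-TMLE analogue is that a main term failing to improve the fit triggers an update of the initial estimator before the search continues.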
Fast version of the above C-TMLE: We could carry out the above C-TMLE algorithm while replacing the TMLE that maps an initial
Statistical inference for C-TMLE: Let
The asymptotic variance of
6 Discussion
Targeted minimum loss-based estimation allows us to construct plug-in estimators
However, we noted that this level of targeting is insufficient if one only relies on consistency of
In this article we also pushed this additional targeting to a new level by demonstrating how it allows for double robust statistical inference, and that, even if we estimate the nuisance parameter in a complicated manner based on a criterion that cares about how it helps the estimator to fit
It remains to evaluate the practical benefit of the modifications of IPTW, TMLE, and C-TMLE as presented in this article for both estimation and assessment of uncertainty. We plan to address this in future research.
Even though we focused in this article on a particular concrete estimation problem, TMLE is a general tool, and our TMLE and theorems can be generalized to general statistical models and path-wise differentiable statistical target parameters.
We note that this targeting of nuisance parameter estimators in the TMLE is not only necessary to obtain a known influence curve but also necessary to make the TMLE asymptotically linear. So it does not simply suffice to run a bootstrap as an alternative to influence-curve-based inference, since the bootstrap can only work if the estimator is asymptotically linear, so that it has a limit distribution at all. In addition, the established asymptotic linearity with known influence curve has the important by-product that one obtains statistical inference at no extra computational cost. This is particularly important in these large semi-parametric models, which require aggressive machine learning methods to cover the model space, making the estimators by necessity very computer intensive, so that a (disputable) bootstrap method might simply be too computationally expensive.
This research was supported by an NIH grant R01 AI074345-06. The author is grateful for the excellent, helpful, and insightful comments of the reviewers.
Proof of Theorem 1
To start with, we note:
The first term of this decomposition yields the first component
By our assumptions, the last term
So it remains to study:
Note that this equals
Lemma 2. Define
Then,
Proof of Lemma 2: Note that
Since we assumed
The next step of the proof is the following series of equalities
where, by assumption,
Thus, we have
from which we deduce that, by Lemma 2 and
where we defined
By our assumptions,
Proof of Theorem 2
One easily checks that
because
The first term A equals
where
where
By our assumptions, the second term above is
The estimator
We have
where
where we defined
We have that
Proof of Theorem 3
As outlined in Section 1, we have
if
It suffices to analyze the second term. Initially, we note that
where
By assumption,
Now, we note
By our assumptions, the first term
So it suffices to analyze the second and third terms of this last expression. In order to represent the second and third terms we define
The sum of the second and third terms can now be represented as:
For notational convenience, we will suppress the dependence of these mappings on the unknown quantities, and thus use
Analysis of
By our assumptions,
so that it remains to analyze
where, by our assumptions,
In addition,
where
Analysis of
Here we used that
where we assumed that
in probability. This proves
Proof of Theorem 4
As in the proof of the previous theorem, we start with
where we use that
As in the proof of the previous theorem, we decompose this second term as follows:
resulting in four terms, which we will denote with Terms 1–4. We will now analyze these four terms.
Term 1: The first term
Term 4: Due to our assumption that
where, by assumption,
We proceed as follows:
The first term is asymptotically equivalent to minus Term 3, which shows that Term 3 is canceled by a component of Term 4 up to a second-order term that is
where
By assumption,
This term is analyzed below and it is shown that this term equals
To conclude, we have then shown that the fourth term equals the latter expression minus the third term.
We now analyze (4) which can be represented as
We now proceed as follows:
For the second term
by noting that
By assumption, both terms are
Since, by construction of
where
Term 3: Our analysis of Term 4 showed that Term 3 cancels out and thus that the sum of the third and fourth terms equals
Analysis of Term 2: Up to a second-order term that can be bounded by
where
We have
Recall that, by our assumption,
This proves that
Remark (proof of an additional result): In this analysis of Term 2, we assumed
where
where
References
- 1. Bickel PJ, Klaassen CA, Ritov Y, Wellner J. Efficient and adaptive estimation for semiparametric models. Springer-Verlag, 1997.
- 2. Gill RD. Non- and semiparametric maximum likelihood estimators and the von Mises method (part 1). Scand J Stat 1989;16:97–128.
- 3. Gill RD, van der Laan MJ, Wellner JA. Inefficient estimators of the bivariate survival function for three models. Ann Inst Henri Poincaré 1995;31:545–97.
- 4. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. New York: Springer-Verlag, 1996.
- 5. van der Laan MJ. Estimation based on case-control designs with known prevalence probability. Int J Biostat 2008. Available at: http://www.bepress.com/ijb/vol4/iss1/17/.
- 6. van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. New York: Springer, 2012.
- 8. van der Laan MJ, Dudoit S. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: finite sample oracle inequalities and examples. Technical report, Division of Biostatistics, University of California, Berkeley, CA, November 2003.
- 10. van der Vaart AW, Dudoit S, van der Laan MJ. Oracle inequalities for multi-fold cross-validation. Stat Decis 2006;24:351–71.
- 11. Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: AIDS epidemiology. Methodological issues. Basel: Birkhäuser, 1992:297–331.
- 12. Robins JM, Rotnitzky A. Semiparametric efficiency in multivariate regression models with missing data. J Am Stat Assoc 1995;90:122–9.
- 13. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer-Verlag, 2003.
- 14. Robins JM, Rotnitzky A, van der Laan MJ. Comment on "On profile likelihood" by S.A. Murphy and A.W. van der Vaart. J Am Stat Assoc – Theory Methods 2000;95:431–5.
- 15. Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American Statistical Association, 2000.
- 16. Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, "Inference for semiparametric models: some questions and an answer". Stat Sin 2001;11:920–36.
- 17. Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semiparametric nonresponse models (with discussion and rejoinder). J Am Stat Assoc 1999;94:1096–120 (1121–46).
- 18. Bembom O, Petersen ML, Rhee S-Y, Fessel WJ, Sinisi SE, Shafer RW, et al. Biomarker discovery using targeted maximum likelihood estimation: application to the treatment of antiretroviral resistant HIV infection. Stat Med 2009;28:152–72.
- 19. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat 2010;6:Article 26. Available at: www.bepress.com/ijb/vol6/iss1/26.
- 20. Gruber S, van der Laan MJ. An application of collaborative targeted maximum likelihood estimation in causal inference and genomics. Int J Biostat 2010;6.
- 21. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Technical Report 265, UC Berkeley, CA, 2010.
- 22. Rosenblum M, van der Laan MJ. Targeted maximum likelihood estimation of the parameter of a marginal structural model. Int J Biostat 2010;6.
- 23. Sekhon JS, Gruber S, Porter K, van der Laan MJ. Propensity-score-based estimators and C-TMLE. In: van der Laan MJ, Rose S, editors. Targeted learning: prediction and causal inference for observational and experimental data, chapter 21. New York: Springer, 2011.
- 24. Gruber S, van der Laan MJ. Targeted minimum loss based estimation of a causal effect on an outcome with known conditional bounds. Int J Biostat 2012;8.
- 25. Zheng W, van der Laan MJ. Asymptotic theory for cross-validated targeted maximum likelihood estimation. Technical Report 273, Division of Biostatistics, University of California, Berkeley, CA, 2010.
- 26. Zheng W, van der Laan MJ. Cross-validated targeted minimum loss based estimation. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 21. New York: Springer, 2011:459–74.
- 28. Rotnitzky A, Lei Q, Sued M, Robins J. Improved double-robust estimation in missing data and causal inference models. Biometrika 2012;99:439–56.
- 29. Gruber S, van der Laan MJ. Targeted minimum loss based estimator that outperforms a given estimator. Int J Biostat 2012;8:Article 11.
- 30. Gruber S, van der Laan MJ. C-TMLE of an additive point treatment effect. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 19. New York: Springer, 2011.
- 31. Porter KE, Gruber S, van der Laan MJ, Sekhon JS. The relative performance of targeted maximum likelihood estimators. Int J Biostat 2011;7:1–34.
- 32. Stitelman OM, van der Laan MJ. Collaborative targeted maximum likelihood for time to event data. Int J Biostat 2010;6:Article 21.
- 33. van der Laan MJ, Gruber S. Collaborative double robust penalized targeted maximum likelihood estimation. Int J Biostat 2010;6.
- 34. van der Laan MJ, Rose S. Targeted learning: prediction and causal inference for observational and experimental data. New York: Springer, 2011.
- 35. Wang H, Rose S, van der Laan MJ. Finding quantitative trait loci genes. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 23. New York: Springer, 2011.
- 36. Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology 2000;11:561–70.
- 37. Györfi L, Kohler M, Krzyżak A, Walk H. A distribution-free theory of nonparametric regression. New York: Springer-Verlag, 2002.
- 38. van der Laan MJ, Dudoit S, van der Vaart AW. The cross-validated adaptive epsilon-net estimator. Stat Decis 2006;24:373–95.
- 39. van der Laan MJ, Dudoit S, Keles S. Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol 2004;3:Article 4.
- 40. Dudoit S, van der Laan MJ. Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Stat Methodol 2005;2:131–54.
- 41. Polley EC, Rose S, van der Laan MJ. Super learning. In: van der Laan MJ, Rose S, editors. Targeted learning: causal inference for observational and experimental data, chapter 3. New York: Springer, 2011.
- 42. Polley EC, van der Laan MJ. Super learner in prediction. Technical Report 200, Division of Biostatistics, UC Berkeley, Working Paper Series, 2010.
- 43. van der Laan MJ, Petersen ML. Targeted learning. In: Zhang C, Ma Y, editors. Ensemble machine learning. New York: Springer, 2012:117–56. ISBN 978-1-4419-9326-7.
- 44. van der Laan MJ. Efficient and inefficient estimation in semiparametric models. Center for Mathematics and Computer Science, CWI-tract 114, 1996.
- 45. Lee BK, Lessler J, Stuart EA. Improved propensity score weighting using machine learning. Stat Med 2010;29:337–46.
- 46. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 2009;20:512–22. DOI: 10.1097/EDE.0b013e3181a663cc.
- 47. Vansteelandt S, Bekaert M, Claeskens G. On model selection and model misspecification in causal inference. Stat Methods Med Res 2010;21:7–30. DOI: 10.1177/0962280210387717.
- 48. Westreich D, Cole SR, Funk MJ, Brookhart MA, Sturmer T. The role of the c-statistic in variable selection for propensity scores. Pharmacoepidemiol Drug Saf 2011;20:317–20.
- 49. van der Laan MJ, Gruber S. Collaborative double robust penalized targeted maximum likelihood estimation. Int J Biostat 2009;6.
