As the authors point out, there is absolutely no reason to use the parametric extended *g*-computation formula method to estimate the desired mean outcome under a dynamic intervention that depends on the natural value of treatment, especially since researchers also have access to more robust methods from the semi-parametric model literature. In particular, the authors present an inverse probability of treatment weighted (IPTW) type estimator of the desired extended *g*-computation formula estimand. In this section, we discuss this point in more detail.

The identification results in causal inference aim to rely on minimal assumptions; in particular, these results typically avoid any statistical assumptions (i.e. restrictions on the probability distribution of the data). That is, many identifiability results correspond with nonparametric statistical models. All the hard work toward reliable inference about causal quantities in this part of the causal inference literature goes to waste if one uses estimators that are biased because they rely on parametric assumptions known to be false. It is not scientifically sensible to be nonparametric for the sake of identification but parametric for the sake of estimation, given that parametric assumptions are made out of convenience. Yet that is exactly what we do when we use, for example, parametric model-based estimators of the estimand defined by the extended *g*-computation formula, or IPTW estimators based on parametric models for the treatment mechanism. Using such a parametric model-based approach for causal inference makes it less relevant to worry about the causal assumptions, since one cannot even trust the estimator of the statistical estimand. This makes one wonder whether there is any theoretical scientific argument for estimation procedures based on arbitrary parametric assumptions.

One argument might be that estimators based on parametric models can be shown to be asymptotically normally distributed: we have theorems showing that the resulting confidence intervals attain the desired coverage asymptotically, provided the parametric assumptions are true. But what is the point of relying on a theorem whose assumptions are known to be false?

In addition, by not enforcing that a statistical model be correctly specified (i.e. contain the true distribution), different statisticians often end up generating different statistically significant output, even when they address identical statistical estimation problems and have equal access to all statistical information about the data-generating experiment. The problem is that the choice of statistical model is viewed as an art rather than a choice driven by scientific knowledge, missing the fact that this choice heavily affects the choice of target estimand, the corresponding estimator, and its statistical properties. Some data analysts like to quote “all models are wrong, but some are useful” as an argument that we should not worry too much about model choice. The truth is that as long as the field of applied statistics is driven by arbitrary model choices, we do not satisfy common-sense scientific standards.

Important advances have been made in empirical process theory and weak convergence theory (e.g. van der Vaart and Wellner, 1996), efficiency theory for semi-parametric models (e.g. Bickel et al., 1997), and general methods for the construction of efficient estimators (e.g. Robins and Rotnitzky, 1992; van der Laan and Robins, 2003; van der Laan and Rose, 2012; Hernan and Robins, 2014), providing us with theorems establishing asymptotic consistency, normality, and efficiency of highly data adaptive estimators in large statistical models. Let us consider a concrete instance of such a theorem, concerning the estimation of a pathwise differentiable target parameter $\mathrm{\Psi}:\mathcal{M}\to \mathbb{R}$ with canonical gradient/efficient influence curve $(P,O)\to {D}^{\ast}(P)(O)$ at *P*. Given this $\mathrm{\Psi}$ and ${D}^{\ast}(P)$ one obtains, by definition of pathwise differentiability, that
$\mathrm{\Psi}(P)-\mathrm{\Psi}({P}_{0})=-{P}_{0}{D}^{\ast}(P)+{R}_{2}(P,{P}_{0}),$ where ${R}_{2}(P,{P}_{0})$ is a second-order difference between *P* and ${P}_{0}$ that can be explicitly determined for each choice of target parameter $\mathrm{\Psi}$ and model $\mathcal{M}$ (see, for example, van der Laan, 2012, 2014, for a detailed demonstration). It is assumed that we select the statistical model $\mathcal{M}$ so that one feels confident that ${P}_{0}\in \mathcal{M}$.
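As a concrete illustration (not a parameter from the paper under discussion), consider the treatment-specific mean $\mathrm{\Psi}(P)={E}_{P}\bar{Q}(1,W)$, where $\bar{Q}(a,w)={E}_{P}(Y\mid A=a,W=w)$ and $g(a\mid w)=P(A=a\mid W=w)$ is the treatment mechanism. Its efficient influence curve is

${D}^{\ast}(P)(O)=\frac{A}{g(1\mid W)}\left(Y-\bar{Q}(1,W)\right)+\bar{Q}(1,W)-\mathrm{\Psi}(P),$

and a direct calculation shows that the exact remainder in the above expansion is

${R}_{2}(P,{P}_{0})={E}_{{P}_{0}}\left[\frac{g(1\mid W)-{g}_{0}(1\mid W)}{g(1\mid W)}\left(\bar{Q}(1,W)-{\bar{Q}}_{0}(1,W)\right)\right],$

a product of the errors in the two nuisance fits. This makes the double robustness of the corresponding estimators explicit: ${R}_{2}(P,{P}_{0})$ vanishes if either $g={g}_{0}$ or $\bar{Q}={\bar{Q}}_{0}$.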

Consider a substitution estimator $\mathrm{\Psi}({P}_{n}^{\ast})$, such as a TMLE based on an initial super-learner-based estimator ${P}_{n}^{0}$ (van der Laan and Dudoit, 2003; van der Vaart et al., 2006; van der Laan et al., 2006, 2007; Polley et al., 2012) that is then updated into a targeted estimator ${P}_{n}^{\ast}$. The estimator $\mathrm{\Psi}({P}_{n}^{\ast})$ might also be a parametric *g*-computation estimator relying on a parametric model-based MLE ${P}_{n}^{\ast}$ of ${P}_{0}$. As shown in van der Laan and Rubin (2006) (and many subsequent articles, including van der Laan and Rose, 2012), $\mathrm{\Psi}({P}_{n}^{\ast})$ is asymptotically normally distributed and efficient for $\mathrm{\Psi}({P}_{0})$ if

(i) ${P}_{n}{D}^{\ast}({P}_{n}^{\ast})={o}_{P}(1/\sqrt{n})$;

(ii) ${D}^{\ast}({P}_{n}^{\ast})$ falls in a ${P}_{0}$-Donsker class with probability tending to 1;

(iii) ${P}_{0}({D}^{\ast}({P}_{n}^{\ast})-{D}^{\ast}({P}_{0}){)}^{2}\to 0$ in probability; and

(iv) ${R}_{2}({P}_{n}^{\ast},{P}_{0})={o}_{P}(1/\sqrt{n})$.

If one uses the TMLE, then condition (i) is automatically satisfied. Condition (ii) would be satisfied, for example, if ${D}^{\ast}({P}_{n}^{\ast})$ falls in the class of multivariate real-valued functions with uniform sectional variation norm bounded by some $M<\mathrm{\infty}$ (Gill et al., 1995), a much less stringent assumption than requiring that ${P}_{n}^{\ast}$ be estimated in a parametric model. In addition, if one uses a CV-TMLE (Zheng and van der Laan, 2011; Rose and van der Laan, 2011), then condition (ii) can be removed. Let us consider such a CV-TMLE, so that the only conditions for asymptotic efficiency are the weak asymptotic consistency condition (iii) and condition (iv). Clearly, condition (iv) is the condition to worry about: if (iv) holds, one certainly expects (iii) to hold.
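To make condition (i) concrete, the following is a minimal numerical sketch of the targeting step of a TMLE for the treatment-specific mean $\mathrm{\Psi}(P)={E}_{P}\bar{Q}(1,W)$ with binary $Y$. Everything here is a stylized assumption for illustration: the simulated data-generating distribution, the deliberately crude initial fits (which would in practice be super-learner fits), and the choice of a logistic fluctuation fit by Newton-Raphson. The point is that after the update, the empirical mean of the efficient influence curve is numerically zero, i.e. condition (i) holds by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
g0 = 1 / (1 + np.exp(-0.3 * W))            # true treatment mechanism P(A=1|W)
A = rng.binomial(1, g0)
Q0 = 1 / (1 + np.exp(-(W + A)))            # true outcome regression E[Y|A,W]
Y = rng.binomial(1, Q0)

def expit(x): return 1 / (1 + np.exp(-x))
def logit(p): return np.log(p / (1 - p))

# Hypothetical initial estimators: g is well estimated, Q is misspecified.
gn = expit(0.3 * W)                        # fitted treatment mechanism
Qn_A = expit(0.8 * W + 0.8 * A)            # crude initial outcome fit at (A, W)
Qn_1 = expit(0.8 * W + 0.8)                # the same fit evaluated at A = 1

# Targeting step: one-dimensional logistic fluctuation with clever covariate
# H = A / g_n(1|W); epsilon is fit by Newton-Raphson so that the empirical
# score -- and hence P_n D*(P_n^*) -- is solved exactly (condition (i)).
H = A / gn
eps = 0.0
for _ in range(25):
    Q_eps = expit(logit(Qn_A) + eps * H)
    score = np.mean(H * (Y - Q_eps))
    hess = -np.mean(H**2 * Q_eps * (1 - Q_eps))
    eps -= score / hess

Q1_star = expit(logit(Qn_1) + eps / gn)    # updated fit at A = 1 (H = 1/g_n)
psi_tmle = np.mean(Q1_star)                # substitution estimator Psi(P_n^*)

# Condition (i): empirical mean of the efficient influence curve is ~ 0.
D_star = A / gn * (Y - expit(logit(Qn_A) + eps * H)) + Q1_star - psi_tmle
print(psi_tmle, np.mean(D_star))           # psi_tmle near E[expit(W + 1)]
```

Because $g$ is consistently estimated here, the TMLE corrects the bias of the misspecified initial outcome fit, illustrating the double robustness discussed above.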

If ${P}_{n}^{\ast}$ is based on a misspecified parametric model, then there is no hope that ${R}_{2}({P}_{n}^{\ast},{P}_{0})$ will converge to zero, i.e. (iv) will not hold. To make this crucial condition as realistic as possible, we have promoted the use of super-learning, a cross-validated ensemble learner which incorporates the state of the art in machine learning algorithms and possibly a large variety of parametric model-based estimators. The oracle inequality for the super-learner (see the above references) teaches us that we make this condition more and more likely to hold by selecting a library of diverse estimators that grows polynomially in sample size. That is, there is *no* trade-off of the kind where we cannot be *too* data adaptive; on the contrary, we have to push the envelope as much as possible and be maximally data adaptive in order to ensure that ${R}_{2}({P}_{n}^{\ast},{P}_{0})={o}_{P}(1/\sqrt{n})$. In addition, under condition (iv), the estimator is asymptotically efficient and thus also asymptotically regular, a nice by-product for reliable confidence intervals.
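The mechanism behind the oracle inequality can be sketched with a minimal discrete super-learner: a hypothetical library of candidate regression estimators is compared by V-fold cross-validated risk, and the candidate with the smallest cross-validated risk is selected. The library below (an overall mean, a linear fit, and a nearest-neighbor smoother) is an illustrative stand-in, not the library of any actual analysis; a full super-learner would also form a convex combination of the candidates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.uniform(-2, 2, size=(n, 1))
Y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.3, size=n)

# Hypothetical mini-library: each candidate maps training data to a predictor.
def fit_mean(Xtr, Ytr):
    m = Ytr.mean()
    return lambda Xte: np.full(len(Xte), m)

def fit_linear(Xtr, Ytr):
    Ztr = np.column_stack([np.ones(len(Xtr)), Xtr])
    beta, *_ = np.linalg.lstsq(Ztr, Ytr, rcond=None)
    return lambda Xte: np.column_stack([np.ones(len(Xte)), Xte]) @ beta

def fit_knn(k):
    def fit(Xtr, Ytr):
        def predict(Xte):
            d = np.abs(Xte[:, :1] - Xtr[:, 0][None, :])
            idx = np.argsort(d, axis=1)[:, :k]
            return Ytr[idx].mean(axis=1)
        return predict
    return fit

library = {"mean": fit_mean, "linear": fit_linear, "knn10": fit_knn(10)}

# V-fold cross-validated squared-error risk for each candidate; the discrete
# super-learner selects the cross-validated risk minimizer, which the oracle
# inequality guarantees performs asymptotically as well as the best candidate.
V = 5
folds = np.array_split(rng.permutation(n), V)
cv_risk = {}
for name, fit in library.items():
    sq_err = []
    for v in range(V):
        test = folds[v]
        train = np.concatenate([folds[u] for u in range(V) if u != v])
        pred = fit(X[train], Y[train])(X[test])
        sq_err.append((Y[test] - pred) ** 2)
    cv_risk[name] = np.concatenate(sq_err).mean()

winner = min(cv_risk, key=cv_risk.get)
print(winner, cv_risk)
```

Because the true regression is nonlinear, the data adaptive candidate is selected here; on data where a parametric candidate is (nearly) correct, cross-validation selects it instead, which is exactly why enlarging the library cannot hurt asymptotically.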

In order to move our field forward, we need to fully acknowledge these issues and start defining the estimation problem truthfully. In our work, we defined the field of Targeted Learning as the subfield of statistics concerned with theory, estimation, and statistical inference (i.e. confidence intervals) for target parameters (representing the answer to the actual scientific question of interest) in realistic statistical models (i.e. incorporating actual knowledge). By necessity, Targeted Learning requires integrating the state of the art in data adaptive estimation, beyond the incorporation of subject-matter-driven estimators, and requires targeting the estimation procedure toward the target parameter of interest. Targeted minimum loss-based estimation (and its variants such as CV-TMLE and C-TMLE), combined with super-learning, provides a general template for constructing such targeted substitution estimators (van der Laan and Rubin, 2006; van der Laan and Rose, 2012).

An example of this methodology, relevant to the paper under discussion, is the longitudinal TMLE of summary measures of the mean outcome under dynamic interventions (such as those defined by a working MSM) in Gruber and van der Laan (2012) and Petersen et al. (2013). The TMLE for this problem is inspired by the important double robust targeted estimators established in the earlier work of Bang and Robins (2005). This TMLE is implemented in the R package ltmle and fully utilizes the important sequential regression representation presented in Bang and Robins (2005). It extends easily to a TMLE of summary measures of mean outcomes under stochastic interventions. The extended *g*-computation formula corresponds with an estimated stochastic intervention, so the statistical inference will now also need to take into account that the stochastic intervention was estimated. On the other hand, if we go after the mean outcome under a data adaptive fit of the desired stochastic intervention, then the statistical inference is identical to treating the fitted stochastic intervention as known. By extending the current ltmle R package to stochastic (and possibly unknown) interventions, instead of only dynamic interventions, this method would become accessible to many practitioners, allowing data analysts to significantly improve on the parametric extended *g*-computation formula approach and on IPTW estimators relying on parametric models for the treatment mechanism.
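The sequential regression representation of Bang and Robins (2005) that ltmle builds on can be sketched in a few lines. The following is a minimal Python illustration (ltmle itself is an R package) of the iterated-regression *g*-computation estimator for a static intervention setting both treatments to 1 over two time points; it omits the targeting/fluctuation step, so it is a plain sequential-regression estimator, not a TMLE, and the simulated data-generating distribution and linear working models are assumptions chosen so the regressions are correctly specified.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
expit = lambda x: 1 / (1 + np.exp(-x))

# Simulated longitudinal data O = (W, A1, L1, A2, Y) with time-varying
# confounding: L1 affects both A2 and Y.
W = rng.normal(size=n)
A1 = rng.binomial(1, expit(W))
L1 = W + A1 + rng.normal(size=n)
A2 = rng.binomial(1, expit(L1 - 1))
Y = W + A1 + L1 + A2 + rng.normal(size=n)

def ols_predict(Xtr, Ytr, Xte):
    """Least-squares fit on (Xtr, Ytr), predictions at Xte (intercept added)."""
    Ztr = np.column_stack([np.ones(len(Xtr)), Xtr])
    beta, *_ = np.linalg.lstsq(Ztr, Ytr, rcond=None)
    return np.column_stack([np.ones(len(Xte)), Xte]) @ beta

# Step 1: regress Y on the full past (W, A1, L1, A2), then evaluate the fit
# at the intervened value A2 = 1.
Q2 = ols_predict(np.column_stack([W, A1, L1, A2]), Y,
                 np.column_stack([W, A1, L1, np.ones(n)]))

# Step 2: regress the iterated outcome Q2 on (W, A1), then evaluate at A1 = 1.
Q1 = ols_predict(np.column_stack([W, A1]), Q2,
                 np.column_stack([W, np.ones(n)]))

# Step 3: average over the baseline covariates -- the g-computation estimate
# of the mean outcome under the intervention (A1, A2) = (1, 1).
psi = Q1.mean()
print(psi)  # close to 3, the true intervened mean in this simulation
```

A TMLE adds, at each of the two regression steps, a fluctuation with an inverse-probability-of-treatment clever covariate, which is what yields double robustness and valid influence-curve-based inference.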

We again commend the excellent work of the authors. The field needs more important observations such as this one, which allow the straightforward application of previously described identifiability results and robust estimators to new problems. We further advocate the consideration of newly developed data adaptive target parameters, which often similarly allow for the application of existing estimators to interesting new problems.
