We now turn to the estimation of ${\mathrm{\Psi}}^{1}({P}_{0})$, ${\mathrm{\Psi}}^{2}({P}_{0})$ and ${\mathrm{\Psi}}^{3}({P}_{0})$ as defined in eqs (6), (8) and (12). We take the route of TMLE, a paradigm of inference based on semiparametrics and estimating functions (see Chapter 25 in [71, 72] for recent and comprehensive introductions). Introduced by van der Laan and Rubin [4], TMLE has since been studied and applied in a variety of contexts (we refer to [3] for an overview). An accessible introduction to TMLE is given in [58] (Sections 12, 13 and 14).

It is apparent in eqs (6), (8) and (12) that the parameters ${\mathrm{\Psi}}^{1}({P}_{0})$, ${\mathrm{\Psi}}^{2}({P}_{0})$ and ${\mathrm{\Psi}}^{3}({P}_{0})$ all depend on $Q({P}_{0})$. Let us assume that we have already built an estimator of $Q({P}_{0})$, which we denote by ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$. That could be, for instance, the super-learner ${\hat{Q}}_{{\mathrm{\alpha}}_{n}}({P}_{n})$ whose construction we described in Section 5.1. Here, the superscript “$\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}$” indicates that we think of ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ as an initial estimator of $Q({P}_{0})$ built for the sake of predicting, not explaining.

Taking a closer look at eqs (6), (8) and (12), we see that it is easy to estimate ${\mathrm{\Psi}}^{1}({P}_{0})$, ${\mathrm{\Psi}}^{2}({P}_{0})$ and ${\mathrm{\Psi}}^{3}({P}_{0})$ by relying on ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$. Consider eq. (6): if we substitute ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ for $Q(P)$ in the formula, then only the marginal law of ${W}^{-1}$ is left unspecified. The simplest way to estimate the latter, which can also be shown to be the most efficient, is to use its empirical counterpart. That means estimating the marginal law of ${W}^{-1}$ by the empirical law under which ${W}^{-1}={W}_{i}^{-1}$, the *i*th observed value of ${W}^{-1}$ in the data set, with probability $1/n$. Substituting the empirical marginal law of ${W}^{-1}$ for its counterpart under *P* in eq. (6) yields an initial estimator of ${\mathrm{\Psi}}^{1}({P}_{0})$, say ${\mathrm{\psi}}_{n}^{1,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$, which writes as
$\begin{array}{rl}{\mathrm{\psi}}_{n}^{1,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}& ={E}_{{P}_{n}}\left\{{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}(1,{W}^{-1})-{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}(0,{W}^{-1})\right\}\\ & =\frac{1}{n}\sum _{i=1}^{n}\left[{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}(1,{W}_{i}^{-1})-{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}(0,{W}_{i}^{-1})\right].\end{array}$
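
As a minimal sketch of the substitution estimator above, assuming the initial fit ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ is available as a Python callable, the following averages the difference ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}(1,{W}_{i}^{-1})-{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}(0,{W}_{i}^{-1})$ over the observations. The names `plug_in_psi1` and `q_toy`, as well as the toy covariate values, are hypothetical and serve illustration only.

```python
from typing import Callable, Sequence, Tuple

Covariates = Tuple[float, ...]


def plug_in_psi1(
    q_init: Callable[[int, Covariates], float],
    w_rest: Sequence[Covariates],
) -> float:
    """Empirical mean of Q(1, w) - Q(0, w) over the observed values of W^{-1}."""
    n = len(w_rest)
    return sum(q_init(1, w) - q_init(0, w) for w in w_rest) / n


# Hypothetical stand-in for the initial estimator (values stay in [0, 1]).
def q_toy(w1: int, w_rest: Covariates) -> float:
    return 0.2 + 0.3 * w1 + 0.1 * w_rest[0]


# Hypothetical observed values of W^{-1} (a single remaining covariate here).
w_obs = [(0.0,), (1.0,), (0.5,), (2.0,)]
psi1_init = plug_in_psi1(q_toy, w_obs)
```

Because the empirical law puts mass $1/n$ on each observed ${W}_{i}^{-1}$, the expectation under ${P}_{n}$ reduces to a plain average; no model for the marginal law of ${W}^{-1}$ is needed.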

[Correction added after online publication 11 December 2015: “Substituting the empirical marginal law of..” should read “Substituting the empirical marginal law of ${W}^{-1}$”]

Likewise, the parameter ${\mathrm{\Psi}}^{2}({P}_{0})$ can be simply estimated by ${\mathrm{\psi}}_{n}^{2,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}=({\mathrm{\psi}}_{k,n}^{2,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}:1\le k\le K)$ with
$\begin{array}{rl}{\mathrm{\psi}}_{k,n}^{2,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}& ={E}_{{P}_{n}}\left\{{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}({W}^{1},k,{W}^{3},\dots ,{W}^{d})-{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}({W}^{1},0,{W}^{3},\dots ,{W}^{d})\right\}\\ & =\frac{1}{n}\sum _{i=1}^{n}\left[{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}({W}_{i}^{1},k,{W}_{i}^{3},\dots ,{W}_{i}^{d})-{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}({W}_{i}^{1},0,{W}_{i}^{3},\dots ,{W}_{i}^{d})\right]\end{array}$
while the parameter ${\mathrm{\Psi}}^{3}({P}_{0})$ can be estimated by
${\mathrm{\psi}}_{n}^{3,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}=\underset{\mathrm{\theta}\in \mathrm{\Theta}}{\mathrm{argmax}}\sum _{w\in {{W}}^{3}}h(w)\mathrm{\Lambda}\left({E}_{{P}_{n}}\left\{{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}({W}^{1},{W}^{2},w,{W}^{4},\dots ,{W}^{d})\right\},{f}_{\mathrm{\theta}}(w)\right)$ (15)
$\phantom{{\mathrm{\psi}}_{n}^{3,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}}=\underset{\mathrm{\theta}\in \mathrm{\Theta}}{\mathrm{argmax}}\sum _{w\in {{W}}^{3}}h(w)\mathrm{\Lambda}\left(\frac{1}{n}\sum _{i=1}^{n}{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}({W}_{i}^{1},{W}_{i}^{2},w,{W}_{i}^{4},\dots ,{W}_{i}^{d}),{f}_{\mathrm{\theta}}(w)\right).$ (16)
Interestingly, the optimization problem in eq. (15) can be solved easily; see Section A.3.3.
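
To illustrate why an optimization of the form (15) can be tractable, here is a hedged sketch under assumptions that are not taken from the text: $\mathrm{\Lambda}(a,b)=a\mathrm{log}\,b+(1-a)\mathrm{log}(1-b)$, a two-parameter logistic working model ${f}_{\mathrm{\theta}}(w)=\mathrm{expit}({\mathrm{\theta}}_{0}+{\mathrm{\theta}}_{1}w)$, and ${W}^{3}$ stored as the third entry of each observation. Under these assumptions the criterion is concave in $\mathrm{\theta}$ and a few Newton steps suffice. All function names and toy values are hypothetical; the procedure actually used is the one described in Section A.3.3.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def averaged_q(q_init, rows, w3_value):
    """(1/n) sum_i Q(W_i^1, W_i^2, w, W_i^4, ...): W^3 is set to w in every row."""
    total = 0.0
    for row in rows:
        modified = list(row)
        modified[2] = w3_value  # assumption: index 2 holds W^3
        total += q_init(modified)
    return total / len(rows)


def fit_working_model(q_init, rows, w3_levels, h, n_iter=50):
    """Newton ascent on sum_w h(w) * Lambda(m(w), f_theta(w)) (assumed Lambda, f)."""
    m = [averaged_q(q_init, rows, w) for w in w3_levels]
    t0, t1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = a00 = a01 = a11 = 0.0
        for w, mw in zip(w3_levels, m):
            f = expit(t0 + t1 * w)
            r = h(w) * (mw - f)        # gradient contribution
            v = h(w) * f * (1.0 - f)   # negative-Hessian contribution
            g0 += r
            g1 += r * w
            a00 += v
            a01 += v * w
            a11 += v * w * w
        det = a00 * a11 - a01 * a01
        # Solve the 2x2 system A d = g and take the Newton step.
        t0 += (a11 * g0 - a01 * g1) / det
        t1 += (a00 * g1 - a01 * g0) / det
    return t0, t1


# Hypothetical initial estimator that depends on W^3 only.
def q_toy(row):
    return expit(-1.0 + 0.5 * row[2])


rows = [(0, 1, 0, 0.3), (1, 0, 2, 0.7), (1, 2, 1, 0.1), (0, 1, 3, 0.9)]
theta = fit_working_model(q_toy, rows, w3_levels=[0, 1, 2, 3], h=lambda w: 0.25)
```

In this toy setting the averaged initial fit lies exactly in the working model, so the maximizer recovers the coefficients used to build `q_toy`.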

Arguably, ${\mathrm{\psi}}_{n}^{1,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$, ${\mathrm{\psi}}_{n}^{2,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ and ${\mathrm{\psi}}_{n}^{3,\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ are not targeted toward ${\mathrm{\Psi}}^{1}({P}_{0})$, ${\mathrm{\Psi}}^{2}({P}_{0})$ and ${\mathrm{\Psi}}^{3}({P}_{0})$ in the sense that, although they are obtained by substitution, the key estimator ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ which plays a crucial role in their definitions was built for the sake of prediction and not specifically tailored for estimating either ${\mathrm{\Psi}}^{1}({P}_{0})$, ${\mathrm{\Psi}}^{2}({P}_{0})$ or ${\mathrm{\Psi}}^{3}({P}_{0})$. In this respect, the targeting step of TMLE can be presented as a general statistical methodology to derive new substitution estimators from such initial estimators so that the updated ones really target what they aim at.

Targeting is made possible because ${\mathrm{\Psi}}^{1}$, ${\mathrm{\Psi}}^{2}$ and ${\mathrm{\Psi}}^{3}$, seen as functions mapping ${M}$ to $[-1,1]$, $[-1,1{]}^{K}$ and $\mathrm{\Theta}$, respectively, are differentiable, see Section A.3.1. In these three cases, the resulting gradients (derivatives), denoted by $\mathrm{\nabla}{\mathrm{\Psi}}^{1}$, $\mathrm{\nabla}{\mathrm{\Psi}}^{2}$ and $\mathrm{\nabla}{\mathrm{\Psi}}^{3}$, drive our choices of estimating functions. Targeting the parameter of interest consists in *(a)* designing a collection $\{{Q}_{n,\mathrm{\epsilon}}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}:\mathrm{\epsilon}\in {E}\}$ of functions mapping ${{W}}^{3}$ to $[0,1]$ conceived as fluctuations of ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}={Q}_{n,\mathrm{\epsilon}}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}{|}_{\mathrm{\epsilon}=0}$ in the direction of the parameter of interest, and *(b)* identifying the specific element of the collection which best targets the parameter of interest, see Section A.3.2. Let us denote by ${Q}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}={Q}_{n,{\mathrm{\epsilon}}_{n}^{1}}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$, ${Q}_{n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}={Q}_{n,{\mathrm{\epsilon}}_{n}^{2}}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ and ${Q}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}={Q}_{n,{\mathrm{\epsilon}}_{n}^{3}}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ the three a priori different fluctuations of ${Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}$ that respectively target ${\mathrm{\Psi}}^{1}({P}_{0})$, ${\mathrm{\Psi}}^{2}({P}_{0})$, and ${\mathrm{\Psi}}^{3}({P}_{0})$. They finally yield, by substitution, the three estimators
${\mathrm{\psi}}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}=\frac{1}{n}\sum _{i=1}^{n}\left[{Q}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}(1,{W}_{i}^{-1})-{Q}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}(0,{W}_{i}^{-1})\right],$ (17)
$\begin{array}{l}{\mathrm{\psi}}_{n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}=({\mathrm{\psi}}_{k,n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}:1\le k\le K)\quad \text{where, for each }1\le k\le K,\\ {\mathrm{\psi}}_{k,n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}=\frac{1}{n}\sum _{i=1}^{n}\left[{Q}_{n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}({W}_{i}^{1},k,{W}_{i}^{3},\dots ,{W}_{i}^{d})-{Q}_{n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}({W}_{i}^{1},0,{W}_{i}^{3},\dots ,{W}_{i}^{d})\right],\end{array}$ (18)
$\begin{array}{rl}{\mathrm{\psi}}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}& =\underset{\mathrm{\theta}\in \mathrm{\Theta}}{\mathrm{argmax}}\sum _{w\in {{W}}^{3}}h(w)\mathrm{\Lambda}\left(\frac{1}{n}\sum _{i=1}^{n}{Q}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}({W}_{i}^{1},{W}_{i}^{2},w,{W}_{i}^{4},\dots ,{W}_{i}^{d}),{f}_{\mathrm{\theta}}(w)\right)\\ & =\underset{\mathrm{\theta}\in \mathrm{\Theta}}{\mathrm{argmax}}\sum _{w\in {{W}}^{3}}\sum _{i=1}^{n}h(w)\mathrm{\Lambda}\left({Q}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}({W}_{i}^{1},{W}_{i}^{2},w,{W}_{i}^{4},\dots ,{W}_{i}^{d}),{f}_{\mathrm{\theta}}(w)\right).\end{array}$ (19)
The optimization problem (19) can be solved easily, just like eq. (15); see Section A.3.3.
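
The targeting step behind eq. (17) can be sketched as follows, under assumptions that go beyond the text: ${W}^{1}$ is binary, the outcome lies in $[0,1]$, and the fluctuation is the logistic one commonly used in TMLE, $\mathrm{logit}\,{Q}_{n,\mathrm{\epsilon}}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}=\mathrm{logit}\,{Q}_{n}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}}+\mathrm{\epsilon}H$ with $H=a/g(w)-(1-a)/(1-g(w))$, where $g(w)$ is an estimate of the conditional probability that ${W}^{1}=1$. The fluctuations actually used are specified in Section A.3.2 and may differ. Names such as `target_q`, `q_toy`, `g_toy` and the toy data are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def clever_covariate(a, g1):
    """H(a, w) = a / g(w) - (1 - a) / (1 - g(w)), with g1 = g(w)."""
    return a / g1 - (1 - a) / (1.0 - g1)


def target_q(q_init, g_n, data, n_iter=25):
    """Fit epsilon by Newton steps on the logistic log-likelihood along H."""
    eps = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for a, w, y in data:
            h = clever_covariate(a, g_n(w))
            q = expit(logit(q_init(a, w)) + eps * h)
            score += h * (y - q)
            info += h * h * q * (1.0 - q)
        eps += score / info

    def q_targ(a, w):
        return expit(logit(q_init(a, w)) + eps * clever_covariate(a, g_n(w)))

    return eps, q_targ


# Hypothetical initial fits and data (exposure, covariate, outcome).
def q_toy(a, w):
    return expit(-0.5 + 0.4 * a + 0.3 * w)


def g_toy(w):
    return expit(0.2 * w)


data = [(1, 0.0, 1), (1, 1.0, 0), (0, 0.5, 0), (0, -1.0, 1),
        (1, -0.5, 1), (0, 1.5, 0), (1, 2.0, 1), (0, 0.0, 1)]

eps_n, q_targ = target_q(q_toy, g_toy, data)
n = len(data)
# Substitution estimator built on the targeted fit, as in eq. (17).
psi1_targ = sum(q_targ(1, w) - q_targ(0, w) for _, w, _ in data) / n
# Empirical mean of H * (Y - Q_targ); it vanishes at the fitted epsilon.
mean_score = sum(
    clever_covariate(a, g_toy(w)) * (y - q_targ(a, w)) for a, w, y in data
) / n
```

The fitted $\mathrm{\epsilon}$ is exactly the value at which the empirical mean of $H\,(Y-{Q}_{n,\mathrm{\epsilon}}^{\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{t}})$ is zero, which is the sense in which the updated fit is steered toward the parameter rather than toward prediction.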

The above estimators satisfy ${\mathrm{\psi}}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}={\mathrm{\Psi}}^{1}({P}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}})$, ${\mathrm{\psi}}_{k,n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}={\mathrm{\Psi}}_{k}^{2}({P}_{n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}})$, ${\mathrm{\psi}}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}={\mathrm{\Psi}}^{3}({P}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}})$ for three empirical laws ${P}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}},{P}_{n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}},{P}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}\in {M}$. They are targeted in the sense that they satisfy ${E}_{{P}_{n}}\{\mathrm{\nabla}{\mathrm{\Psi}}^{1}({P}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}})(O)\}=0$, ${E}_{{P}_{n}}\{\mathrm{\nabla}{\mathrm{\Psi}}^{2}({P}_{n}^{2,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}})(O)\}=0$ and ${E}_{{P}_{n}}\{\mathrm{\nabla}{\mathrm{\Psi}}^{3}({P}_{n}^{3,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}})(O)\}=0$, three equalities which are at the core of the theoretical study of their asymptotic properties. The two main properties concern the consistency of the estimators and the construction of asymptotic confidence intervals. An estimator is consistent if it converges to the truth when the sample size goes to infinity. The targeted estimators defined in eqs (17), (18) and (19) are double-robust: the main requirement for them to be consistent is that *either* the corresponding targeted estimator of $Q({P}_{0})$, say ${Q}_{n}^{\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}$, converge to $Q({P}_{0})$ *or* the conditional law of the variable whose importance is sought given the other components of *W*, say $g({P}_{0})$, be consistently estimated by, say, ${g}_{n}$. Furthermore, the main requirement to make it possible to build asymptotically conservative confidence intervals is that the product of the rates of convergence of ${Q}_{n}^{\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}$ to $Q({P}_{0})$ and of ${g}_{n}$ to $g({P}_{0})$ be faster than $1/\sqrt{n}$. Finally, we wish to acknowledge that it is possible to target all parameters with a single, specifically designed, richer collection of fluctuations. Targeting all parameters at once enables the construction of simultaneous confidence regions that better take the mutual dependency of the estimators into account. In a problem with higher stakes, we would have gone that bumpier route.
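
The confidence intervals mentioned above are typically built from the empirical variance of the gradient evaluated at the targeted fit. The sketch below does so for ${\mathrm{\psi}}_{n}^{1,\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}$, assuming a binary ${W}^{1}$ and the usual form of the gradient for an additive effect, $H\,(Y-Q(a,w))+Q(1,w)-Q(0,w)-\mathrm{\psi}$; the gradients actually used are given in Section A.3.1. The function names, the 1.96 quantile (a 95% level) and all numerical values are hypothetical.

```python
import math


def influence_values_psi1(a, y, g1, q_obs, q1, q0, psi):
    """Gradient values H_i (Y_i - Q(A_i, W_i)) + Q(1, W_i) - Q(0, W_i) - psi."""
    values = []
    for a_i, y_i, g_i, qo_i, q1_i, q0_i in zip(a, y, g1, q_obs, q1, q0):
        h_i = a_i / g_i - (1 - a_i) / (1.0 - g_i)
        values.append(h_i * (y_i - qo_i) + q1_i - q0_i - psi)
    return values


def wald_interval(psi, influence_values, z=1.96):
    """psi +/- z * sqrt(empirical variance of the gradient / n)."""
    n = len(influence_values)
    mean_d = sum(influence_values) / n
    var_d = sum((d - mean_d) ** 2 for d in influence_values) / n
    half_width = z * math.sqrt(var_d / n)
    return psi - half_width, psi + half_width


# Hypothetical exposures, outcomes, and fitted values at each observation.
a = [1, 0, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0]
g1 = [0.5, 0.4, 0.6, 0.5, 0.7, 0.3]   # estimated P(W^1 = 1 | W^{-1})
q1 = [0.7, 0.6, 0.5, 0.8, 0.9, 0.4]   # targeted fit at W^1 = 1
q0 = [0.4, 0.3, 0.3, 0.5, 0.6, 0.2]   # targeted fit at W^1 = 0
q_obs = [q1_i if a_i == 1 else q0_i for a_i, q1_i, q0_i in zip(a, q1, q0)]

psi = sum(u - v for u, v in zip(q1, q0)) / len(a)
d_values = influence_values_psi1(a, y, g1, q_obs, q1, q0, psi)
ci_low, ci_high = wald_interval(psi, d_values)
```

Such an interval is only asymptotically justified, and only under the rate condition on ${Q}_{n}^{\mathrm{t}\mathrm{a}\mathrm{r}\mathrm{g}}$ and ${g}_{n}$ stated above; with six toy observations it is purely illustrative.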
