Let *W* denote a *d*-dimensional vector of covariates, and let *Y* denote an outcome of interest measured only when a missingness indicator *A* is equal to one. To simplify the exposition, we assume that *Y* is binary or continuous, taking values in the interval (0, 1). The observed data $O=(W,A,AY)$ are assumed to have a distribution $P_0$ in the nonparametric model $\mathcal{M}$. Assume we observe an i.i.d. sample $O_1,\ldots,O_n$, and denote the empirical distribution by $P_n$. For every element $P\in\mathcal{M}$, we define
$\begin{array}{l} Q_W(P)(w):=P(W\le w)\\ g(P)(w):=P(A=1\mid W=w)\\ \bar{Q}(P)(w):=E_P(Y\mid A=1,\,W=w),\end{array}$

where $E_P$ denotes expectation under *P*. We denote $Q_{W,0}:=Q_W(P_0)$, $g_0:=g(P_0)$, and $\bar{Q}_0:=\bar{Q}(P_0)$. We refer to $\bar{Q}$ as the *outcome regression*, and to *g* as the *missingness score*. We suppress the argument *P* from the notation $Q_W(P)$, $g(P)$, and $\bar{Q}(P)$ whenever it does not cause confusion. For a function *f* of *o*, we use the notation $Pf:=\int f(o)\,dP(o)$. Let $\Psi:\mathcal{M}\to\mathbb{R}$ be a parameter mapping defined as $\Psi(P):=E_P\{\bar{Q}(W)\}$, and let $\psi_0:=\Psi(P_0)$. Under the assumptions that missingness *A* is independent of the outcome *Y* conditional on the covariates *W*, and that $P_0(g_0(W)>0)=1$, it can be shown that $\psi_0=E_{F_0}(Y)$, where $F_0$ is the true distribution of the full data (*W, Y*). Because $\Psi$ depends on *P* only through $Q:=(Q_W,\bar{Q})$, we also use the alternative notation $\Psi(Q)$ to refer to $\Psi(P)$.
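As a numerical sanity check of this identification result, the following sketch verifies that $E_{P_0}\{\bar{Q}_0(W)\}$ recovers $E_{F_0}(Y)$ while the complete-case mean does not. The data-generating mechanism below is entirely hypothetical and chosen for illustration only.

```python
import numpy as np

# Assumed (illustrative) data-generating mechanism satisfying MAR and positivity.
rng = np.random.default_rng(0)
n = 200_000

W = rng.normal(size=n)
Qbar0 = 1 / (1 + np.exp(-(0.5 * W - 0.2)))   # E(Y | W) = E(Y | A=1, W) under MAR
g0 = 1 / (1 + np.exp(-(1.0 - W)))            # P(A=1 | W), positive for all W
Y = rng.binomial(1, Qbar0)
A = rng.binomial(1, g0)

psi_true = Y.mean()          # E_{F_0}(Y), computable only because this is a simulation
psi_plugin = Qbar0.mean()    # E_{P_0}{Qbar_0(W)} = Psi(P_0); matches psi_true
psi_naive = Y[A == 1].mean() # complete-case mean; biased because A depends on W
```

Because *A* is negatively associated with *W* while *Y* is positively associated with it, the complete-case mean is visibly biased downward, whereas the plug-in value agrees with $E_{F_0}(Y)$ up to Monte Carlo error.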

First-order inference for $\psi_0$ is based on the following expansion of the parameter functional $\Psi(P)$ around the true $P_0$:
$\Psi(P)-\Psi(P_0)=-P_0 D^{(1)}(P)+R_2(P,\,P_0),$(1)

where $D^{(1)}(P)$ is a function of an observation $o=(w,\,a,\,y)$ that depends on *P*, and $R_2(P,\,P_0)$ is a second-order remainder term. The superscript (1) denotes a first-order approximation. This expansion may be seen as analogous to a Taylor expansion when *P* is indexed by a finite-dimensional quantity, and the expression *second-order* may be interpreted in the same way.

We use the expression *first-order estimator* to refer to estimators based on first-order approximations as in eq. (1). Analogously, the expression *second-order estimator* refers to estimators based on second-order approximations, e.g., as presented in Section 3 below.

Doubly robust locally efficient inference is based on approximation (1) with
$D^{(1)}(P)(o)=\frac{a}{g(w)}\{y-\bar{Q}(w)\}+\bar{Q}(w)-\Psi(P),$(2)
$R_2(P,P_0)=\int\left\{1-\frac{g_0(w)}{g(w)}\right\}\{\bar{Q}(w)-\bar{Q}_0(w)\}\,dQ_{W,0}(w).$(3)

Straightforward algebra suffices to check that eq. (1) holds with the definitions given above. *D*^{(1)} as defined in eq. (2) is referred to as the canonical gradient or the efficient influence function [8, 3].
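To make eq. (2) concrete, the following sketch evaluates $D^{(1)}$ at each observation and illustrates that the value of $\psi$ solving the estimating equation $P_n D^{(1)}(P)=0$ is the familiar augmented inverse-probability-weighted (AIPW) estimator. The fitted values used here are hypothetical stand-ins for real estimates of $g_0$ and $\bar{Q}_0$; this is not the authors' code.

```python
import numpy as np

def eif(a, y, g, qbar, psi):
    """Evaluate the canonical gradient D^(1)(P)(o) of eq. (2) at each observation.

    When a = 0 the inverse-weighted residual term vanishes, so unobserved
    outcomes never enter the computation.
    """
    resid = np.where(a == 1, y - qbar, 0.0)
    return a / g * resid + qbar - psi

# Hypothetical fitted values g_hat, qbar_hat standing in for real estimates.
rng = np.random.default_rng(1)
n = 1_000
w = rng.normal(size=n)
g_hat = 1 / (1 + np.exp(-(1.0 - w)))
qbar_hat = 1 / (1 + np.exp(-(0.5 * w - 0.2)))
a = rng.binomial(1, g_hat)
y = np.where(a == 1, rng.binomial(1, qbar_hat), np.nan)  # y is missing when a = 0

# Solving P_n D^(1)(P) = 0 in psi yields the AIPW estimator:
psi_aipw = np.mean(np.where(a == 1, (y - qbar_hat) / g_hat, 0.0) + qbar_hat)
eif_mean = eif(a, y, g_hat, qbar_hat, psi_aipw).mean()   # zero by construction
```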

First-order targeted minimum loss-based estimation of ${\mathrm{\psi}}_{0}$ is performed in the following steps [2]:

Step 1. *Initial estimators*. Obtain initial estimators $\hat{g}$ and $\hat{\bar{Q}}$ of $g_0$ and $\bar{Q}_0$. In general, the functional forms of $g_0$ and $\bar{Q}_0$ will be unknown to the researcher. Since consistent estimation of these quantities is key to achieving asymptotic efficiency of $\hat{\psi}$, we advocate the use of data-adaptive predictive methods that allow flexibility in the specification of these functional forms.

Step 2. *Compute auxiliary covariate*. For each subject *i*, compute the auxiliary covariate
$\hat{H}^{(1)}(W_i):=\frac{1}{\hat{g}(W_i)}.$

Step 3. *Solve estimating equations*. Estimate the parameter $\epsilon$ in the logistic regression model
$\operatorname{logit}\hat{\bar{Q}}_{\epsilon}(w)=\operatorname{logit}\hat{\bar{Q}}(w)+\epsilon\,\hat{H}^{(1)}(w),$(4)

by fitting a standard logistic regression model of $Y_i$ on $\hat{H}^{(1)}(W_i)$, with no intercept and with offset $\operatorname{logit}\hat{\bar{Q}}(W_i)$, among observations with $A=1$. Alternatively, fit the model
$\operatorname{logit}\hat{\bar{Q}}_{\epsilon}(w)=\operatorname{logit}\hat{\bar{Q}}(w)+\epsilon$

with weights $\hat{H}^{(1)}(W_i)$ among observations with $A=1$. In either case, denote the estimate of $\epsilon$ by $\hat{\epsilon}$.

Step 4. *Update initial estimator and compute 1-TMLE*. Update the initial estimator as $\hat{\bar{Q}}^{*}(w)=\hat{\bar{Q}}_{\hat{\epsilon}}(w)$, and define the 1-TMLE as $\hat{\psi}=\Psi(\hat{\bar{Q}}^{*})$.
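The four steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data-generating mechanism is assumed, $\hat{g}$ is taken as known for brevity, the initial $\hat{\bar{Q}}$ is a deliberately crude complete-case mean, and the no-intercept offset logistic regression of Step 3 is solved by Newton-Raphson.

```python
import numpy as np

def expit(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

# --- Hypothetical data (stand-in for a real application) -----------------
rng = np.random.default_rng(2)
n = 5_000
W = rng.normal(size=n)
g0 = expit(1.0 - W)
Qbar0 = expit(0.5 * W - 0.2)
A = rng.binomial(1, g0)
Y = np.where(A == 1, rng.binomial(1, Qbar0), 0)  # Y observed only when A = 1

# Step 1: initial estimators.  g_hat is taken as known here for brevity;
# qbar_hat is a deliberately crude (misspecified) complete-case mean.
g_hat = g0
qbar_hat = np.full(n, Y[A == 1].mean())

# Step 2: auxiliary covariate H^(1)(W_i) = 1 / g_hat(W_i).
H = 1 / g_hat

# Step 3: fit eq. (4) -- logistic regression of Y on H, no intercept,
# offset logit(qbar_hat), among A = 1 -- via Newton-Raphson in epsilon.
eps = 0.0
obs = A == 1
for _ in range(50):
    q_eps = expit(logit(qbar_hat[obs]) + eps * H[obs])
    score = np.sum(H[obs] * (Y[obs] - q_eps))
    hess = -np.sum(H[obs] ** 2 * q_eps * (1 - q_eps))
    step = score / hess
    eps -= step
    if abs(step) < 1e-12:
        break

# Step 4: update and plug in; Q_W is estimated by the empirical distribution.
qbar_star = expit(logit(qbar_hat) + eps * H)
psi_tmle = qbar_star.mean()

# The TMLE solves the EIF estimating equation P_n D^(1)(P_hat) = 0.
eif_mean = np.mean(A / g_hat * (Y - qbar_star) + qbar_star - psi_tmle)
```

Note the double robustness at work: even though the initial $\hat{\bar{Q}}$ is badly misspecified, the targeting step with a consistent $\hat{g}$ moves the plug-in estimate to the neighborhood of the truth.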

Note that this estimator $\hat{P}$ of $P_0$ satisfies $P_n D^{(1)}(\hat{P})=0$. For a full presentation of the TMLE algorithm the interested reader is referred to [3] and the references therein. Using eq. (1) along with $P_n D^{(1)}(\hat{P})=0$, we obtain that
$\hat{\psi}-\psi_0=(P_n-P_0)D^{(1)}(\hat{P})+R_2(\hat{P},\,P_0).$

Provided that

(i) $D^{(1)}(\hat{P})$ converges to $D^{(1)}(P_0)$ in $L_2(P_0)$ norm, and

(ii) the size of the class of functions considered for estimation of $\hat{P}$ is bounded (technically, there exists a Donsker class $\mathcal{H}$ such that $D^{(1)}(\hat{P})\in\mathcal{H}$ with probability tending to one),

results from empirical process theory (e.g., theorem 19.24 of Ref. [9]) allow us to conclude that
$\hat{\psi}-\psi_0=(P_n-P_0)D^{(1)}(P_0)+R_2(\hat{P},\,P_0).$

In addition, if
$R_2(\hat{P},P_0)={o}_{P}({n}^{-1/2}),$(5)

we obtain that $\hat{\psi}-\psi_0=(P_n-P_0)D^{(1)}(P_0)+o_P(n^{-1/2})$. This implies, in particular, that $\hat{\psi}$ is a $\sqrt{n}$-consistent, asymptotically normal, and locally efficient estimator of $\psi_0$.
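Asymptotic linearity is what licenses Wald-type inference in practice: the variance of $\hat{\psi}$ is estimated by the sample variance of the estimated influence function divided by *n*. The following sketch illustrates this for the AIPW form of the estimator; all fitted values are hypothetical placeholders for real estimates.

```python
import numpy as np

# Wald-type 95% CI from asymptotic linearity: Var(psi_hat) ~ Var(D^(1)) / n.
# All inputs below are assumed, purely illustrative fitted values.
rng = np.random.default_rng(3)
n = 2_000
w = rng.normal(size=n)
g_hat = 1 / (1 + np.exp(-(1.0 - w)))
qbar_hat = 1 / (1 + np.exp(-(0.5 * w - 0.2)))
a = rng.binomial(1, g_hat)
y = np.where(a == 1, rng.binomial(1, qbar_hat), 0)

psi_hat = np.mean(a / g_hat * (y - qbar_hat) + qbar_hat)
d1 = a / g_hat * (y - qbar_hat) + qbar_hat - psi_hat   # estimated EIF values
se = d1.std(ddof=1) / np.sqrt(n)                       # influence-function SE
ci = (psi_hat - 1.96 * se, psi_hat + 1.96 * se)        # Wald 95% interval
```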

In this paper we discuss ways of constructing an estimator that requires a consistency assumption weaker than eq. (5). Note that eq. (5) is an assumption about the convergence rate of a second-order term involving the product of the differences $\hat{\bar{Q}}-\bar{Q}_0$ and $\hat{g}-g_0$. Using the Cauchy-Schwarz inequality together with a supremum-norm bound on $1/\hat{g}$, $|R_2(\hat{P},P_0)|$ may be bounded as
$|R_2(\hat{P},\,P_0)|\le \|1/\hat{g}\|_{\infty}\,\|\hat{g}-g_0\|_{P_0}\,\|\hat{\bar{Q}}-\bar{Q}_0\|_{P_0},$

where $\|f\|_P^2:=\int f^2(o)\,dP(o)$, and $\|f\|_{\infty}:=\sup\{f(o):o\in\mathcal{O}\}$. For assumption (5) to hold, it is sufficient to have that

(i) $\hat{g}$ is bounded away from zero with probability tending to one;

(ii) $\hat{g}$ is the MLE of $g_0\in\mathcal{G}=\{g(w;\,\beta):\,\beta\in\mathbb{R}^d\}$ (i.e., $g_0$ is estimated in a correctly specified parametric model), since this implies $\|\hat{g}-g_0\|_{P_0}=O_P(n^{-1/2})$; and

(iii) $\|\hat{\bar{Q}}-\bar{Q}_0\|_{P_0}=o_P(1)$.

Alternatively, the roles of $\hat{g}$ and $\hat{\bar{Q}}$ could be interchanged in (ii) and (iii). As discussed in Ref. [1], however, correct specification of a parametric model is hardly achievable in high-dimensional settings. Data-adaptive estimators must then be used for the outcome regression and missingness score, but they may potentially yield a remainder term $R_2$ with a convergence rate slower than $n^{-1/2}$. In the next section we present a second-order expansion of the parameter functional that allows the construction of estimators that require consistency assumptions weaker than eq. (5).
