Suppose we observe *n* i.i.d. copies ${O}_{1},\dots ,{O}_{n}\in \mathcal{O}$ of
$O=(L(0),A(0),L(1),A(1),Y)\sim P_0$, where $A(j)=(A_1(j),A_2(j))$, $A_1(j)$ is a binary treatment, and $A_2(j)$ is an indicator of not being right censored at "time" $j$, $j=0,1$. That is, $A_2(0)=0$ implies that $(L(1),A_1(1),Y)$ is not observed, and $A_2(1)=0$ implies that $Y$ is not observed. Each time point $j$ has covariates $L(j)$ that precede treatment, $j=0,1$, and the outcome of interest $Y$ occurs after time point 1. For a time-dependent process $X(\cdot)$, we use the notation $\bar{X}(t)=(X(s):s\le t)$, where $\bar{X}(-1)=\emptyset$. Let $\mathcal{M}$ be a statistical model that makes no assumptions on the marginal distribution $Q_{0,L(0)}$ of $L(0)$ and the conditional distribution $Q_{0,L(1)}$ of $L(1)$, given $A(0),L(0)$, but might make assumptions on the conditional distributions $g_{0A(j)}$ of $A(j)$, given $\bar{A}(j-1),\bar{L}(j)$, $j=0,1$. We will refer to $g_0$ as the intervention mechanism, which can be factorized into a treatment mechanism $g_{01}$ and a censoring mechanism $g_{02}$ as follows:
$g_0(O)=\prod_{j=0}^{1} g_{01}\left(A_1(j)\mid \bar{A}(j-1),\bar{L}(j)\right) g_{02}\left(A_2(j)\mid A_1(j),\bar{A}(j-1),\bar{L}(j)\right).$ In particular, the data might have been generated by a sequential multiple assignment randomized trial (SMART), in which case $g_{01}$ is known.
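As a small illustration of this factorization, the likelihood contribution $g_0(O)$ of a single observation is the product of treatment and censoring probabilities over $j=0,1$. The sketch below assumes a SMART with 1:1 randomization and a known, history-independent probability of remaining uncensored; all numbers and names are hypothetical:

```python
# Hypothetical evaluation of the intervention-mechanism factor g_0(O) for one
# observation, assuming a SMART (treatment randomized with probability 0.5)
# and a made-up, constant censoring mechanism.
def g1(a1, history):
    """Treatment mechanism g_{01}: known by design in a SMART."""
    return 0.5

def g2(a2, history):
    """Censoring mechanism g_{02}: probability 0.9 of remaining uncensored."""
    return 0.9 if a2 == 1 else 0.1

def g0(O):
    """Product over j = 0, 1 of g_{01} * g_{02}, as in the factorization above."""
    A = [O["A0"], O["A1"]]          # each A(j) = (A1(j), A2(j))
    prob = 1.0
    for j in (0, 1):
        a1, a2 = A[j]
        prob *= g1(a1, None) * g2(a2, None)
    return prob

O = {"A0": (1, 1), "A1": (0, 1)}
print(g0(O))  # 0.5 * 0.9 * 0.5 * 0.9 = 0.2025
```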

Let $V(1)$ be a function of $(L(0),A(0),L(1))$, and let $V(0)$ be a function of $L(0)$. Let $V=(V(0),V(1))$. Consider dynamic treatment rules $V(0)\to d_{A(0)}(V(0))\in\{0,1\}\times\{1\}$ and $(A(0),V(1))\to d_{A(1)}(A(0),V(1))\in\{0,1\}\times\{1\}$ for assigning treatments $A(0)$ and $A(1)$, respectively, where the rule for $A(0)$ is only a function of $V(0)$, and the rule for $A(1)$ is only a function of $(A(0),V(1))$. Note that these rules are restricted to set the censoring indicators $A_2(j)=1$, $j=0,1$. Let $\mathcal{D}$ be the set of all such rules. We assume that $V(0)$ is a function of $V(1)$ (i.e., observing $V(1)$ includes observing $V(0)$), but in the theorem below we indicate an alternative assumption. For $d\in\mathcal{D}$, we let
$d(a(0),v)\equiv\left(d_{A(0)}(v(0)),d_{A(1)}(a(0),v(1))\right).$ If we assume a structural equation model [7] for the variables, stating that
$L(0)={f}_{L(0)}\left({U}_{L(0)}\right)$
$A(0)={f}_{A(0)}\left(L(0),{U}_{A(0)}\right)$
$L(1)={f}_{L(1)}\left(L(0),A(0),{U}_{L(1)}\right)$
$A(1)=f_{A(1)}\left(\bar{L}(1),A(0),U_{A(1)}\right)$
$Y=f_Y\left(\bar{L}(1),\bar{A}(1),U_Y\right),$ where the collection of functions $f=(f_{L(0)},f_{A(0)},f_{L(1)},f_{A(1)},f_Y)$ is unspecified or partially specified, we can define counterfactuals $Y_d$ by the modified system in which the equations for $A(0),A(1)$ are replaced by $A(0)=d_{A(0)}(V(0))$ and $A(1)=d_{A(1)}(A(0),V(1))$, respectively. Denote the distribution of these counterfactual quantities by $P_{0,d}$, where we note that $P_{0,d}$ is implied by the collection of functions $f$ and the joint distribution of the exogenous variables $(U_{L(0)},U_{A(0)},U_{L(1)},U_{A(1)},U_Y)$. We can now define the causally optimal rule as $d_0^{\ast}=\arg\max_{d\in\mathcal{D}} E_{P_{0,d}} Y_d$. If we assume the sequential randomization assumption stating that $A(0)$ is independent of $(U_{L(1)},U_Y)$, given $L(0)$, and $A(1)$ is independent of $U_Y$, given $\bar{L}(1),A(0)$, then we can identify $P_{0,d}$ from the observed data distribution $P_0$ using the G-computation formula:
$p_{0,d}\left(L(0),A(0),L(1),A(1),Y\right) \equiv I\left(A=d(A(0),V)\right) q_{0,L(0)}(L(0))\, q_{0,L(1)}\left(L(1)\mid L(0),A(0)\right) q_{0,Y}\left(Y\mid\bar{L}(1),\bar{A}(1)\right),$ (1) where $p_{0,d}$ is the density of $P_{0,d}$ and $q_{0,L(0)}$, $q_{0,L(1)}$, and $q_{0,Y}$ are the densities of $Q_{0,L(0)}$, $Q_{0,L(1)}$, and $Q_{0,Y}$, respectively, where $Q_{0,Y}$ denotes the conditional distribution of $Y$, given $\bar{L}(1),\bar{A}(1)$. We assume that all densities above are absolutely continuous with respect to some dominating measure $\mu$. We have a similar identifiability result/G-computation formula under the Neyman–Rubin causal model [8]. For the right-censoring indicators $A_2(0)$ and $A_2(1)$, we note the parallel between the coarsening at random assumption and the sequential randomization assumption [49]. Thus, here we have encoded our missingness assumptions in our causal assumptions.
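The identification in eq. (1) can be made concrete by sequential Monte Carlo simulation: draw $L(0)$ from its marginal, set $A(0)$ by the rule, draw $L(1)$ from its conditional, set $A(1)$ by the rule, and average draws of $Y$. The sketch below uses a made-up data-generating mechanism and a made-up rule $d$, purely to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for q_{0,L(0)}, q_{0,L(1)}, and q_{0,Y}; all
# distributions below are invented for illustration only.
def draw_L0(n):
    return rng.normal(size=n)

def draw_L1(L0, A0):
    return L0 + A0 + rng.normal(size=L0.shape)

def draw_Y(L0, A0, L1, A1):
    return L1 + A1 * (1.0 - L0) + rng.normal(size=L0.shape)

# A hypothetical rule in D: treat at time j iff the current covariate is
# positive; the censoring components A_2(j) are always set to 1 by the rule.
def d_A0(V0):
    return (V0 > 0).astype(float)

def d_A1(A0, V1):
    return (V1 > 0).astype(float)

def g_comp_mean(n=200_000):
    """Monte Carlo evaluation of E_{P_d} Y_d under formula (1): draw each
    L-factor from its conditional, set each treatment by the rule d."""
    L0 = draw_L0(n)
    A0 = d_A0(L0)            # A(0) = d_{A(0)}(V(0)), with V(0) = L(0)
    L1 = draw_L1(L0, A0)
    A1 = d_A1(A0, L1)        # A(1) = d_{A(1)}(A(0), V(1)), with V(1) = L(1)
    return draw_Y(L0, A0, L1, A1).mean()

print(round(g_comp_mean(), 2))
```

Note that no intervention-mechanism factors appear: under eq. (1) the treatments are set deterministically by $d$, so only the $q$-factors are sampled.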

More generally, for a distribution $P\in \mathcal{M}$ we can define the G-computation distribution ${P}_{d}$ as the distribution with density
$p_d(L(0),A(0),L(1),A(1),Y) \equiv I(A=d(A(0),V))\, q_{L(0)}(L(0))\, q_{L(1)}(L(1)\mid L(0),A(0))\, q_Y(Y\mid\bar{L}(1),\bar{A}(1)),$ where $q_{L(0)}$, $q_{L(1)}$, and $q_Y$ are the counterparts of $q_{0,L(0)}$, $q_{0,L(1)}$, and $q_{0,Y}$, respectively, under $P$.

For the remainder of this article, if for a static or dynamic intervention $d$ we use the notation $L_d$ (or $Y_d$, $O_d$), we mean the random variable with the probability distribution $P_d$ defined by the G-computation formula (1), so that all of our quantities are statistical parameters. For example, the quantity $E_{P_0}(Y_{a(0)a(1)}\mid V_{a(0)}(1))$ defined in the next theorem denotes the conditional expectation of $Y_{a(0)a(1)}$, given $V_{a(0)}(1)$, under the probability distribution $P_{0,a(0)a(1)}$ (i.e., the G-computation formula presented above for the static intervention $(a(0),a(1))$). In addition, if we write down these parameters for some $P_d$, we will automatically assume the positivity assumption at $P$ required for the G-computation formula to be well defined. For that it will suffice to assume the following positivity assumption at $P$:
$\Pr_P\left(0<\min_{a_1\in\{0,1\}} g_{A(0)}\left(a_1,1\mid L(0)\right)\right)=1$
$\Pr_P\left(0<\min_{a_1\in\{0,1\}} g_{A(1)}\left(a_1,1\mid \bar{L}(1),A(0)\right)\right)=1.$ (2) The strong positivity assumption will be defined as the above assumption, but with the 0 replaced by a $\delta>0$.
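In practice, the strong positivity assumption is often checked empirically from estimated intervention probabilities. A minimal sketch, assuming an array of estimated probabilities $g_{A(j)}(a_1,1\mid\text{history})$ evaluated at each observed history (the array layout and the cutoff $\delta$ are hypothetical choices):

```python
import numpy as np

def check_strong_positivity(g_hat, delta=0.05):
    """Empirical check of the strong positivity assumption (2).

    g_hat: array of shape (n, 2); column a1 holds the estimated probability
    of receiving treatment level a1 and remaining uncensored, evaluated at
    each observed history. (Layout and names are hypothetical.)
    """
    min_prob = g_hat.min(axis=1)        # min over a1 in {0, 1} per subject
    return bool((min_prob > delta).all())

# Toy example: well-supported propensities vs. a near-violation. The two
# columns need not sum to one, since some mass goes to being censored.
g_ok = np.array([[0.40, 0.50], [0.30, 0.60], [0.45, 0.45]])
g_bad = np.array([[0.01, 0.95], [0.30, 0.60]])
print(check_strong_positivity(g_ok))   # True
print(check_strong_positivity(g_bad))  # False
```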

We now define a statistical parameter representing the mean outcome ${Y}_{d}$ under ${P}_{d}$. For any rule $d\in \mathcal{D}$, let
$\Psi_d(P) \equiv E_{P_d} Y_d.$ For a distribution $P$, define the *V*-optimal rule as
$d_P = \arg\max_{d\in\mathcal{D}} E_{P_d} Y_d.$ For simplicity, we will write $d_0$ instead of $d_{P_0}$ for the *V*-optimal rule under $P_0$. Define the parameter mapping $\Psi:\mathcal{M}\to\mathbb{R}$ as $\Psi(P)=E_{P_{d_P}} Y_{d_P}$. The first part of this article is concerned with inference for the parameter
$\psi_0 \equiv \Psi(P_0) = E_{P_{0,d_0}} Y_{d_0}.$ Under our identifiability assumptions, $d_0$ equals the causally optimal rule $d_0^{\ast}$. Even if the sequential randomization assumption does not hold, $\psi_0$ remains a statistical parameter of interest in its own right. We will not concern ourselves with the sequential randomization assumption for the remainder of this article.

The next theorem presents an explicit form of the *V*-optimal individualized treatment rule ${d}_{0}$ as a function of ${P}_{0}$.

**Theorem 1**. *Suppose* $V(0)$ *is a function of* $V(1)$*. The V-optimal rule* ${d}_{0}$ *can be represented as the following explicit parameter of* ${P}_{0}$:

$\bar{Q}_{20}(a(0),v(1)) = E_{P_0}\left(Y_{a(0),A(1)=(1,1)}\mid V_{a(0)}(1)=v(1)\right)-E_{P_0}\left(Y_{a(0),A(1)=(0,1)}\mid V_{a(0)}(1)=v(1)\right)$

$d_{0,A(1)}(A(0),V(1))=\left(I\left(\bar{Q}_{20}(A(0),V(1))>0\right),1\right)$

$\bar{Q}_{10}(v(0))=E_{P_0}\left(Y_{(1,1),d_{0,A(1)}}\mid V(0)=v(0)\right)-E_{P_0}\left(Y_{(0,1),d_{0,A(1)}}\mid V(0)=v(0)\right)$

$d_{0,A(0)}(V(0))=\left(I\left(\bar{Q}_{10}(V(0))>0\right),1\right),$

*where* $a(0)\in\{0,1\}\times\{1\}$. *If* $V(1)$ *does not include* $V(0)$*, but, for all* $(a(0),a(1))\in(\{0,1\}\times\{1\})^2$,

$E_{P_0}\left(Y_{a(0),a(1)}\mid V(0),V_{a(0)}(1)\right)=E_{P_0}\left(Y_{a(0),a(1)}\mid V_{a(0)}(1)\right),$ (3) *then the above expression for the V-optimal rule* $d_0$ *is still true.*
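Theorem 1 suggests a backward-recursive (Q-learning style) computation of $d_0$: estimate the time-1 blip $\bar{Q}_{20}$, set $d_{0,A(1)}$ by its sign, form the predicted outcome under $d_{0,A(1)}$, and repeat at time 0 to obtain $\bar{Q}_{10}$ and $d_{0,A(0)}$. The sketch below carries this out with ordinary least squares working models on simulated uncensored trial data; the data-generating process and model specifications are entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical uncensored SMART: both treatments randomized 1:1, A_2(j) = 1
# throughout, with V(0) = L(0) and V(1) = L(1). By construction the true
# time-1 blip is 1 - L(1), so the optimal time-1 rule treats iff L(1) < 1.
L0 = rng.normal(size=n)
A0 = rng.integers(0, 2, size=n)
L1 = L0 + 0.5 * A0 + rng.normal(size=n)
A1 = rng.integers(0, 2, size=n)
Y = L1 + A1 * (1.0 - L1) + 0.3 * A0 + rng.normal(size=n)

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Time 1: regress Y on (L1, A0, A1, A1*L1); the A1 terms form the blip Q20.
b2 = fit_ols(np.column_stack([L1, A0, A1, A1 * L1]), Y)

def Q20(l1):
    # Working-model blip; a(0) happens to drop out of this specification.
    return b2[3] + b2[4] * l1

d_A1 = (Q20(L1) > 0).astype(int)      # d_{0,A(1)}: treat iff the blip is positive

# Pseudo-outcome: predicted Y with A(1) set to its estimated optimal value.
Y_tilde = b2[0] + b2[1] * L1 + b2[2] * A0 + (b2[3] + b2[4] * L1) * d_A1

# Time 0: regress the pseudo-outcome on (L0, A0, A0*L0); A0 terms form Q10.
b1 = fit_ols(np.column_stack([L0, A0, A0 * L0]), Y_tilde)

def Q10(l0):
    return b1[2] + b1[3] * l0

d_A0 = (Q10(L0) > 0).astype(int)      # d_{0,A(0)}
```

With this simulated design the learned time-1 rule should closely track the true rule $I(L(1)<1)$; of course, in the article's setting the regressions would be replaced by data-adaptive estimators of $\bar{Q}_{20}$ and $\bar{Q}_{10}$.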
