Let us now consider situations where the variable *Z* takes values in $\mathcal{T}$ (if it does not, it may be made dichotomous by thresholding) and fulfills the following assumption.

(**A.2**) For $t\in \mathcal{T}$,
$Z\perp\!\!\!\perp Y(t)\mid \mathbf{X},\qquad 0<\Pr(Z=1\mid \mathbf{X})<1.$

Assumption (A.2) prohibits (a) a direct effect from *Z* to $Y(t)$, i.e. an effect not going through *T*, and (b) unobserved variables affecting both *Z* and $Y(t)$. On the other hand, (A.2) allows unobserved variables to affect both *Z* and *T*, which is typically prohibited by the usual instrumental variable assumptions [4–6]. Note that when assuming (A.2) in the sequel, *Z* and $Y(t)$ may also be independent conditional on a subset of $\mathbf{X}$, and, e.g. *Z* may be randomized as discussed after Proposition 1. We also need the following regularity condition.

(**A.3**) If (A.1) and (A.2) hold, then $T\perp\!\!\!\perp Y(t)\mid Z,\mathbf{X}$ for $t\in \mathcal{T}$.

Assumption (A.3) is a regularity condition and is violated only in specific situations, of which Example 1 is typical.

Figure 1 Graph illustrating model (1) in Example 1

**Example 1** *Let us assume that the vector* $({Z}^{\ast},{T}^{\ast},Y(0),U,V)$ *has a joint normal distribution, where U and V are two unobserved covariates and the set of observed covariates* $\mathbf{X}$ *is empty. Assume now that the following model generates the data*:
${Z}^{\ast}=\psi_{0}+\psi_{1}U+\psi_{2}V+\epsilon_{Z},$
${T}^{\ast}=\nu_{0}+\nu_{1}V+\epsilon_{T},$ (1)
$Y(0)=\xi_{1}{Z}^{\ast}+\xi_{2}U+\epsilon_{Y},$

*where* $U,V,\epsilon_{Z},\epsilon_{T}$ *and* $\epsilon_{Y}$ *are jointly normal and independently distributed. Let* $Z=\mathcal{I}({Z}^{\ast}>0)$ *and* $T=\mathcal{I}({T}^{\ast}>0)$*, where* $\mathcal{I}(\cdot)$ *is the indicator function. Figure 1 gives a graphical representation of the model, where* $\epsilon_{Z},\epsilon_{T}$ *and* $\epsilon_{Y}$ *are omitted. We can write the conditional expectations*
$E(Y(0)\mid {Z}^{\ast},U)=\xi_{1}{Z}^{\ast}+\xi_{2}U,$
$E(U\mid {Z}^{\ast})=\gamma {Z}^{\ast},$ *where* $\gamma$ *is a function of the parameters in (1).*
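By joint normality, $\gamma$ is the population regression coefficient of $U$ on ${Z}^{\ast}$, and since $U$, $V$ and $\epsilon_{Z}$ are mutually independent in (1), it can be written out explicitly:

```latex
\gamma
  = \frac{\operatorname{Cov}(U, Z^{\ast})}{\operatorname{Var}(Z^{\ast})}
  = \frac{\psi_{1}\operatorname{Var}(U)}
         {\psi_{1}^{2}\operatorname{Var}(U) + \psi_{2}^{2}\operatorname{Var}(V)
          + \operatorname{Var}(\epsilon_{Z})}.
```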

In Example 1, (A.1) and (A.2) will typically be violated, unless we assume that ${\xi}_{1}=-{\xi}_{2}\gamma$, in which case ${Z}^{\ast}\perp\!\!\!\perp Y(0)$ by joint normality, and thereby $Z\perp\!\!\!\perp Y(0)$ and $T\perp\!\!\!\perp Y(0)$. The constrained parametrization ${\xi}_{1}=-{\xi}_{2}\gamma$ thus yields an example where (A.3) is violated, since (A.1) and (A.2) hold while one can check that $T\perp\!\!\!\perp Y(0)\mid Z$ does not necessarily hold.
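To see this concretely, the following sketch simulates model (1) under the constrained parametrization ${\xi}_{1}=-{\xi}_{2}\gamma$ (all parameter values and the unit variances are illustrative choices, not taken from the text): the marginal correlation between ${Z}^{\ast}$ and $Y(0)$ is then close to zero, while within the stratum $Z=1$ the variables $T$ and $Y(0)$ are clearly correlated, so $T\perp\!\!\!\perp Y(0)\mid Z$ fails.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative parameter values (unit variances throughout); not from the text.
psi0, psi1, psi2 = 0.0, 1.0, 1.0
nu0, nu1 = 0.0, 1.0
xi2 = 1.0

U = rng.standard_normal(n)
V = rng.standard_normal(n)
Z_star = psi0 + psi1 * U + psi2 * V + rng.standard_normal(n)
T_star = nu0 + nu1 * V + rng.standard_normal(n)

# gamma = Cov(U, Z*) / Var(Z*), here with all variances equal to one.
gamma = psi1 / (psi1**2 + psi2**2 + 1.0)
xi1 = -xi2 * gamma  # the constraint that makes Z* independent of Y(0)

Y0 = xi1 * Z_star + xi2 * U + rng.standard_normal(n)
Z = (Z_star > 0).astype(int)
T = (T_star > 0).astype(int)

# Marginally, Z* (hence Z) is independent of Y(0): correlation near zero.
corr_marginal = np.corrcoef(Z_star, Y0)[0, 1]

# But conditioning on Z links U and V, so within the Z = 1 stratum
# T and Y(0) are correlated: T independent of Y(0) given Z fails.
m = Z == 1
corr_within_Z1 = np.corrcoef(T[m], Y0[m])[0, 1]
print(corr_marginal, corr_within_Z1)
```

The simulation is only meant to illustrate the instability: perturbing `xi1` away from `-xi2 * gamma` immediately makes the marginal correlation nonzero as well.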

This type of example is called unstable [3, Sec. 2.4] in the sense that (A.1) and (A.2) will cease to hold as soon as the parameter values do not fulfill the constraint ${\xi}_{1}=-{\xi}_{2}\gamma$. Using directed acyclic graphs,^{1} it can be shown that assumption (A.3) holds as soon as the distribution is stable, where, e.g. a distribution $P(\psi)$ parametrized with a parameter vector $\psi$ is said to be stable if no independence can be destroyed by varying the parameter $\psi$; see Pearl [3, Sec. 2.4] for a formal general definition. Note here that (A.3) does not impose any parametrized functional form.

**Proposition 1** *Assume (A.1)–(A.3), then*
$Z\perp\!\!\!\perp Y(t)\mid T,\mathbf{X},\qquad t\in \mathcal{T}.$ (2)

**Proof**. By assumption, (A.1) and (A.2) hold. Then, for $t\in \mathcal{T}$,
$(\mathrm{A.1})\ \text{and}\ (\mathrm{A.2})\Rightarrow T\perp\!\!\!\perp Y(t)\mid Z,\mathbf{X}\ \text{and}\ Z\perp\!\!\!\perp Y(t)\mid \mathbf{X}$
$\Rightarrow (T,Z)\perp\!\!\!\perp Y(t)\mid \mathbf{X}$
$\Rightarrow Z\perp\!\!\!\perp Y(t)\mid T,\mathbf{X}.$

The first implication follows by assumption (A.3), the second by the contraction property and the third by the weak union property of conditional independence relations; see Dawid [21], Lauritzen [22, Sec. 3.1] and Pearl [3, Sec. 1.1.5]. ■

The conditional independence statement obtained in Proposition 1 is testable from the data when conditioning on $T=t$ (see next section). Finding evidence in the data against (2) yields evidence against the assumptions of the proposition. Thus, evidence against (2) can be interpreted as evidence against the unconfoundedness assumption (A.1) if (A.2) is known to hold from subject-matter considerations, (A.3) being a regularity condition. One application is a randomized experiment (where *Z* is a random assignment to a treatment) with restricted compliance *T* [4, 12]. Another example of application is treated in detail in Section 4. Note that while identification of the causal effect of *T* on *Y* may follow from (A.2) with linear models, see, e.g. Pearl [3, p. 248], this is not true in general, and stronger assumptions are needed to obtain nonparametric identification of a causal effect such as, e.g. a local average treatment effect [4–6]. In particular, our result does not rely on two assumptions typically made to obtain such identification: that the instrument affects the treatment in a monotone fashion, and that no unobserved heterogeneity affects both the instrument and the treatment.
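As an illustration of how (2) could be checked in the randomized-assignment application just mentioned, the sketch below simulates a randomized $Z$ with imperfect compliance (empty $\mathbf{X}$; the coefficients and the confounder strength `c` are hypothetical choices) and compares the outcome across assignment arms within the stratum $T=1$ using a two-sample t-test; a small p-value is evidence against (A.1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50_000

def simulate(c):
    # Z: randomized assignment; U: unobserved confounder of T and Y
    # with illustrative strength c (c = 0 means (A.1) holds).
    Z = rng.integers(0, 2, n)
    U = rng.standard_normal(n)
    T = (1.0 * Z + c * U + rng.standard_normal(n) > 0).astype(int)
    Y = 1.0 * T + c * U + rng.standard_normal(n)
    return Z, T, Y

def pvalue_within_T1(Z, T, Y):
    # Compare Y across assignment arms within the stratum T = 1:
    # under (A.1)-(A.3), Proposition 1 implies the two means agree.
    m = T == 1
    return stats.ttest_ind(Y[m & (Z == 1)], Y[m & (Z == 0)]).pvalue

p_confounded = pvalue_within_T1(*simulate(c=1.0))    # (A.1) violated
p_unconfounded = pvalue_within_T1(*simulate(c=0.0))  # (A.1) holds
print(p_confounded, p_unconfounded)
```

With nonempty $\mathbf{X}$ the same comparison would be carried out conditionally on $\mathbf{X}$, e.g. within covariate strata or via a regression-based test; the t-test here is only the simplest instance.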

For a test based on (2) to have power against (A.1), we further need *Z* and *T* to be dependent conditional on $\mathbf{X}$. This is typically assumed for instrumental variables to be useful for identification. Examples of situations (expressed with directed acyclic graphs; see Footnote 1) in which a test based on (2) has power against (A.1) are given in Figure 2, panels (a)–(c), while panel (d) shows a case where such a test has no power. A caveat here is that (2) can be tested only when conditioning on $T=t$. This has no practical consequence if the test rejects this null hypothesis. On the other hand, in cases where (2) is not rejected for $T=t$, we have no information on whether it is violated for $T=1-t$. In independent and related work, Guo et al. [14, eqs (3) and (4)] give an example where (2) holds for $T=t$ although not for $T=1-t$, and yet a specific causal effect is identified without the help of *Z* when the earlier-mentioned monotonicity assumption holds.
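The relevance condition just mentioned, that *Z* and *T* be dependent given $\mathbf{X}$, can itself be checked from the data. With an empty $\mathbf{X}$, a chi-squared test on the $2\times 2$ table of $(Z,T)$ is one simple option; the data-generating values below are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 20_000

Z = rng.integers(0, 2, n)
T_relevant = (0.8 * Z + rng.standard_normal(n) > 0).astype(int)  # Z affects T
T_irrelevant = (rng.standard_normal(n) > 0).astype(int)          # Z does not

def zt_dependence_pvalue(Z, T):
    # 2x2 contingency table of (Z, T) and a chi-squared test of independence.
    table = np.array([[np.sum((Z == i) & (T == j)) for j in (0, 1)]
                      for i in (0, 1)])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    return p

p_relevant = zt_dependence_pvalue(Z, T_relevant)
p_irrelevant = zt_dependence_pvalue(Z, T_irrelevant)
print(p_relevant, p_irrelevant)
```

A small p-value in the first case confirms the $Z$–$T$ dependence needed for the test of (2) to have power; in the second case (as in Figure 2, panel (d)) no such dependence is detectable.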

Figure 2 Four directed acyclic graphs together with a respective stable joint distribution for the variables included: Only cases (a)–(c) are such that a test based on (2) may have power, i.e. if (A.1) does not hold, e.g. through the introduction of a variable *V* with arrows pointing toward *T* and $Y(t)$, then $Y(t)\perp\!\!\!\perp Z\mid T,\mathbf{X}$ would not hold either
