# Monotone Confounding, Monotone Treatment Selection and Monotone Treatment Response

Zhichao Jiang, Yasutaka Chiba and Tyler J. VanderWeele

# Abstract

Manski (Monotone treatment response. Econometrica 1997;65:1311–34) and Manski and Pepper (Monotone instrumental variables: with an application to the returns to schooling. Econometrica 2000;68:997–1010) gave sharp bounds on causal effects under the assumptions of monotone treatment response (MTR) and monotone treatment selection (MTS). VanderWeele (The sign of the bias of unmeasured confounding. Biometrics 2008;64:702–6) provided bounds for binary treatment under an assumption of monotone confounding (MC). We discuss the relation between MC and MTS and provide bounds under various combinations of these assumptions. We show that MC and MTS coincide for a binary treatment, but MC does not imply MTS for a treatment variable with more than two levels.

## 1 Introduction

We consider bounds on causal effects for an ordinal treatment variable under treatment selection. The main purpose of this work is to consider the relationship between the assumptions of monotone treatment selection (MTS) [1] and monotone confounding (MC) [2]. We will show MC and MTS coincide for a binary treatment, but MC does not imply MTS for a treatment variable with more than two levels. We also derive new bounds under different combinations of MTS or MC and an assumption of monotone treatment response (MTR).

We assume a treatment response framework identical to Manski [3] and Manski and Pepper [1]. Let Aω denote the treatment assigned to subject ω: Aω = t if ω is assigned to the treatment t, where t = 0, 1, …, k is an ordered categorical variable. Yω denotes the outcome for subject ω, and Yω(t) denotes the potential outcome of Y for subject ω if ω is assigned to treatment t.

We use the notation B ⫫ C | D to denote that B is independent of C conditional on D [4]. The assumption of “unconfoundedness” or “weak ignorability” or “selection on observables” [5] for observed covariates X is that Y(t) ⫫ A | X, where X denotes a covariate or a set of covariates. The assumption allows for identification of treatment effects by

$$E[Y(t)] - E[Y(0)] = \sum_x E(Y \mid A=t, X=x)\,P(X=x) - \sum_x E(Y \mid A=0, X=x)\,P(X=x).$$
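As a rough illustration of this standardization formula, the following sketch computes the identified effect from stratum-specific means. The numbers and the names `p_x`, `m`, and `standardized_mean` are hypothetical and chosen only for illustration; they do not come from any study discussed here.

```python
# Hypothetical inputs: two covariate strata and a binary treatment.
p_x = {0: 0.6, 1: 0.4}            # p_x[x] = P(X = x)
m = {(0, 0): 0.2, (0, 1): 0.4,    # m[(t, x)] = E(Y | A = t, X = x)
     (1, 0): 0.5, (1, 1): 0.7}


def standardized_mean(t, p_x, m):
    """Return sum_x E(Y | A = t, X = x) P(X = x)."""
    return sum(m[(t, x)] * p_x[x] for x in p_x)


# Under "selection on observables" this difference equals E[Y(1)] - E[Y(0)].
effect = standardized_mean(1, p_x, m) - standardized_mean(0, p_x, m)
print(round(effect, 10))
```

Without the unconfoundedness assumption the same quantity is only an association, which is what motivates the bounds reviewed below.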

However, the assumption itself is often not plausible. The literature on partial identification allows for more credible inference for treatment effects because it makes much weaker assumptions [1, 2, 6–8]. Some of this literature on partial identification derives bounds by imposing various monotonicity assumptions which are considerably weaker than “selection on observables.” This paper advances this literature further by considering the relationships that hold between different monotonicity assumptions and by deriving new bounds under differing monotonicity assumptions that will sometimes be tighter than those currently offered in the literature.

The paper is organized as follows. In Section 2, we review the MTR, MTS, and MC assumptions and the bounds under them. In Section 3, we give results concerning relations between MTS and MC. In Section 4, we provide new results on bounds under MTR and a reversed conditional analog of MTS. Section 5 offers some concluding remarks.

## 2 Review of existing assumptions and bounds

Without monotonicity assumptions, the nonparametric bounds for an ordered categorical treatment are as follows [1]:

$$E(Y \mid A=t)\,P(A=t) + K_0 \sum_{s \neq t} P(A=s) \;\le\; E[Y(t)] \;\le\; E(Y \mid A=t)\,P(A=t) + K_1 \sum_{s \neq t} P(A=s),$$

where K0 and K1 are the lower and upper bounds of Y, respectively.
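A minimal sketch of these no-assumption bounds, assuming hypothetical treatment probabilities `p[s]` = P(A = s) and observed means `m[s]` = E(Y | A = s) for a three-level treatment with an outcome bounded in [0, 1]:

```python
def no_assumption_bounds(p, m, t, k0, k1):
    """Worst-case bounds on E[Y(t)] without monotonicity assumptions.

    p[s] = P(A = s), m[s] = E(Y | A = s); k0 and k1 bound the outcome.
    """
    observed = m[t] * p[t]
    # Probability of not receiving t, i.e. the sum over s != t of P(A = s).
    unobserved = sum(p[s] for s in range(len(p)) if s != t)
    return observed + k0 * unobserved, observed + k1 * unobserved


# Hypothetical inputs, for illustration only.
p = [0.5, 0.3, 0.2]
m = [0.2, 0.5, 0.6]
lower, upper = no_assumption_bounds(p, m, t=1, k0=0.0, k1=1.0)
print(lower, upper)
```

The width of the interval is (K1 − K0) P(A ≠ t), so the bounds are only informative for treatment levels that are commonly received.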

The nonparametric bounds can be improved by adding certain monotonicity assumptions. In this section, we review assumptions introduced in the previous literature and the bounds under them. We review the MTR and MTS assumptions in Section 2.1. In Section 2.2, we review the MC assumption.

### 2.1 Monotone treatment response and monotone treatment selection

Manski [3, 6] introduced the following MTR assumption:

ASSUMPTION 1 (MTR assumption). Let T be an ordered set, and t1 and t2 be elements of T. Then, t2 ≥ t1 ⇒ Yω(t2) ≥ Yω(t1) for each ω.

The MTR assumption is simply that for each individual an increase in the treatment variable A will increase or leave unchanged the outcome that would be observed.

PROPOSITION 1 (Manski [3, Corollary M1.2]). Under the MTR assumption, bounds on E[Y(t)] are

$$\sum_{s \le t} E(Y \mid A=s)\,P(A=s) + K_0\,P(A>t) \;\le\; E[Y(t)] \;\le\; \sum_{s \ge t} E(Y \mid A=s)\,P(A=s) + K_1\,P(A<t).$$

A characteristic of the MTR bounds is that the upper bound for treatment category of A = 0 is equal to the lower bound for that of A = k. Thus, the lower bound on the causal effect, E[Y(k)] – E[Y(0)], under the MTR assumption is always zero.
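The following sketch evaluates the MTR bounds of Proposition 1 and checks the property just described, namely that the upper bound for A = 0 and the lower bound for A = k both equal E(Y). The inputs are hypothetical.

```python
def mtr_bounds(p, m, t, k0, k1):
    """Bounds on E[Y(t)] under MTR.

    p[s] = P(A = s), m[s] = E(Y | A = s) for s = 0, ..., k.
    """
    k = len(p) - 1
    lower = sum(m[s] * p[s] for s in range(0, t + 1)) + k0 * sum(p[t + 1:])
    upper = sum(m[s] * p[s] for s in range(t, k + 1)) + k1 * sum(p[:t])
    return lower, upper


# Hypothetical inputs, for illustration only.
p = [0.5, 0.3, 0.2]
m = [0.2, 0.5, 0.6]
mean_y = sum(ms * ps for ms, ps in zip(m, p))  # E(Y)

_, upper_0 = mtr_bounds(p, m, t=0, k0=0.0, k1=1.0)
lower_k, _ = mtr_bounds(p, m, t=2, k0=0.0, k1=1.0)
print(upper_0, lower_k, mean_y)
```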

Manski and Pepper [1] introduced the following MTS assumption:

ASSUMPTION 2 (MTS assumption). Let T be an ordered set, and t1 and t2 be elements of T. Then, t2 ≥ t1 ⇒ E[Y(t) | A = t2] ≥ E[Y(t) | A = t1].

The interpretation of the MTS assumption is as follows. If we compare two groups, those who actually received treatment level t1 and those who actually received some higher level of treatment t2, then, for each fixed treatment level t, the average outcome that would have occurred if treatment had been fixed to t is at least as high in the group that actually received t2 as it is in the group that actually received t1.

PROPOSITION 2 (Manski and Pepper [1, Proposition 1, Corollary 2]). Under the MTS assumption, bounds on E[Y(t)] are

$$E(Y \mid A=t)\,P(A \ge t) + K_0\,P(A<t) \;\le\; E[Y(t)] \;\le\; E(Y \mid A=t)\,P(A \le t) + K_1\,P(A>t).$$

Bounds with narrower width follow by combining the MTR and MTS assumptions.

PROPOSITION 3 (Manski and Pepper [1, Proposition 2, Corollary 2]). Under the MTR and MTS assumptions, bounds on E[Y(t)] are

$$\sum_{s \le t} E(Y \mid A=s)\,P(A=s) + E(Y \mid A=t)\,P(A>t) \;\le\; E[Y(t)] \;\le\; \sum_{s \ge t} E(Y \mid A=s)\,P(A=s) + E(Y \mid A=t)\,P(A<t).$$

Note that even when the MTS assumption is added to the MTR assumption, the upper bound for treatment category of A = 0 is still equal to the lower bound for that of A = k, and thus the lower bound on E[Y(k)] – E[Y(0)] is always zero.
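To see how the bounds of Propositions 2 and 3 compare numerically, the sketch below evaluates both for a hypothetical three-level treatment. The function names and inputs are illustrative assumptions, not part of the original results.

```python
def mts_bounds(p, m, t, k0, k1):
    """Bounds on E[Y(t)] under MTS; p[s] = P(A = s), m[s] = E(Y | A = s)."""
    p_lt, p_gt = sum(p[:t]), sum(p[t + 1:])
    lower = m[t] * (p[t] + p_gt) + k0 * p_lt
    upper = m[t] * (p[t] + p_lt) + k1 * p_gt
    return lower, upper


def mtr_mts_bounds(p, m, t):
    """Bounds on E[Y(t)] under MTR and MTS combined."""
    k = len(p) - 1
    lower = sum(m[s] * p[s] for s in range(0, t + 1)) + m[t] * sum(p[t + 1:])
    upper = sum(m[s] * p[s] for s in range(t, k + 1)) + m[t] * sum(p[:t])
    return lower, upper


# Hypothetical inputs with E(Y | A = s) non-decreasing in s, as MTR and MTS
# jointly require of the observed means.
p = [0.5, 0.3, 0.2]
m = [0.2, 0.5, 0.6]
mts_lo, mts_up = mts_bounds(p, m, t=1, k0=0.0, k1=1.0)
both_lo, both_up = mtr_mts_bounds(p, m, t=1)
print((mts_lo, mts_up), (both_lo, both_up))
```

In this example the combined bounds lie strictly inside the MTS bounds, and they no longer depend on K0 or K1.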

### 2.2 Monotone confounding

For observed covariates X, VanderWeele [2] introduced the following MC assumption:

ASSUMPTION 3 (MC assumption). There exists univariate U such that Y(t) ⫫ A | {X, U} and both E(Y | A = t, X = x, U = u) and E(A | X = x, U = u) are either non-decreasing or non-increasing in u for all t and x.

The MC assumption would be plausible if there were a single unmeasured covariate U that affected both A and Y such that controlling for the measured covariates X and the unmeasured covariate U sufficed to control for confounding and such that U affected treatment A and outcome Y in the same direction. The interpretation of MC in the presence of multiple unmeasured variables, however, is more complicated. Consider for example the relations in Figure 1 in which both U and V are unmeasured.

### Figure 1

Example in which the effect of A on Y is unconfounded conditional on (X, U) but for which the MC assumption is difficult to interpret because of V

In this diagram, controlling for just U along with the measured covariates X would suffice to control for confounding for the effect of A on Y [9]. However, in Figure 1, the effect of U on A is itself confounded and the MC assumption that E(A | X = x, U = u) is monotonic in u would not simply correspond to the effect of U on A being monotonic but would require that the association between U and A due to V was such that it was still the case that E(A | X = x, U = u) was monotonic. The MC assumption may thus be difficult to establish or argue for on substantive grounds. It is, however, how epidemiologists often conceive of confounding [10] and for this reason it is of interest, from a theoretical perspective at least, to compare this conceptualization to that of MTS above. As we have seen, the interpretation of the MC assumption is only straightforward when there is a single unmeasured variable and appeal should probably only be made to the assumption in such a context. With a single unmeasured variable, the monotonicity conditions can be interpreted simply as that U affects A and Y in the same direction. While it is impossible to know certainly that there is a single unmeasured confounding variable, epidemiologic studies often have very rich and extensive data on confounding variables which may make such an assumption more plausible in some epidemiologic studies than it would in many economic research contexts wherein data on important confounding variables is sometimes more limited.

For example, many clinical studies have data on nearly all of the information that is available to a physician who makes decisions concerning the treatments of patients. Not infrequently the only information that is not available in a clinical study is the physician’s subjective assessment of the patient’s prognosis. This subjective assessment could potentially serve as the unmeasured confounding variable in the MC assumption. For instance, Shahinian et al. [11] conducted a cohort study of prostate cancer patients to measure the effect of androgen deprivation therapy (ADT, the standard treatment for prostate cancer, which has fatigue, weakness, and frailty as side effects) on the occurrence of fractures. Control was made for a number of demographic and clinical variables including age, race, grade of prostate cancer, other cancer treatments received, and the occurrence of a fracture or the diagnosis of osteoporosis during the 12 months preceding the diagnosis of cancer. The study did not, however, measure the physicians’ subjective assessment of the patients’ maneuverability. Those with a high level of maneuverability will in general be less likely to be frail and experience a fracture and thus also more likely to receive ADT. The MC assumption may thus be plausible in this context.

As will be seen below, many of the results that follow also hold for multivariate U when the components of U are conditionally independent given X. For multivariate U, we say that u ≥ u′ whenever each component of u is greater than or equal to the corresponding component of u′. The MC assumption for multivariate U is then once again that E(Y | A = t, X = x, U = u) and E(A | X = x, U = u) are either both non-decreasing or both non-increasing in u for all t and x. Many of the results below hold with multivariate U under the multivariate MC assumption but also require that the components of U are conditionally independent given X. Such conditional independence is a strong assumption and will be difficult to evaluate since U is unmeasured. Thus, once again, appeal to the MC assumption will in general only be reasonable when there is a single unmeasured covariate. Although we note below when results do hold with multivariate U under the conditional independence assumption, we do this principally for completeness rather than because of the utility of the results for multivariate U, which we believe, in most cases, will be limited. Alternatively, the results would also apply if there were a multivariate unmeasured confounder that could be summarized into a single score U for which the MC assumption held.

PROPOSITION 4 (VanderWeele [2, Theorem 1]). For binary treatment A (t2 > t1), under the MC assumption, bounds on E[Y(t)] are

$$E[Y(t_2)] \le \sum_x E(Y \mid A=t_2, X=x)\,P(X=x), \qquad E[Y(t_1)] \ge \sum_x E(Y \mid A=t_1, X=x)\,P(X=x).$$

Furthermore, if U is multivariate, the conclusion still holds if the components of U are conditionally independent given X.

The inequalities in Proposition 4 are reversed if, in Assumption 3, one of the conditional expectations is non-increasing in u for all t and x and the other is non-decreasing in u for all t and x.

VanderWeele [2] considered only bounds for a binary treatment. Chiba [12] showed that results similar to Proposition 4 also hold for the average treatment effect on the treated, employing conditional expectations of the form E[Y(t) | A = 1].

Proposition 4 above was proved using the following lemma that will also be used in developing subsequent results.

LEMMA 1 (Esary et al. [13, Theorem 2.1]). Let f and g be functions with n real-valued arguments such that both f and g are non-decreasing in each of their arguments. If X = (X1,…, Xn) is a multivariate random variable with n components such that each component is independent of the other components, then Cov{f(X), g(X)} ≥ 0.
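As a small sanity check on Lemma 1, the sketch below computes the covariance exactly for two independent binary variables and two non-decreasing functions. The distributions and the functions `f` and `g` are hypothetical choices made only to illustrate the inequality.

```python
from fractions import Fraction
from itertools import product

# Independent components X1 and X2 with hypothetical marginal distributions.
p1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p2 = {0: Fraction(2, 3), 1: Fraction(1, 3)}


def f(x1, x2):
    return x1 + x2          # non-decreasing in each argument


def g(x1, x2):
    return max(x1, x2)      # non-decreasing in each argument


def expectation(h):
    """Exact expectation of h(X1, X2) under independence."""
    return sum(h(x1, x2) * p1[x1] * p2[x2] for x1, x2 in product(p1, p2))


cov = expectation(lambda a, b: f(a, b) * g(a, b)) - expectation(f) * expectation(g)
print(cov)
```

Exact rational arithmetic is used so that the sign of the covariance is not affected by rounding.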

Construction of confidence intervals for the bounds of a causal effect, sometimes also called “uncertainty regions” [14], is a challenging problem and will not be explored further here as our focus here is principally conceptual, on the relationship between different monotonicity assumptions. The interested reader is referred to recent literature on the topic for further discussion [14–18].

## 3 Relation between monotone treatment selection and monotone confounding

In this section, we discuss the relation between the MTS and MC assumptions. The discussion is facilitated by considering a conditional version of the MTS assumption. The conditional version of MTS will be useful as it more closely corresponds to the MC assumption. The conditional MTS assumption and the unconditional MTS assumption coincide if the set of conditioning variables is empty. The conditional MTS assumption can be stated as follows:

ASSUMPTION 4 (Conditional MTS assumption). Let T be an ordered set, and t1 and t2 be elements of T. Then, t2 ≥ t1 ⇒ E[Y(t) | A = t2, X = x] ≥ E[Y(t) | A = t1, X = x] for all x.

The conditional MTS assumption implies MTS but the converse does not hold. By applying the MTS bounds under each distinct value of x and then summing over x we have the following immediate Corollary to Propositions 2 and 3 above.

COROLLARY 1. Under conditional MTS, the bounds on E[Y(t)] are

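$$\sum_{s \ge t} \sum_x E(Y \mid A=t, X=x)\,P(X=x \mid A=s)\,P(A=s) + K_0\,P(A<t) \;\le\; E[Y(t)] \;\le\; \sum_{s \le t} \sum_x E(Y \mid A=t, X=x)\,P(X=x \mid A=s)\,P(A=s) + K_1\,P(A>t). \tag{1}$$

Under the conditional MTS and MTR assumptions, the bounds on E[Y(t)] are

$$\sum_{s<t} E(Y \mid A=s)\,P(A=s) + \sum_{s \ge t} \sum_x E(Y \mid A=t, X=x)\,P(X=x \mid A=s)\,P(A=s) \;\le\; E[Y(t)] \;\le\; \sum_{s>t} E(Y \mid A=s)\,P(A=s) + \sum_{s \le t} \sum_x E(Y \mid A=t, X=x)\,P(X=x \mid A=s)\,P(A=s). \tag{2}$$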

Note that the upper bound for A = 0 is also equal to the lower bound for A = k under the conditional MTS–MTR assumptions, and the lower bound on E[Y(k)] – E[Y(0)] is always zero.

In the case of a binary treatment, the conditional MTS bounds are the same as the MC bounds on E[Y(t)]. This follows immediately from inequality (1). Likewise, the conditional MTS bounds on E[Y(t) | A = a] (a = 0, 1) are also the same as the MC bounds. This is easily verified as follows:

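$$E[Y(1) \mid A=0] = \sum_x E[Y(1) \mid A=0, X=x]\,P(X=x \mid A=0) \le \sum_x E[Y(1) \mid A=1, X=x]\,P(X=x \mid A=0) = \sum_x E(Y \mid A=1, X=x)\,P(X=x \mid A=0)$$

by the conditional MTS assumption. Similarly,

$$E[Y(0) \mid A=1] \ge \sum_x E(Y \mid A=0, X=x)\,P(X=x \mid A=1)$$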

is derived.
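The claim that inequality (1) reduces to the standardized means of Proposition 4 for a binary treatment can be checked numerically. The sketch below implements inequality (1) for a generic ordered treatment and evaluates it on hypothetical binary data; the dictionaries `joint` and `m` are illustrative assumptions.

```python
def conditional_mts_bounds(joint, m, t, k0, k1):
    """Bounds on E[Y(t)] from inequality (1).

    joint[(a, x)] = P(A = a, X = x); m[(a, x)] = E(Y | A = a, X = x).
    """
    levels = sorted({a for a, _ in joint})
    xs = sorted({x for _, x in joint})

    def term(s):
        # sum_x E(Y|A=t,X=x) P(X=x|A=s) P(A=s), using P(X=x|A=s)P(A=s) = P(A=s,X=x)
        return sum(m[(t, x)] * joint[(s, x)] for x in xs)

    p_lt = sum(joint[(s, x)] for s in levels if s < t for x in xs)
    p_gt = sum(joint[(s, x)] for s in levels if s > t for x in xs)
    lower = sum(term(s) for s in levels if s >= t) + k0 * p_lt
    upper = sum(term(s) for s in levels if s <= t) + k1 * p_gt
    return lower, upper


# Hypothetical binary-treatment data, for illustration only.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
m = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.7}

_, upper_1 = conditional_mts_bounds(joint, m, t=1, k0=0.0, k1=1.0)
lower_0, _ = conditional_mts_bounds(joint, m, t=0, k0=0.0, k1=1.0)
print(upper_1, lower_0)
```

Here P(X = 0) = 0.6 and P(X = 1) = 0.4, so the upper bound for E[Y(1)] and the lower bound for E[Y(0)] are the two standardized means.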

We now consider the relation between the MTS and MC assumptions themselves (rather than simply the bounds that are obtained under them). The following proposition states that MC and conditional MTS are equivalent for binary treatment. The result also holds under MC with multivariate U, provided the components of U are conditionally independent of one another given X.

PROPOSITION 5. For binary treatment, the MC assumption implies the conditional MTS assumption. If, in addition, the positivity assumption holds, i.e. $\min_{y_1, y_0} P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x] > 0$ for all x, then the conditional MTS assumption implies the MC assumption.

PROOF. If the MC assumption holds, then for t = 0, 1,

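$$
\begin{aligned}
E[Y(t) \mid A=1, X=x]
&= \sum_u E[Y(t) \mid A=1, X=x, U=u]\,P(U=u \mid A=1, X=x) \\
&= \sum_u E[Y(t) \mid A=t, X=x, U=u]\,P(U=u \mid A=1, X=x) \\
&= \sum_u E[Y \mid A=t, X=x, U=u]\,\frac{P(A=1 \mid X=x, U=u)\,P(U=u \mid X=x)}{P(A=1 \mid X=x)} \\
&= \frac{E_{F_{U \mid X=x}}\{E(Y \mid A=t, X=x, U=u)\,P(A=1 \mid X=x, U=u)\}}{P(A=1 \mid X=x)} \\
&\ge \frac{E_{F_{U \mid X=x}}\{E(Y \mid A=t, X=x, U=u)\}\,E_{F_{U \mid X=x}}\{P(A=1 \mid X=x, U=u)\}}{P(A=1 \mid X=x)} \\
&= E_{F_{U \mid X=x}}\{E(Y \mid A=t, X=x, U=u)\} \\
&= \frac{E_{F_{U \mid X=x}}\{E(Y \mid A=t, X=x, U=u)\}\,E_{F_{U \mid X=x}}\{P(A=0 \mid X=x, U=u)\}}{P(A=0 \mid X=x)} \\
&\ge \frac{E_{F_{U \mid X=x}}\{E(Y \mid A=t, X=x, U=u)\,P(A=0 \mid X=x, U=u)\}}{P(A=0 \mid X=x)} \\
&= \sum_u E(Y \mid A=t, X=x, U=u)\,\frac{P(A=0 \mid X=x, U=u)\,P(U=u \mid X=x)}{P(A=0 \mid X=x)} \\
&= \sum_u E(Y \mid A=t, X=x, U=u)\,P(U=u \mid A=0, X=x) \\
&= \sum_u E[Y(t) \mid A=0, X=x, U=u]\,P(U=u \mid A=0, X=x) \\
&= E[Y(t) \mid A=0, X=x].
\end{aligned}
$$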

The first inequality follows because by Lemma 1,

$$E_{F_{U \mid X=x}}\{f(x,U)\,g(x,U)\} - E_{F_{U \mid X=x}}\{f(x,U)\}\,E_{F_{U \mid X=x}}\{g(x,U)\} = \mathrm{Cov}_{F_{U \mid X=x}}\{f(x,U),\, g(x,U)\} \ge 0,$$

where both f(x, U) = E(Y | A = t, X = x, U = u) and g(x, U) = P(A = 1| X = x, U = u) are non-decreasing in u. Likewise, the second inequality follows because E(Y | A = t, X = x, U = u) is non-decreasing in u and P(A = 0 | X = x, U = u) is non-increasing in u. A similar calculation holds in the case in which both f(x, U) and g(x, U) are non-increasing in u. We thus see that if the MC assumption holds then the conditional MTS assumption also holds.

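On the other hand, if the conditional MTS assumption holds, then the joint distribution of {Y(1), Y(0), A, X} is such that E[Y(t) | A = 1, X = x] ≥ E[Y(t) | A = 0, X = x]. Let U denote a binary variable such that $P[U=1 \mid Y(1)=y_1, Y(0)=y_0, A=1, X=x] = \theta_{y_1 y_0 1 x}$ and $P[U=1 \mid Y(1)=y_1, Y(0)=y_0, A=0, X=x] = \theta_{y_1 y_0 0 x}$ with

$$\theta_{y_1 y_0 1 x} = \frac{\left(\frac{1}{e_{0x}} - 1\right) - R_{y_1 y_0 x}}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}}, \qquad \theta_{y_1 y_0 0 x} = \frac{\frac{1}{e_{1x}} - 1}{R_{y_1 y_0 x}}\,\theta_{y_1 y_0 1 x},$$

where $R_{y_1 y_0 x} = \dfrac{P[A=0 \mid Y(1)=y_1, Y(0)=y_0, X=x]}{P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x]}$ and $e_{0x}$ and $e_{1x}$ are constants satisfying $0 < e_{0x} < e_{1x} \le \min_{y_1, y_0}\{P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x]\}$.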

We show that this U satisfies the MC assumption.

First, we have

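$$
\begin{aligned}
P[A=1 \mid Y(1)=y_1, Y(0)=y_0, U=1, X=x]
&= \frac{P[U=1 \mid Y(1)=y_1, Y(0)=y_0, A=1, X=x]\,P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x]}{\sum_{t=0,1} P[U=1 \mid Y(1)=y_1, Y(0)=y_0, A=t, X=x]\,P[A=t \mid Y(1)=y_1, Y(0)=y_0, X=x]} \\
&= \frac{\theta_{y_1 y_0 1 x}\,P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x]}{\sum_{t=0,1} \theta_{y_1 y_0 t x}\,P[A=t \mid Y(1)=y_1, Y(0)=y_0, X=x]} \\
&= \frac{1}{1 + \dfrac{\theta_{y_1 y_0 0 x}}{\theta_{y_1 y_0 1 x}}\,R_{y_1 y_0 x}}
= \frac{1}{1 + \dfrac{1}{e_{1x}} - 1} = e_{1x},
\end{aligned}
$$

$$
\begin{aligned}
P[A=1 \mid Y(1)=y_1, Y(0)=y_0, U=0, X=x]
&= \frac{P[U=0 \mid Y(1)=y_1, Y(0)=y_0, A=1, X=x]\,P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x]}{\sum_{t=0,1} P[U=0 \mid Y(1)=y_1, Y(0)=y_0, A=t, X=x]\,P[A=t \mid Y(1)=y_1, Y(0)=y_0, X=x]} \\
&= \frac{(1 - \theta_{y_1 y_0 1 x})\,P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x]}{\sum_{t=0,1} (1 - \theta_{y_1 y_0 t x})\,P[A=t \mid Y(1)=y_1, Y(0)=y_0, X=x]} \\
&= \frac{1}{1 + \dfrac{1 - \theta_{y_1 y_0 0 x}}{1 - \theta_{y_1 y_0 1 x}}\,R_{y_1 y_0 x}}
= \frac{1}{1 + \dfrac{R_{y_1 y_0 x} - \left(\frac{1}{e_{1x}} - 1\right)\theta_{y_1 y_0 1 x}}{1 - \theta_{y_1 y_0 1 x}}}
= \frac{1}{1 + \dfrac{1}{e_{0x}} - 1} = e_{0x}.
\end{aligned}
$$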

Since e1x and e0x only depend on x, we have

$$
\begin{aligned}
P[A=1 \mid U=1, X=x] &= P[A=1 \mid Y(1)=y_1, Y(0)=y_0, U=1, X=x] = e_{1x}, \\
P[A=1 \mid U=0, X=x] &= P[A=1 \mid Y(1)=y_1, Y(0)=y_0, U=0, X=x] = e_{0x},
\end{aligned}
$$

which means Y(t) ⫫ A | {X, U}.

Second, by construction we have e0x < e1x; thus E(A | X = x, U = u) is non-decreasing in u.

Third, we can get

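$$
\begin{aligned}
P[U=1 \mid Y(1)=y_1, A=1, X=x]
&= \sum_{y_0} \theta_{y_1 y_0 1 x}\,P[Y(0)=y_0 \mid Y(1)=y_1, A=1, X=x] \\
&= \sum_{y_0} \frac{\left(\frac{1}{e_{0x}} - 1\right) - R_{y_1 y_0 x}}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}}\,P[Y(0)=y_0 \mid Y(1)=y_1, A=1, X=x] \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \sum_{y_0} R_{y_1 y_0 x}\,P[Y(0)=y_0 \mid Y(1)=y_1, A=1, X=x] \right\} \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \sum_{y_0} \frac{P[A=0 \mid Y(1)=y_1, Y(0)=y_0, X=x]}{P[A=1 \mid Y(1)=y_1, Y(0)=y_0, X=x]}\,P[Y(0)=y_0 \mid Y(1)=y_1, A=1, X=x] \right\} \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \sum_{y_0} \frac{P[A=0 \mid Y(1)=y_1, Y(0)=y_0, X=x]\,P[Y(0)=y_0 \mid Y(1)=y_1, X=x]}{P[A=1 \mid Y(1)=y_1, X=x]} \right\} \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \sum_{y_0} \frac{P[A=0, Y(0)=y_0 \mid Y(1)=y_1, X=x]}{P[A=1 \mid Y(1)=y_1, X=x]} \right\} \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \frac{P[A=0 \mid Y(1)=y_1, X=x]}{P[A=1 \mid Y(1)=y_1, X=x]} \right\} \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \frac{P[Y(1)=y_1 \mid A=0, X=x]\,P(A=0 \mid X=x)}{P[Y(1)=y_1 \mid A=1, X=x]\,P(A=1 \mid X=x)} \right\}.
\end{aligned}
$$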

Therefore, we have

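$$
\begin{aligned}
E[Y \mid A=1, X=x, U=1]
&= \sum_{y_1} y_1\,P[Y(1)=y_1 \mid A=1, X=x, U=1] \\
&= \sum_{y_1} y_1\,\frac{P[U=1 \mid Y(1)=y_1, A=1, X=x]\,P[Y(1)=y_1 \mid A=1, X=x]}{P(U=1 \mid A=1, X=x)} \\
&= \frac{1}{P(U=1 \mid A=1, X=x)}\,\frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \sum_{y_1} y_1 \left\{ \frac{1}{e_{0x}} - 1 - \frac{P[Y(1)=y_1 \mid A=0, X=x]\,P(A=0 \mid X=x)}{P[Y(1)=y_1 \mid A=1, X=x]\,P(A=1 \mid X=x)} \right\} P[Y(1)=y_1 \mid A=1, X=x] \\
&= \frac{E[Y \mid A=1, X=x]}{P(U=1 \mid A=1, X=x)}\,\frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \frac{E[Y(1) \mid A=0, X=x]\,P(A=0 \mid X=x)}{E[Y(1) \mid A=1, X=x]\,P(A=1 \mid X=x)} \right\},
\end{aligned}
$$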

where

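$$
\begin{aligned}
P(U=1 \mid A=1, X=x)
&= \sum_{y_1} P[U=1 \mid Y(1)=y_1, A=1, X=x]\,P[Y(1)=y_1 \mid A=1, X=x] \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \sum_{y_1} \left\{ \frac{1}{e_{0x}} - 1 - \frac{P[Y(1)=y_1 \mid A=0, X=x]\,P(A=0 \mid X=x)}{P[Y(1)=y_1 \mid A=1, X=x]\,P(A=1 \mid X=x)} \right\} P[Y(1)=y_1 \mid A=1, X=x] \\
&= \frac{1}{\frac{1}{e_{0x}} - \frac{1}{e_{1x}}} \left\{ \frac{1}{e_{0x}} - 1 - \frac{P(A=0 \mid X=x)}{P(A=1 \mid X=x)} \right\}.
\end{aligned}
$$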

Then, we have

$$\frac{E[Y \mid A=1, U=1, X=x]}{E[Y \mid A=1, X=x]} = \frac{\dfrac{1}{e_{0x}} - 1 - \dfrac{E[Y(1) \mid A=0, X=x]\,P(A=0 \mid X=x)}{E[Y(1) \mid A=1, X=x]\,P(A=1 \mid X=x)}}{\dfrac{1}{e_{0x}} - 1 - \dfrac{P(A=0 \mid X=x)}{P(A=1 \mid X=x)}}.$$

Since E[Y(t) | A = 1, X = x] ≥ E[Y(t) | A = 0, X = x] for t = 0, 1, we have E[Y | A = 1, X = x, U = 1] ≥ E[Y | A = 1, X = x], and then E[Y | A = 1, X = x, U = 1] ≥ E[Y | A = 1, X = x, U = 0]. Similarly, we can get E[Y | A = 0, X = x, U = 1] ≥ E[Y | A = 0, X = x, U = 0]. Therefore, E[Y | A = t, X = x, U = u] is non-decreasing in u. □

Although MC and conditional MTS are equivalent for binary treatment, the MC assumption does not imply the conditional MTS assumption for a treatment variable with three or more levels, as stated in the next proposition.

PROPOSITION 6. For a treatment with three or more levels, the MC assumption does not imply the conditional MTS assumption.

PROOF. For simplicity, we assume X is empty.

First, suppose P(U = 1) = P(U = 0) = 0.5, and A can take a value in {0,1,2} with P(A = 0|U = 1) = 0.3, P(A = 0|U = 0) = 0.5, P(A = 1|U = 1) = 0.5, P(A = 1|U = 0) = 0.2, P(A = 2|U = 1) = 0.2 and P(A = 2|U = 0) = 0.3, then we have E[A|U = 1] = 0.9 and E[A|U = 0] = 0.8. Thus, E[A|U = u] is increasing in u. Suppose E[Y|A = t, U = 1]>E[Y|A = t, U = 0] for t = 0,1,2. Therefore, the MC assumption holds. Then we can get

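$$
\begin{aligned}
E[Y(1) \mid A=1] &= \sum_{u=0,1} E[Y \mid A=1, U=u]\,P(U=u \mid A=1) = \tfrac{5}{7}\,E[Y \mid A=1, U=1] + \tfrac{2}{7}\,E[Y \mid A=1, U=0], \\
E[Y(1) \mid A=2] &= \sum_{u=0,1} E[Y \mid A=1, U=u]\,P(U=u \mid A=2) = 0.4\,E[Y \mid A=1, U=1] + 0.6\,E[Y \mid A=1, U=0].
\end{aligned}
$$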

Thus, E[Y(1)|A = 1]>E[Y(1)|A = 2], which means that the MTS assumption does not hold.
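The arithmetic of this counterexample can be verified with a short exact computation. The proof only requires E[Y | A = 1, U = 1] > E[Y | A = 1, U = 0]; the specific values 4/5 and 2/5 used below are hypothetical and serve only to make the comparison concrete.

```python
from fractions import Fraction as F

p_u = {0: F(1, 2), 1: F(1, 2)}                       # P(U = u)
p_a_given_u = {                                      # P(A = a | U = u)
    1: {0: F(3, 10), 1: F(5, 10), 2: F(2, 10)},
    0: {0: F(5, 10), 1: F(2, 10), 2: F(3, 10)},
}


def mean_a(u):
    """E[A | U = u]."""
    return sum(a * p for a, p in p_a_given_u[u].items())


def p_u1_given_a(a):
    """P(U = 1 | A = a) by Bayes' rule."""
    num = p_a_given_u[1][a] * p_u[1]
    den = sum(p_a_given_u[u][a] * p_u[u] for u in p_u)
    return num / den


# Hypothetical outcome means with E[Y | A=1, U=1] > E[Y | A=1, U=0].
ey_a1 = {1: F(4, 5), 0: F(2, 5)}


def ey1_given_a(a):
    """E[Y(1) | A = a] = sum_u E[Y | A=1, U=u] P(U=u | A=a)."""
    w = p_u1_given_a(a)
    return w * ey_a1[1] + (1 - w) * ey_a1[0]


print(mean_a(1), mean_a(0))          # E[A | U] is increasing in u
print(p_u1_given_a(1), p_u1_given_a(2))
print(ey1_given_a(1) > ey1_given_a(2))
```

Because P(U = 1 | A = 1) exceeds P(U = 1 | A = 2), any outcome means that increase in u give the same ordering, so MTS fails between levels 1 and 2.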

However, whether the conditional MTS assumption implies the MC assumption for a treatment variable with three or more levels remains an open problem.

## 4 New bounds under reverse conditional monotone treatment selection and monotone treatment response

We finally give one more proposition that gives bounds when the MTR and conditional MTS assumptions hold but in opposite directions from each other. We thus further introduce the following reverse conditional MTS assumption:

ASSUMPTION 5 (Reverse conditional MTS assumption). Let T be an ordered set, and t1 and t2 be elements of T. Then, t2 ≥ t1 ⇒ E[Y(t) | A = t2, X = x] ≤ E[Y(t) | A = t1, X = x] for all x.

New bounds can be derived under a combination of the MTR and reverse conditional MTS assumptions.

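PROPOSITION 7. Suppose that the MTR assumption and the reverse conditional MTS assumption hold. Then

$$K_0\,P(A>t) + \sum_x \sum_{s \le t} \max\{E(Y \mid A=t, X=x),\, E(Y \mid A=s, X=x)\}\,P(A=s \mid X=x)\,P(X=x) \;\le\; E[Y(t)] \;\le\; K_1\,P(A<t) + \sum_x \sum_{s \ge t} \min\{E(Y \mid A=t, X=x),\, E(Y \mid A=s, X=x)\}\,P(A=s \mid X=x)\,P(X=x).$$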

PROOF.

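$$
\begin{aligned}
E[Y(t) \mid X=x]
&= \sum_{s \ge t} E[Y(t) \mid A=s, X=x]\,P(A=s \mid X=x) + \sum_{s<t} E[Y(t) \mid A=s, X=x]\,P(A=s \mid X=x) \\
&\le \sum_{s \ge t} E[Y(t) \mid A=s, X=x]\,P(A=s \mid X=x) + K_1 \sum_{s<t} P(A=s \mid X=x).
\end{aligned}
$$

Now, for s ≥ t, we have that E[Y(t) | A = s, X = x] ≤ E[Y(s) | A = s, X = x] by the MTR assumption and E[Y(t) | A = s, X = x] ≤ E[Y(t) | A = t, X = x] by the reverse conditional MTS assumption. Thus, we have that

$$E[Y(t) \mid A=s, X=x] \le \min\{E[Y(t) \mid A=t, X=x],\, E[Y(s) \mid A=s, X=x]\} = \min\{E(Y \mid A=t, X=x),\, E(Y \mid A=s, X=x)\}.$$

Consequently,

$$E[Y(t) \mid X=x] \le \sum_{s \ge t} \min\{E(Y \mid A=t, X=x),\, E(Y \mid A=s, X=x)\}\,P(A=s \mid X=x) + K_1\,P(A<t \mid X=x).$$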

Multiplying by P(X = x) and summing over x establishes the right-hand side of the inequality of Proposition 7. The left-hand side of the inequality of Proposition 7 is proved similarly. This completes the proof. □
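A minimal sketch of the bounds in Proposition 7 for a generic ordered treatment is given below. The joint distribution `joint` and the stratum means `m` are hypothetical inputs chosen for illustration; the function name is likewise an assumption.

```python
def reverse_mts_mtr_bounds(joint, m, t, k0, k1):
    """Bounds on E[Y(t)] under MTR and reverse conditional MTS.

    joint[(a, x)] = P(A = a, X = x) = P(A = a | X = x) P(X = x);
    m[(a, x)] = E(Y | A = a, X = x).
    """
    levels = sorted({a for a, _ in joint})
    xs = sorted({x for _, x in joint})
    p_gt = sum(joint[(s, x)] for s in levels if s > t for x in xs)
    p_lt = sum(joint[(s, x)] for s in levels if s < t for x in xs)
    lower = k0 * p_gt + sum(
        max(m[(t, x)], m[(s, x)]) * joint[(s, x)]
        for x in xs for s in levels if s <= t
    )
    upper = k1 * p_lt + sum(
        min(m[(t, x)], m[(s, x)]) * joint[(s, x)]
        for x in xs for s in levels if s >= t
    )
    return lower, upper


# Hypothetical three-level treatment with a binary covariate.
joint = {(0, 0): 0.2, (1, 0): 0.2, (2, 0): 0.1,
         (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.2}
m = {(0, 0): 0.3, (1, 0): 0.4, (2, 0): 0.6,
     (0, 1): 0.5, (1, 1): 0.4, (2, 1): 0.7}

lower, upper = reverse_mts_mtr_bounds(joint, m, t=1, k0=0.0, k1=1.0)
print(lower, upper)
```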

Note that if A is binary then we have the following bounds for E[Y(0)] and E[Y(1)]:

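$$K_0\,P(A=1) + \sum_x E(Y \mid A=0, X=x)\,P(A=0 \mid X=x)\,P(X=x) \;\le\; E[Y(0)] \;\le\; \sum_x E(Y \mid A=0, X=x)\,P(A=0 \mid X=x)\,P(X=x) + \sum_x \min\{E(Y \mid A=0, X=x),\, E(Y \mid A=1, X=x)\}\,P(A=1 \mid X=x)\,P(X=x),$$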

and

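$$\sum_x E(Y \mid A=1, X=x)\,P(A=1 \mid X=x)\,P(X=x) + \sum_x \max\{E(Y \mid A=0, X=x),\, E(Y \mid A=1, X=x)\}\,P(A=0 \mid X=x)\,P(X=x) \;\le\; E[Y(1)] \;\le\; K_1\,P(A=0) + \sum_x E(Y \mid A=1, X=x)\,P(A=1 \mid X=x)\,P(X=x).$$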

In particular, a lower bound on the causal effect is given by

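$$
\begin{aligned}
E[Y(1)] - E[Y(0)] \ge{}& \sum_x E(Y \mid A=1, X=x)\,P(A=1 \mid X=x)\,P(X=x) + \sum_x \max\{E(Y \mid A=0, X=x),\, E(Y \mid A=1, X=x)\}\,P(A=0 \mid X=x)\,P(X=x) \\
& - \Big[ \sum_x E(Y \mid A=0, X=x)\,P(A=0 \mid X=x)\,P(X=x) + \sum_x \min\{E(Y \mid A=0, X=x),\, E(Y \mid A=1, X=x)\}\,P(A=1 \mid X=x)\,P(X=x) \Big].
\end{aligned}
$$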

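This lower bound for the causal effect will often be greater than (and never less than) the lower bound given in VanderWeele [2] under the MC (or conditional MTS) assumption, which was

$$
\begin{aligned}
E[Y(1)] - E[Y(0)] \ge{}& \sum_x E(Y \mid A=1, X=x)\,P(X=x) - \sum_x E(Y \mid A=0, X=x)\,P(X=x) \\
={}& \sum_x E(Y \mid A=1, X=x)\,P(A=1 \mid X=x)\,P(X=x) + \sum_x E(Y \mid A=1, X=x)\,P(A=0 \mid X=x)\,P(X=x) \\
& - \Big\{ \sum_x E(Y \mid A=0, X=x)\,P(A=0 \mid X=x)\,P(X=x) + \sum_x E(Y \mid A=0, X=x)\,P(A=1 \mid X=x)\,P(X=x) \Big\}.
\end{aligned}
$$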
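The comparison between the two lower bounds for a binary treatment can be illustrated numerically. In the sketch below, the inputs are hypothetical and include one stratum in which E(Y | A = 1, X = x) is smaller than E(Y | A = 0, X = x), which is where the max and min terms make a difference.

```python
def effect_lower_bounds(joint, m):
    """Return (new_bound, standardized_bound) for E[Y(1)] - E[Y(0)].

    joint[(a, x)] = P(A = a | X = x) P(X = x); m[(a, x)] = E(Y | A = a, X = x).
    new_bound uses MTR with reverse conditional MTS (Proposition 7);
    standardized_bound is the difference of standardized means.
    """
    xs = sorted({x for _, x in joint})
    new_bound = 0.0
    standardized_bound = 0.0
    for x in xs:
        e0, e1 = m[(0, x)], m[(1, x)]
        p0, p1 = joint[(0, x)], joint[(1, x)]
        new_bound += e1 * p1 + max(e0, e1) * p0 - (e0 * p0 + min(e0, e1) * p1)
        standardized_bound += e1 * (p0 + p1) - e0 * (p0 + p1)
    return new_bound, standardized_bound


# Hypothetical inputs; in stratum x = 1 the treated mean is below the untreated mean.
joint = {(0, 0): 0.4, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.3}
m = {(0, 0): 0.2, (1, 0): 0.5, (0, 1): 0.6, (1, 1): 0.4}

new_bound, standardized_bound = effect_lower_bounds(joint, m)
print(new_bound, standardized_bound)
```

In strata where the treated mean exceeds the untreated mean the two bounds contribute the same amount; in the remaining strata the new bound contributes zero rather than a negative quantity.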

## 5 Conclusion

When thinking about the direction and extent of bias due to unmeasured confounding, different formal assumptions have been employed. Economists have more frequently appealed to an MTS assumption. Epidemiologists have more frequently appealed to an MC assumption. In this paper, we have shown that the MC assumption and the conditional MTS assumption are in fact equivalent for binary treatment variables, but the MC assumption does not imply the conditional MTS assumption for treatment variables with three or more levels. We have also presented new bounds under the MTR assumption along with a reversed analog of the conditional MTS assumption. These new bounds will generally be narrower than bounds currently in the literature. The MC assumption is often more difficult to justify on substantive grounds than the conditional MTS assumption. This is because the MC assumption will generally only be reasonable if there is a single unmeasured confounder and it is often difficult to know this in practice. However, again, in the case of a binary treatment, the two assumptions are in fact equivalent and so, with a binary treatment, epidemiologists and economists can reasonably conceptualize unmeasured confounding in either manner.

# Acknowledgments

The authors thank the associate editor and two reviewers for their valuable comments. This work was supported partially by Grant-in-Aid for Scientific Research (No. 23700344) from the Ministry of Education, Culture, Sports, Science, and Technology of Japan and National Institutes of Health grant ES017876.

### References

1. Manski CF, Pepper JV. Monotone instrumental variables: with an application to the returns to schooling. Econometrica 2000;68:997–1010.

2. VanderWeele TJ. The sign of the bias of unmeasured confounding. Biometrics 2008;64:702–6.

3. Manski CF. Monotone treatment response. Econometrica 1997;65:1311–34.

4. Dawid AP. Conditional independence in statistical theory (with discussion). J R Stat Soc Ser B 1979;41:1–31.

5. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.

6. Manski CF. Nonparametric bounds on treatment effects. Am Econ Rev 1990;80:319–23.

7. Manski CF. Partial identification of probability distributions. New York: Springer, 2003.

8. Manski CF, Pepper JV. More on monotone instrumental variables. Econometrics J 2009;12:S200–16.

9. Pearl J. Causality: models, reasoning, and inference, 2nd ed. Cambridge: Cambridge University Press, 2009.

10. Vander Stoep A, Beresford SA, Weiss NS. A didactic device for teaching epidemiology students how to anticipate the effect of a third factor on an exposure-outcome relation. Am J Epidemiol 1999;150:221.

11. Shahinian VB, Kuo YF, Freeman JL, Goodwin JS. Risk of fracture after androgen deprivation for prostate cancer. New Engl J Med 2005;352:154–64.

12. Chiba Y. The sign of the unmeasured confounding bias under various standard populations. Biom J 2009;51:670–6.

13. Esary JD, Proschan F, Walkup DW. Association of random variables, with applications. Ann Math Stat 1967;38:1466–74.

14. Vansteelandt G, Goetghebuer E, Kenward MG, Molenberghs G. Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Stat Sin 2006;16:953–79.

15. Bugni FA. Bootstrap inference in partially identified models defined by moment inequalities: coverage of the identified set. Econometrica 2010;78:735–53.

16. Imbens GW, Manski CF. Confidence intervals for partially identified parameters. Econometrica 2004;72:1845–57.

17. Romano JP, Shaikh AM. Inference for identifiable parameters in partially identified econometric models. J Stat Plann Inference 2008;138:2786–807.

18. Todem D, Fine J, Peng L. A global sensitivity test for evaluating statistical hypotheses with non-identifiable models. Biometrics 2010;66:558–66.

Published Online: 2014-4-18
Published in Print: 2014-3-1

©2014 by Walter de Gruyter Berlin / Boston