On the Monotonicity of a Nondifferentially Mismeasured Binary Confounder

Suppose that we are interested in the average causal effect of a binary treatment on an outcome when this relationship is confounded by a binary confounder. Suppose that the confounder is unobserved but a nondifferential proxy of it is observed. We show that, under a certain monotonicity assumption that is empirically verifiable, adjusting for the proxy produces a measure of the effect that lies between the unadjusted and the true measures.


Introduction
Suppose that we are interested in the average causal effect of a binary treatment A on an outcome Y when this relationship is confounded by a binary confounder C. Suppose also that C is nondifferentially mismeasured, meaning that (i) C is not observed and, instead, a binary proxy D of C is observed, and (ii) D is conditionally independent of A and Y given C. The causal graph to the left in Figure 1 represents the relationships between the random variables. Greenland (1980) argues that adjusting for D produces a partially adjusted measure of the average causal effect of A on Y that is between the crude (i.e., unadjusted) and true (i.e., adjusted for C) measures. Ogburn and VanderWeele (2012) show that, although this result does not always hold, it does hold under a monotonicity condition in C. Specifically, $E[Y \mid A, C]$ must be nondecreasing or nonincreasing in C. Since this condition means that the average causal effect of C on Y is in the same direction among the treated (A = 1) and the untreated (A = 0), Ogburn and VanderWeele (2012) argue that the condition is likely to hold in most applications in epidemiology. Unfortunately, the condition cannot be verified empirically because C is unobserved. Therefore, one has to rely on substantive knowledge to verify it. Moreover, the condition is sufficient but not necessary. Ogburn and VanderWeele (2013) extend these results to the case where C takes more than two values. If there are at least two independent proxies of C, then Miao et al. (2018) show that the causal effect of A on Y can be identified under a certain rank condition.

Figure 1. Causal graphs, where Y is a discrete or continuous random variable, and A, C and D are binary random variables. Moreover, C is unobserved.
In this paper, we prove that if the monotonicity condition holds in D, then it holds in C as well. Since D is observed, the monotonicity condition in D can be verified empirically. Therefore, if no substantive knowledge is available but data are, then combining our result with that by Ogburn and VanderWeele (2012) may allow us to conclude that the partially adjusted effect is between the crude and the true ones and, thus, that the partially adjusted effect is a better approximation to the true effect than the crude one. We also report experiments showing that most random parameterizations of the causal graph to the left in Figure 1 result in a partially adjusted effect that lies between the crude and the true ones, although only half of them satisfy the monotonicity condition in D. This confirms that the condition is sufficient but not necessary. This result should be interpreted with caution because, in fields like epidemiology, one is not typically concerned with a random parameterization but, rather, with one carefully engineered by evolution. We also prove that if the monotonicity condition holds in D, then it also holds in C when D is a driver of C rather than a proxy, i.e. D causes C. We illustrate the relevance of this result with an example on transportability of causal inference across populations.
The rest of the paper is organized as follows. Sections 2 and 3 present our results when D is a proxy and a driver of C, respectively. Section 4 closes with some discussion.

On a Proxy of the Confounder
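Consider the causal graph to the left in Figure 1, where Y is a discrete or continuous random variable, and A, C and D are binary random variables. The graph entails the following factorization:

$$p(A, C, D, Y) = p(C)\, p(D \mid C)\, p(A \mid C)\, p(Y \mid A, C). \tag{1}$$

Let A take values $a$ and $\bar{a}$, and similarly for C and D. Let A, D and Y be observed and let C be unobserved. Let $Y_a$ and $Y_{\bar{a}}$ denote the counterfactual outcomes under treatments $A = a$ and $A = \bar{a}$, respectively. The average causal effect of A on Y or true risk difference ($RD_{true}$) is defined as

$$RD_{true} = E[Y_a] - E[Y_{\bar{a}}].$$

It can be rewritten as follows (Pearl, 2009, Theorem 3.3.2):

$$RD_{true} = E[Y \mid a, c]\, p(c) + E[Y \mid a, \bar{c}]\, p(\bar{c}) - E[Y \mid \bar{a}, c]\, p(c) - E[Y \mid \bar{a}, \bar{c}]\, p(\bar{c}).$$

Since C is unobserved, $RD_{true}$ cannot be computed. It can be approximated by the unadjusted average causal effect or crude risk difference ($RD_{crude}$):

$$RD_{crude} = E[Y \mid a] - E[Y \mid \bar{a}]$$

and by the partially adjusted average causal effect or observed risk difference ($RD_{obs}$):

$$RD_{obs} = E[Y \mid a, d]\, p(d) + E[Y \mid a, \bar{d}]\, p(\bar{d}) - E[Y \mid \bar{a}, d]\, p(d) - E[Y \mid \bar{a}, \bar{d}]\, p(\bar{d}).$$

Ogburn and VanderWeele (2012, Result 1) show that if $E[Y \mid A, C]$ is monotone in C (i.e., nondecreasing or nonincreasing in C for both values of A), then $RD_{obs}$ lies between $RD_{true}$ and $RD_{crude}$. The antecedent of this rule cannot be verified empirically, because C is unobserved. Therefore, one must rely on substantive knowledge to apply the rule. Fortunately, the following theorem implies that the rule also holds for D and, thus, that the antecedent can be verified empirically.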
Theorem 1. Consider the causal graph to the left in Figure 1. If $E[Y \mid A, D]$ is monotone in D, then $E[Y \mid A, C]$ is monotone in C.
Proof. [...] This gives four cases to consider: whether Equation 2 or 3 holds, and whether Equation 4 or 5 holds. Hereinafter, we focus on the first case. The other cases are similar.
Assume that Equations 2 and 4 hold. We show next that the first inequalities in Equations 2 and 4 imply that $p(c \mid a, d) \leq p(c \mid a, \bar{d})$. Specifically,

$$E[Y \mid a, d] = E[Y \mid a, c]\, p(c \mid a, d) + E[Y \mid a, \bar{c}]\, p(\bar{c} \mid a, d)$$

because Y is conditionally independent of D given A and C due to the causal graph under consideration and, thus,

$$E[Y \mid a, d] - E[Y \mid a, \bar{d}] = \big( E[Y \mid a, c] - E[Y \mid a, \bar{c}] \big) \big( p(c \mid a, d) - p(c \mid a, \bar{d}) \big).$$

Moreover,

$$p(c \mid a, d) = \sigma\left( \ln \frac{p(a \mid c)\, p(d \mid c)\, p(c)}{p(a \mid \bar{c})\, p(d \mid \bar{c})\, p(\bar{c})} \right)$$

where the argument of $\sigma()$ is known as the log odds, and $\sigma()$ is known as the logistic sigmoid function (Bishop, 2006, Section 4.2). Note that $\sigma()$ is an increasing function. Then, $p(c \mid a, d) \leq p(c \mid a, \bar{d})$ implies that

$$\frac{p(d \mid c)}{p(d \mid \bar{c})} \leq \frac{p(\bar{d} \mid c)}{p(\bar{d} \mid \bar{c})} \quad \text{and, thus,} \quad p(d \mid c) \leq p(d \mid \bar{c}). \tag{6}$$

Likewise, the second inequalities in Equations 2 and 4 imply that

$$p(d \mid c) \geq p(d \mid \bar{c})$$

which contradicts Equation 6 unless equality holds. However, equality only occurs if $p(d \mid c) = p(d \mid \bar{c})$, which implies that C and D are independent and, thus, that D is not a mismeasured confounder.

Corollary 2. Consider the causal graph to the left in Figure 1. If $E[Y \mid A, D]$ is monotone in D, then $RD_{obs}$ lies between $RD_{true}$ and $RD_{crude}$.

2.1. Experiments.
In this section, we report some experiments that shed additional light on the relationships between the various risk differences. Specifically, we randomly parameterized the causal graph to the left in Figure 1 10000 times, by parameterizing the terms in the right-hand side of Equation 1 with parameter values drawn from a uniform distribution. For each parameterization, we then computed $RD_{true}$, $RD_{obs}$ and $RD_{crude}$. The results are reported in Table 1. Of the 10000 runs, 4891 were monotone in C and also in D, as expected from Ogburn and VanderWeele (2012, Lemma 1). No other runs were monotone in D, as expected from Theorem 1. In all these 4891 runs, $RD_{obs}$ was between $RD_{true}$ and $RD_{crude}$, as expected from Corollary 2 and Ogburn and VanderWeele (2012, Result 1). It is also worth noticing from the table that the 10000 runs are rather evenly distributed among the different entries. Finally, 4460 of the 5109 runs where the monotonicity assumption did not hold still resulted in $RD_{obs}$ being between $RD_{true}$ and $RD_{crude}$.

Figure 2. [...] (tr) Distance between $RD_{obs}$ and $RD_{true}$ relative to interval length. (bl) Zoom of previous plot. (br) Distance between $RD_{obs}$ and $RD_{true}$ relative to interval length, as a function of the correlation between C and D as measured by the Youden index.
In other words, although half of the runs violated the monotonicity assumption, few of them resulted in $RD_{obs}$ lying outside the range of $RD_{true}$ and $RD_{crude}$. In total, $RD_{obs}$ was between $RD_{true}$ and $RD_{crude}$ in 94% of the runs, a surprisingly large percentage. Therefore, $RD_{obs}$ was a better approximation to $RD_{true}$ than $RD_{crude}$ in most of the runs.
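The experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the reported numbers: it assumes a binary outcome, so that each $E[Y \mid A, C]$ is itself drawn uniformly from (0, 1); it codes $a$, $c$, $d$ as 1 and $\bar{a}$, $\bar{c}$, $\bar{d}$ as 0; and the names `mean_y`, `risk_differences` and `simulate` are hypothetical. The counts depend on the seed and need not match Table 1.

```python
import random


def mean_y(a, d, par):
    """E[Y | A=a, D=d] under Equation 1; pass d=None to get E[Y | A=a]."""
    p_c, p_a_c, p_d_c, e_y = par
    weights = {}
    for c in (0, 1):
        pc = p_c if c == 1 else 1 - p_c
        pa = p_a_c[c] if a == 1 else 1 - p_a_c[c]
        if d is None:
            pd = 1.0
        else:
            pd = p_d_c[c] if d == 1 else 1 - p_d_c[c]
        weights[c] = pc * pa * pd  # proportional to p(c | a, d)
    total = weights[0] + weights[1]
    return (e_y[a][0] * weights[0] + e_y[a][1] * weights[1]) / total


def risk_differences(par):
    """Return (RD_true, RD_obs, RD_crude) for one parameterization."""
    p_c, p_a_c, p_d_c, e_y = par
    p_d1 = p_d_c[1] * p_c + p_d_c[0] * (1 - p_c)  # p(D=1)
    # RD_true adjusts for C, RD_obs adjusts for D, RD_crude adjusts for nothing.
    rd_true = sum((e_y[1][c] - e_y[0][c]) * (p_c if c else 1 - p_c) for c in (0, 1))
    rd_obs = sum((mean_y(1, d, par) - mean_y(0, d, par)) * (p_d1 if d else 1 - p_d1)
                 for d in (0, 1))
    rd_crude = mean_y(1, None, par) - mean_y(0, None, par)
    return rd_true, rd_obs, rd_crude


def same_direction(x, y):
    """True if two differences are both >= 0 or both <= 0."""
    return x * y >= 0


def simulate(n_runs=10000, seed=0, tol=1e-12):
    """Count monotone runs and runs where RD_obs lies between RD_true and RD_crude."""
    rng = random.Random(seed)
    u = rng.random
    counts = {"mono_c": 0, "mono_d": 0, "mono_both": 0, "mono_between": 0, "between": 0}
    for _ in range(n_runs):
        # par = (p(C=1), p(A=1 | C=c), p(D=1 | C=c), E[Y | A=a, C=c] as e_y[a][c])
        par = (u(), {0: u(), 1: u()}, {0: u(), 1: u()},
               {0: {0: u(), 1: u()}, 1: {0: u(), 1: u()}})
        e_y = par[3]
        mono_c = same_direction(e_y[1][1] - e_y[1][0], e_y[0][1] - e_y[0][0])
        mono_d = same_direction(mean_y(1, 1, par) - mean_y(1, 0, par),
                                mean_y(0, 1, par) - mean_y(0, 0, par))
        rd_true, rd_obs, rd_crude = risk_differences(par)
        between = min(rd_true, rd_crude) - tol <= rd_obs <= max(rd_true, rd_crude) + tol
        counts["mono_c"] += mono_c
        counts["mono_d"] += mono_d
        counts["mono_both"] += mono_c and mono_d
        counts["mono_between"] += mono_c and between
        counts["between"] += between
    return counts


if __name__ == "__main__":
    print(simulate())
```

In such a run, the runs that are monotone in D should coincide with those that are monotone in C, and every one of them should have $RD_{obs}$ inside the interval; the remaining entries show how often the interval property holds without monotonicity.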
The plots in Figure 2 show some additional descriptive statistics for the runs where $RD_{obs}$ belonged to the interval between $RD_{true}$ and $RD_{crude}$. The top left plot shows that most intervals were quite small and, thus, that $RD_{obs}$ was a good approximation to $RD_{true}$ in most cases. However, the top right plot shows that $RD_{obs}$ was typically closer to $RD_{crude}$ than to $RD_{true}$. The bottom left plot is a zoom of the previous plot at the smallest intervals. Finally, the bottom right plot shows that the lower the correlation between C and D as measured by the Youden index, the closer $RD_{obs}$ was to $RD_{crude}$. In summary, $RD_{obs}$ seems to be a good approximation to $RD_{true}$, but it seems to be biased towards $RD_{crude}$. This is a problem when the interval between $RD_{crude}$ and $RD_{true}$ is large. However, the length of the interval is unknown in practice, and we doubt that substantive knowledge can provide hints about it. The bias seems to decrease with increasing correlation between C and D. Although this correlation is unknown in practice, substantive knowledge may give hints about it.
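The two quantities plotted in Figure 2 can be computed as sketched below. The text does not give formulas, so both definitions are assumptions: the Youden index is taken in its usual form (sensitivity plus specificity minus one, treating D as a test for C), and the relative distance is taken to be $|RD_{obs} - RD_{true}|$ divided by the interval length $|RD_{crude} - RD_{true}|$. The function names are hypothetical, and values are coded as 1 for $c$, $d$ and 0 for $\bar{c}$, $\bar{d}$.

```python
def youden_index(p_d_given_c):
    """Youden index of D as a proxy of C, with p_d_given_c[c] = p(D=1 | C=c)."""
    sensitivity = p_d_given_c[1]      # p(D=1 | C=1)
    specificity = 1 - p_d_given_c[0]  # p(D=0 | C=0)
    return sensitivity + specificity - 1


def relative_distance(rd_true, rd_obs, rd_crude):
    """Distance from RD_obs to RD_true as a fraction of the interval length.

    Returns None when the interval has zero length (RD_crude equals RD_true).
    """
    length = abs(rd_crude - rd_true)
    if length == 0:
        return None
    return abs(rd_obs - rd_true) / length
```

Under these definitions, a relative distance near 1 means that $RD_{obs}$ sits close to $RD_{crude}$, and a Youden index near 0 means that D carries almost no information about C.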

On a Driver of the Confounder
Consider the causal graph to the right in Figure 1, where Y is a discrete or continuous random variable, and A, C and D are binary random variables. Note that D is now a driver rather than a proxy of C, i.e. D causes C. The graph entails the following factorization:

$$p(A, C, D, Y) = p(D)\, p(C \mid D)\, p(A \mid C)\, p(Y \mid A, C). \tag{7}$$

Let A take values $a$ and $\bar{a}$, and similarly for C and D. Let A, D and Y be observed and let C be unobserved. We show next that our previous results also apply to the new causal graph under consideration.
Theorem 3. Consider the causal graph to the right in Figure 1. If $E[Y \mid A, D]$ is monotone in D, then $E[Y \mid A, C]$ is monotone in C.
Proof. The proof of Theorem 1 also applies when D is a driver of C.
Corollary 4. Consider the causal graph to the right in Figure 1. If $E[Y \mid A, D]$ is monotone in D, then $RD_{obs}$ lies between $RD_{true}$ and $RD_{crude}$.
Proof. Note that every probability distribution that is representable by the causal graph to the right in Figure 1 can be represented by the causal graph to the left in Figure 1: simply let

$$p_L(C) = p_R(C \mid d)\, p_R(d) + p_R(C \mid \bar{d})\, p_R(\bar{d}) \quad \text{and} \quad p_L(D \mid C) = \frac{p_R(C \mid D)\, p_R(D)}{p_L(C)}$$

where the subscript L or R indicates whether we refer to Equation 1 or 7, respectively. Moreover, let $p_L(A \mid C) = p_R(A \mid C)$ and $p_L(Y \mid A, C) = p_R(Y \mid A, C)$. Therefore, $RD_{crude}$, $RD_{obs}$ and $RD_{true}$ are the same whether they are computed from the graph to the right or to the left in Figure 1. Likewise, if $E[Y \mid A, D]$ is monotone in D for the graph to the right in Figure 1, then it is also monotone in D for the graph to the left, which implies that $RD_{obs}$ lies between $RD_{true}$ and $RD_{crude}$ by Corollary 2.
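The reparameterization used in this proof is just Bayes' rule, and it can be checked numerically. The sketch below is an illustration with hypothetical function names; values are coded as 1 for $c$, $d$ and 0 for $\bar{c}$, $\bar{d}$. It converts the right-graph terms $p(D)$ and $p(C \mid D)$ into the left-graph terms $p(C)$ and $p(D \mid C)$, and the two joint distributions over (C, D) can then be compared cell by cell. The terms $p(A \mid C)$ and $p(Y \mid A, C)$ are shared by both graphs and are therefore left out.

```python
def right_to_left(p_d, p_c_d):
    """Map p(D=1) and p_c_d[d] = p(C=1 | D=d) to p(C=1) and p(D=1 | C=c)."""
    p_c = p_c_d[1] * p_d + p_c_d[0] * (1 - p_d)  # marginalize D out
    p_d_c = {
        1: p_c_d[1] * p_d / p_c,              # p(D=1 | C=1) by Bayes' rule
        0: (1 - p_c_d[1]) * p_d / (1 - p_c),  # p(D=1 | C=0) by Bayes' rule
    }
    return p_c, p_d_c


def joint_right(c, d, p_d, p_c_d):
    """p(C=c, D=d) computed as p(D) p(C | D), as in the right graph."""
    pd = p_d if d else 1 - p_d
    pc = p_c_d[d] if c else 1 - p_c_d[d]
    return pd * pc


def joint_left(c, d, p_c, p_d_c):
    """p(C=c, D=d) computed as p(C) p(D | C), as in the left graph."""
    pc = p_c if c else 1 - p_c
    pd = p_d_c[c] if d else 1 - p_d_c[c]
    return pc * pd
```

The conversion requires $0 < p(C=1) < 1$; otherwise one of the conditional distributions $p(D \mid C)$ is undefined.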

VanderWeele et al. (2008, Result 1) prove that (i) if $E[Y \mid A, C]$ and $E[A \mid C]$ are both nondecreasing or both nonincreasing in C, then $RD_{obs} \geq RD_{true}$, and (ii) if $E[Y \mid A, C]$ and $E[A \mid C]$ are one nondecreasing and the other nonincreasing in C, then $RD_{obs} \leq RD_{true}$. The antecedents of these rules cannot be verified empirically, because C is unobserved. Therefore, one must rely on substantive knowledge to apply the rules. Fortunately, the rules also hold for D and, thus, the antecedents can be verified empirically. The following theorem proves it.
Theorem 5. Consider the causal graph to the right in Figure 1. [...]

Proof. [...] As shown in the proof of Theorem 1, (i) and (iii) imply that $p(c \mid a, d) \leq p(c \mid a, \bar{d})$, which implies that $p(d \mid c) \leq p(d \mid \bar{c})$. Likewise, (ii) and (iv) imply that $p(c \mid d) \geq p(c \mid \bar{d})$, which implies that $p(d \mid c) \geq p(d \mid \bar{c})$. As shown in the proof of Theorem 1, this contradicts the fact that C and D are dependent. Therefore, either the assumption (iii) or (iv) or both are false. In the latter case, we get a similar contradiction. So, either the assumption (iii) or (iv) is false. We reach a similar contradiction if we replace $a$ with $\bar{a}$ in the assumptions (i) and ( [...]

For completeness, we show below that the converse of Theorem 5 also holds.

[...] (Ogburn and VanderWeele, 2012, Lemma 1) and, thus, they are so for the right graph as well. The result follows now from the contrapositive formulation of Theorem 5.
Given a sufficiently large sample from $p(A, D, Y)$, we may conclude from it that $E[Y \mid A, D]$ is monotone in D, which implies that $RD_{obs}$ lies between $RD_{true}$ and $RD_{crude}$ by Corollary 4. We can also estimate $RD_{obs}$ and $RD_{crude}$ from the sample, which implies that (i) if $RD_{crude} \leq RD_{obs}$ then $RD_{obs} \leq RD_{true}$, and (ii) if $RD_{crude} \geq RD_{obs}$ then $RD_{obs} \geq RD_{true}$. Consequently, Corollary 6 is superfluous when data over (A, D, Y) are available. Example 8 illustrates that the corollary may be useful when no such data are available.
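The reasoning in the previous paragraph can be sketched as a small routine over a sample of (A, D, Y) triples. It is a simplified illustration with hypothetical names: it uses plug-in group means, codes $a$ and $d$ as 1 and $\bar{a}$ and $\bar{d}$ as 0, and ignores sampling uncertainty, so in practice the monotonicity check would call for a proper statistical test rather than a comparison of point estimates.

```python
from collections import defaultdict


def bound_from_sample(sample):
    """sample: iterable of (a, d, y) with a, d in {0, 1}.

    Returns (monotone, rd_obs, rd_crude, bound). All four (a, d) cells must be
    non-empty, otherwise a KeyError is raised.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for a, d, y in sample:
        for key in ((a, d), (a, None)):  # cell (a, d) and margin over d
            sums[key] += y
            counts[key] += 1
    mean = {key: sums[key] / counts[key] for key in sums}
    n = counts[(0, None)] + counts[(1, None)]
    p_d1 = (counts[(0, 1)] + counts[(1, 1)]) / n  # estimate of p(D=1)

    # E[Y | A, D] is monotone in D if it moves in the same direction for both A.
    diff_treated = mean[(1, 1)] - mean[(1, 0)]
    diff_untreated = mean[(0, 1)] - mean[(0, 0)]
    monotone = diff_treated * diff_untreated >= 0

    rd_crude = mean[(1, None)] - mean[(0, None)]
    rd_obs = ((mean[(1, 1)] - mean[(0, 1)]) * p_d1
              + (mean[(1, 0)] - mean[(0, 0)]) * (1 - p_d1))

    if not monotone:
        bound = None  # Corollary 4 does not apply
    elif rd_crude <= rd_obs:
        bound = "RD_obs <= RD_true"  # RD_obs lies between RD_crude and RD_true
    else:
        bound = "RD_obs >= RD_true"
    return monotone, rd_obs, rd_crude, bound
```

When `monotone` is true, the returned `bound` states on which side of $RD_{obs}$ the unknown $RD_{true}$ must lie.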
Example 8. Let A and Y represent a treatment and a disease, respectively. Let D and C represent pre-treatment covariates such as socioeconomic and health status, respectively. Say that we have a sample from $p_1(A, D, Y)$ and a sample from $p_2(A, D, Y)$, i.e. we have two samples from two different populations. We are interested in drawing conclusions about $RD_{true}$ for a third population, from which we have no data. We make the following assumptions:
• $p_1(D) \neq p_3(D) \neq p_2(D)$ because the socioeconomic profile of the third population differs from the other populations' profiles.
• $p_1(C \mid D) = p_2(C \mid D) = p_3(C \mid D)$ because this distribution represents psychological and physiological processes shared by the three populations.
• [...] distributions represent psychological and physiological processes shared by the first and third populations but not by the second.
Then, [...] Then, we cannot estimate $RD_{crude}$ for the third population and, thus, we cannot use Corollary 4 as we did before to bound $RD_{true}$. Corollary 6 may, on the other hand, be useful in drawing conclusions. For instance, assume that $E_3[Y \mid A, D]$ and $E_3[A \mid D]$ are both nondecreasing or both nonincreasing in D. Then, $RD_{obs} \geq RD_{true}$ by the corollary. If we are interested in testing whether $k \geq RD_{true}$ for a given constant $k$, then it may be worth bearing the cost of collecting data from the third population in order to compute $RD_{obs}$, in the hope that $k \geq RD_{obs}$, which confirms the hypothesis. If we are interested in testing whether $RD_{true} \geq k$, then we may also be willing to bear the cost, in the hope that $k > RD_{obs}$, which allows us to reject the hypothesis. In the latter case, we may instead decide not to bear the cost, because we can never confirm the hypothesis.
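The sign rule invoked in this example can be written as a small helper. The sketch below assumes the rule exactly as stated in the text for C (VanderWeele et al., 2008, Result 1) and applied to D: same direction gives $RD_{obs} \geq RD_{true}$, opposite directions give $RD_{obs} \leq RD_{true}$. The function name and the string return values are hypothetical, and values are coded as 1 for $a$, $d$ and 0 for $\bar{a}$, $\bar{d}$.

```python
def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def bias_direction(e_y_ad, e_a_d):
    """Direction of RD_obs relative to RD_true under the sign rule.

    e_y_ad[a][d] = E[Y | A=a, D=d] and e_a_d[d] = E[A | D=d].
    Returns ">=" if RD_obs >= RD_true, "<=" if RD_obs <= RD_true, "==" if both
    rules apply (one of the two functions is flat in D), and None if
    E[Y | A, D] is not monotone in D, in which case the rule is silent.
    """
    directions_y = {sign(e_y_ad[a][1] - e_y_ad[a][0]) for a in (0, 1)}
    if {1, -1} <= directions_y:
        return None  # increasing for one value of A, decreasing for the other
    direction_y = 1 if 1 in directions_y else (-1 if -1 in directions_y else 0)
    direction_a = sign(e_a_d[1] - e_a_d[0])
    if direction_y * direction_a > 0:
        return ">="  # both nondecreasing or both nonincreasing in D
    if direction_y * direction_a < 0:
        return "<="  # one nondecreasing and the other nonincreasing in D
    return "=="      # a flat function is both nondecreasing and nonincreasing
```

For a hypothesis of the form $k \geq RD_{true}$, a result of `">="` indicates that collecting data to compute $RD_{obs}$ may be worthwhile, as argued in the example.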

Discussion
We have extended the result by Ogburn and VanderWeele (2012) stating that if $E[Y \mid A, C]$ is monotone in C, then $RD_{obs}$ lies between $RD_{true}$ and $RD_{crude}$. We have done so by showing that the result also holds when $E[Y \mid A, D]$ is monotone in D. This makes the result much more applicable in practice, as the monotonicity condition in D can be verified empirically. We have also extended the result by VanderWeele et al. (2008) along the same lines.
The monotonicity condition in D is, however, sufficient but not necessary. In fact, we have shown through experiments that 94% of the random parameterizations of the causal graph studied resulted in $RD_{obs}$ being inside the range of $RD_{true}$ and $RD_{crude}$. However, the monotonicity condition did not hold for approximately half of them. Therefore, in future work, we plan to study how to relax this condition while keeping it sufficient and empirically testable.