A note on a sensitivity analysis for unmeasured confounding, and the related E-value

Unmeasured confounding is one of the most important threats to the validity of observational studies. In this paper we scrutinize a recently proposed sensitivity analysis for unmeasured confounding. The analysis requires specification of two parameters, loosely defined as the maximal strength of association that an unmeasured confounder may have with the exposure and with the outcome, respectively. The E-value is defined as the strength of association that the confounder must have with the exposure and the outcome, to fully explain away an observed exposure-outcome association. We derive the feasible region of the sensitivity analysis parameters, and we show that the bounds produced by the sensitivity analysis are not always sharp. We finally establish a region in which the bounds are guaranteed to be sharp, and we discuss the implications of this sharp region for the interpretation of the E-value. We illustrate the theory with a real data example and a simulation.


Introduction
Unmeasured confounding is one of the most important threats to the validity of observational studies. In a recent publication, Ding and VanderWeele [3], hereafter DV, proposed a method to assess the sensitivity of observed associations to unmeasured confounding. Briefly, the method requires the analyst to provide guesses of certain sensitivity analysis parameters, defined as the maximal strength of association that an unmeasured confounder may have with the exposure and with the outcome. Given these parameters, the method gives bounds for the causal risk ratio, i.e. a range of values that is guaranteed to include the true exposure effect. In a subsequent publication, VanderWeele and Ding [16] proposed the 'E-value' as a measure of sensitivity to unmeasured confounding, defined as the maximal strength of association that an unmeasured confounder would need to have with the exposure and the outcome, to fully explain away an observed exposure-outcome association.
The method has already been highly influential; as of July 21, 2020, the two publications have 208 and 711 citations, respectively, according to Google Scholar. In a systematic literature search up to the end of 2018, Blum et al. [1] found 87 papers presenting 516 E-values. In this paper we complement DV's work, by deriving and clarifying several important points regarding their sensitivity analysis and E-value. The paper is organized as follows. In Section 2 we review the key elements of DV's sensitivity analysis. In Section 3 we derive the feasible region of DV's sensitivity analysis parameters, which has not been derived previously by these authors. In Section 4 we compare DV's bounds with alternative, assumption-free bounds. We conclude from this comparison that DV's bounds are not always sharp, which contradicts a previous claim by these authors, and we give analytic arguments as to when and why DV's bounds can be expected to be non-sharp. In Section 5 we establish a region in which DV's bounds are guaranteed to be sharp, and we discuss the implications of this sharp region for the interpretation of the E-value. In line with Ding and VanderWeele [3] and VanderWeele and Ding [16] we focus on sensitivity analysis for the causal risk ratio, but we show in Section 6 that all our results carry over with no or little modification to the causal risk difference. Finally, in Sections 7 and 8 we provide a real data example and a simulation, respectively.

DV's sensitivity analysis
We adopt the notation of Ding and VanderWeele [3]. Let E and D denote the binary exposure and outcome, respectively. DV's method applies conditionally on measured covariates C; for notational convenience we make this conditioning implicit everywhere. To define the causal exposure effect, let D(e) be the potential outcome [12,14] for a given subject, had that subject been exposed to level E = e. The causal exposure-outcome risk ratio is defined as

RR_ED^true = p{D(1) = 1} / p{D(0) = 1}.

Let U denote a set of unmeasured confounders. To avoid technicalities we follow DV and assume that U can be coded as a categorical variable with levels 0, ..., K − 1, but we emphasize that all results and conclusions that we make hold for any type of U. We assume that U (together with the measured covariates C) is sufficient for confounding control. The method proposed by DV uses certain sensitivity analysis parameters, informally defined as the maximal strength of association between E and U, and between U and D, respectively. Formally, the parameter RR_UD is defined as

RR_UD = max_e { max_u p(D = 1|E = e, U = u) / min_u p(D = 1|E = e, U = u) },

and the parameter RR_EeU is defined as

RR_EeU = max_u p(U = u|E = e) / p(U = u|E = 1 − e).

In the latter we have deviated slightly from DV's notation; our parameters RR_E1U and RR_E0U correspond to their parameters RR_EU and RR_ĒU, respectively, where RR_EU was defined in the main text of Ding and VanderWeele [3], and RR_ĒU was defined in proposition A.4 of that paper's eAppendix. Define the bounding factor

BF_e = RR_EeU × RR_UD / (RR_EeU + RR_UD − 1).
Ding and VanderWeele [3] showed that, given RR_EeU and RR_UD, RR_ED^true is bounded by

RR_ED^obs / BF_1 ≤ RR_ED^true ≤ RR_ED^obs × BF_0,   (1)

where RR_ED^obs = p(D = 1|E = 1)/p(D = 1|E = 0) is the observed risk ratio. By providing guesses of {RR_E0U, RR_E1U, RR_UD}, the analyst may use DV's bounds to infer a plausible range for RR_ED^true. In many situations, one may observe a positive association between the exposure and the outcome (RR_ED^obs > 1), and one may wonder how much confounding would be required to fully 'explain away' this association.
VanderWeele and Ding [16] labeled this magnitude of confounding the 'E-value'. Formally, VanderWeele et al. [17] defined the E-value as the smallest common value of RR_E1U and RR_UD for which the lower bound in (1) is equal to 1, i.e. the smallest strength of association that an unmeasured confounder would need to have with both the exposure and the outcome to fully explain away the observed association. They showed that the E-value is equal to

RR_ED^obs + √{RR_ED^obs(RR_ED^obs − 1)}.   (2)
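The quantities above are simple to compute. The following Python sketch implements the bounding factor, DV's bounds in (1), and the E-value formula in (2); the function names are ours, and the paper's own code (in R) is in its appendices.

```python
import math

def bounding_factor(rr_eu, rr_ud):
    """DV's bounding factor BF_e = (RR_EeU * RR_UD) / (RR_EeU + RR_UD - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)

def dv_bounds(rr_obs, rr_e1u, rr_e0u, rr_ud):
    """DV's bounds (1): RR_obs / BF_1 <= RR_true <= RR_obs * BF_0."""
    return (rr_obs / bounding_factor(rr_e1u, rr_ud),
            rr_obs * bounding_factor(rr_e0u, rr_ud))

def e_value(rr_obs):
    """E-value (2) for an observed risk ratio RR_obs >= 1."""
    return rr_obs + math.sqrt(rr_obs * (rr_obs - 1))
```

For instance, `e_value(3.9)` returns approximately 7.26, the value discussed later in the Victora et al. example; and setting RR_E1U = RR_UD equal to the E-value makes BF_1 equal to RR_ED^obs, so that the lower bound in (1) equals 1.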

Feasible region of DV's sensitivity analysis parameters
In order for DV's sensitivity analysis to give meaningful results, the guesses of {RR_E0U, RR_E1U, RR_UD} must be confined to their feasible region, i.e. the region of parameter values that are logically possible. However, Ding and VanderWeele [3] did not derive this region. There are at least two ways that restrictions on a parameter region can arise. First, the definition of a parameter may imply a restriction. For instance, {RR_E0U, RR_E1U, RR_UD} are all risk ratios, and are thus by definition restricted to non-negative values. Second, a parameter may be restricted by the observed data distribution. As an instructive example, note that we may use the law of total probability to decompose p{D(e) = 1} as

p{D(e) = 1} = p{D(e) = 1|E = 1}p(E = 1) + p{D(e) = 1|E = 0}p(E = 0).

By consistency of counterfactuals [12], the potential outcome D(e) is equal to the factual outcome D whenever the factual exposure E is equal to e, which implies that p{D(e) = 1|E = e} = p(D = 1|E = e). We thus have that

p{D(1) = 1} = p(D = 1|E = 1)p(E = 1) + p{D(1) = 1|E = 0}p(E = 0)   (3)

and

p{D(0) = 1} = p{D(0) = 1|E = 1}p(E = 1) + p(D = 1|E = 0)p(E = 0).   (4)

In these expressions, only p{D(1) = 1|E = 0} and p{D(0) = 1|E = 1} are unknown. Kasza et al. [9] proposed an alternative sensitivity analysis for unmeasured confounding, using the ratios

p(D = 1|E = 1) / p{D(1) = 1|E = 0}   (5)

and

p(D = 1|E = 0) / p{D(0) = 1|E = 1}   (6)

as sensitivity analysis parameters. By varying these parameters over a grid of plausible values one obtains a grid of plausible values for p{D(1) = 1} and p{D(0) = 1}, and hence also for RR_ED^true, through the relations in (3) and (4). However, the ratio in (5) cannot be smaller than p(D = 1|E = 1), and the ratio in (6) cannot be smaller than p(D = 1|E = 0); these limits are attained when the counterfactual probabilities p{D(1) = 1|E = 0} and p{D(0) = 1|E = 1} in the denominators of (5) and (6), respectively, are equal to 1. Hence, the feasible region of these sensitivity analysis parameters is restricted by the observed data distribution.
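As a small numerical illustration of this restriction (with a hypothetical value of p(D = 1|E = 1), and a helper name of our own), the feasible minimum of the ratio in (5) follows directly from the observed data:

```python
# Hypothetical observed risk among the exposed, assumed for illustration
p_d1_given_e1 = 0.3  # p(D = 1 | E = 1)

# The ratio in (5) is p(D=1|E=1) / p{D(1)=1|E=0}. Its denominator is a
# probability, at most 1, so the ratio can never fall below p(D=1|E=1).
feasible_minimum = p_d1_given_e1 / 1.0

def is_feasible_guess(ratio_guess):
    """Is a guess of the ratio in (5) logically compatible with the data?"""
    return ratio_guess >= feasible_minimum
```

A guess of, say, 0.2 for the ratio in (5) would thus be logically impossible under this observed data distribution, illustrating that Kasza et al.'s parameters are not variation independent of the observed data.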
When the feasible region of a parameter is not restricted by the observed data distribution, we say that the parameter is variation independent of the observed data distribution. This variation independence is a desirable feature of a sensitivity analysis parameter, since it means that the analyst is free, in any given scenario, to speculate about the plausible values of the parameter, as long as these values are within the a priori feasible region of the parameter. This is particularly convenient when the set of measured confounders C is high-dimensional, in which case it would be a daunting task to verify that the considered values for the sensitivity analysis parameter are logically compatible with the observed data distribution within each level of C.
In Appendix A we prove the following theorem, which establishes the feasible region of DV's sensitivity analysis parameters.

Theorem 1. The parameters {RR_E0U, RR_E1U, RR_UD} are variation independent of each other and of the observed data distribution, and their feasible region is given by RR_E0U ≥ 1, RR_E1U ≥ 1 and RR_UD ≥ 1.

The theorem implies that the analyst should not consider values of these parameters below 1 in the sensitivity analysis. It further implies that, in any given scenario (i.e. for any observed data distribution), the analyst should consider all parameter values above or equal to 1 as logically possible, albeit not necessarily plausible.

DV's bounds vs the assumption-free bounds
Ding and VanderWeele [3] claimed, but did not prove, that their bounds are sharp, in the following sense: for any observed distribution p(D, E) and correctly specified values of {RR_E1U, RR_UD}, the causal risk ratio RR_ED^true can always be as small as the lower bound in (1); and for any observed distribution p(D, E) and correctly specified values of {RR_E0U, RR_UD}, RR_ED^true can always be as large as the upper bound in (1). This claim is not always correct, though.
To see why, note that the relations in (3) and (4) can be used to derive alternative bounds for RR_ED^true, by arguing as Robins [13]. From (3) we have that p{D(1) = 1} can be no smaller than p(D = 1|E = 1)p(E = 1) and no larger than p(D = 1|E = 1)p(E = 1) + p(E = 0); these bounds are attained when p{D(1) = 1|E = 0} = 0 and p{D(1) = 1|E = 0} = 1, respectively. Similarly, p{D(0) = 1} can be no smaller than p(D = 1|E = 0)p(E = 0) and no larger than p(D = 1|E = 0)p(E = 0) + p(E = 1); these bounds are attained when p{D(0) = 1|E = 1} = 0 and p{D(0) = 1|E = 1} = 1, respectively. After some algebra, this gives the following bounds for RR_ED^true:

p(D = 1|E = 1)p(E = 1) / {p(D = 1|E = 0)p(E = 0) + p(E = 1)} ≤ RR_ED^true ≤ {p(D = 1|E = 1)p(E = 1) + p(E = 0)} / {p(D = 1|E = 0)p(E = 0)}.   (7)

In contrast to DV's bounds, which require the analyst to speculate about {RR_E0U, RR_E1U, RR_UD}, the bounds in (7) do not require any particular assumptions (apart from consistency of counterfactuals); we thus refer to these bounds as 'assumption-free'. For some observed data distributions p(D, E) and values of {RR_E0U, RR_E1U, RR_UD}, the assumption-free bounds may be sharper than DV's bounds. For instance, suppose that p(E = 1) = p(D = 1|E = 0) = 0.9 and p(D = 1|E = 1) = 0.99. We then have that RR_ED^obs = 1.1 and the lower assumption-free bound is equal to 0.9. Suppose, however, that we consider values of RR_E1U and RR_UD as large as 10 to be plausible. DV's lower bound is then equal to 0.21.
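The numerical example above can be verified directly; the following sketch (with helper names of our own) computes the assumption-free bounds in (7) and DV's lower bound:

```python
def af_bounds_rr(p1, p0, pe1):
    """Assumption-free bounds (7) for RR_true, from p(D=1|E=1)=p1,
    p(D=1|E=0)=p0 and p(E=1)=pe1."""
    pe0 = 1 - pe1
    lower = p1 * pe1 / (p0 * pe0 + pe1)
    upper = (p1 * pe1 + pe0) / (p0 * pe0)
    return lower, upper

def dv_lower(rr_obs, rr_e1u, rr_ud):
    """DV's lower bound RR_obs / BF_1."""
    return rr_obs / (rr_e1u * rr_ud / (rr_e1u + rr_ud - 1))

# Example from the text: p(E=1) = p(D=1|E=0) = 0.9, p(D=1|E=1) = 0.99
p1, p0, pe1 = 0.99, 0.90, 0.90
af_lo, _ = af_bounds_rr(p1, p0, pe1)   # 0.90
dv_lo = dv_lower(p1 / p0, 10, 10)      # about 0.21
```

Here the assumption-free lower bound (0.9) is much sharper than DV's lower bound (about 0.21).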
DV showed that their lower bound is decreasing in RR_E1U and RR_UD. By differentiation, it is easy to show that, for a fixed observed risk ratio RR_ED^obs, the lower assumption-free bound is increasing in p(E = 1) and p(D = 1|E = 0). Hence, when p(E = 1) and p(D = 1|E = 0) are relatively large, RR_E1U and RR_UD must be relatively small in order for DV's lower bound to be sharper than the assumption-free lower bound. Similarly, DV's upper bound is increasing in RR_E0U and RR_UD, and, for a fixed observed risk ratio RR_ED^obs, the assumption-free upper bound is decreasing in p(E = 0) and p(D = 1|E = 1). Hence, when p(E = 0) and p(D = 1|E = 1) are relatively large, RR_E0U and RR_UD must be relatively small in order for DV's upper bound to be sharper than the assumption-free upper bound. Figure 1 illustrates this interplay. For instance, when p(D = 1|E = 0) = 0.3 and p(E = 1) = 0.8, the assumption-free lower bound is sharper than DV's lower bound when both RR_E1U and RR_UD are larger than 6.6. The assumption-free lower bound may be sharper even when one of the parameters RR_E1U and RR_UD is smaller than 6.6, but then the other parameter has to be much larger, as indicated by the steep incline of the curve through the point 6.6. Similarly, when p(D = 1|E = 1) = 0.3 and p(E = 0) = 0.8, the assumption-free upper bound is sharper than DV's upper bound when both RR_E0U and RR_UD are larger than 6.6. We observe from Figure 1 that, unless p(D = 1|E = 1 − e) and p(E = e) are close to 1, it takes relatively large values of RR_EeU and RR_UD for the assumption-free bounds to be sharper than DV's bounds. Such large values may not be realistic if U only contains a few confounders. However, as noted by Ioannidis et al. [8], large values of RR_EeU and RR_UD may not be unrealistic if there are many unmeasured confounders with a large aggregated effect, which can often not be ruled out in practice.
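The crossover point of 6.6 quoted above can be reproduced as follows. When RR_E1U = RR_UD = x, DV's lower bound equals the assumption-free lower bound when BF_1 = x²/(2x − 1) equals {p(D = 1|E = 0)p(E = 0) + p(E = 1)}/{p(D = 1|E = 0)p(E = 1)}; solving for x gives a closed form of the same shape as the E-value formula (the helper name is ours):

```python
import math

def crossover(p0, pe1):
    """Common value x of RR_E1U and RR_UD at which DV's lower bound
    equals the assumption-free lower bound (for any fixed RR_obs)."""
    pe0 = 1 - pe1
    bf_needed = (p0 * pe0 + pe1) / (p0 * pe1)  # BF_1 at equality
    # Solve x**2 / (2*x - 1) = bf_needed for the root with x >= 1
    return bf_needed + math.sqrt(bf_needed * (bf_needed - 1))
```

With p(D = 1|E = 0) = 0.3 and p(E = 1) = 0.8, `crossover(0.3, 0.8)` is approximately 6.6, matching the value quoted above.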

A sharp region for DV's bounds
Even though DV's bounds are not always sharp, there is a region in which they are guaranteed to be sharp. This region is established by the following theorem, which we prove in Appendix B.

Theorem 2. The lower bound in (1) is sharp if BF_1 ≤ 1/p(D = 1|E = 0), and the upper bound in (1) is sharp if BF_0 ≤ 1/p(D = 1|E = 1).
The theorem implies that, if the analyst does not consider bounding factors outside the region given by the theorem as plausible, then any value of the causal risk ratio within DV's bounds should be considered logically possible. Theorem 2 also has important implications for the E-value. To illustrate how to interpret the E-value, VanderWeele and Ding [16] considered a study of maternal breastfeeding and respiratory death by Victora et al. [18]. In this study, an exposure-outcome risk ratio equal to 3.9 was observed. Plugging this risk ratio into equation (2) gives an E-value equal to 7.26. DV concluded: 'The observed risk ratio of 3.9 could be explained away by an unmeasured confounder that was associated with both the treatment and the outcome by a risk ratio of 7.2-fold each...' Given that the E-value is derived from DV's lower bound, and given that this bound is not always sharp, one may worry that the E-value is unnecessarily pessimistic. That is, can we be confident that a null causal effect is logically possible, given that RR_E1U = RR_UD = 7.26?
The reassuring answer is 'yes we can'. Because RR_ED^obs ≤ 1/p(D = 1|E = 0), it follows from Theorem 2 that DV's lower bound is guaranteed to be sharp when BF_1 = RR_ED^obs, that is, when the bound is equal to 1. Hence, an important corollary of Theorem 2 is that, whenever DV's lower bound includes the null causal effect (RR_ED^true = 1), the null causal effect is also logically possible. In other words, the E-value is never unnecessarily pessimistic. In particular, the observed risk ratio of 3.9 in the study by Victora et al. [18] could indeed be explained away by an unmeasured confounder that was associated with both the treatment and the outcome by a risk ratio of 7.2-fold each.
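This corollary is easy to check numerically; the sketch below (variable names ours) sets both sensitivity parameters to the E-value of the Victora et al. example:

```python
import math

rr_obs = 3.9                            # observed risk ratio, Victora et al.
e_val = rr_obs + math.sqrt(rr_obs * (rr_obs - 1))  # E-value, about 7.26
bf_1 = e_val * e_val / (2 * e_val - 1)  # BF_1 when RR_E1U = RR_UD = e_val
lower = rr_obs / bf_1                   # DV's lower bound

# BF_1 equals RR_obs exactly, so the lower bound is exactly 1; and since
# RR_obs <= 1/p(D=1|E=0) always, Theorem 2 guarantees the bound is sharp.
```

The null causal effect is thus logically possible at the E-value, in agreement with the corollary.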

Results for the causal risk difference
Ding and VanderWeele [3] and VanderWeele and Ding [16] focused on sensitivity analysis for the causal risk ratio. However, Ding and VanderWeele [3] also provided bounds for the causal risk difference in propositions A.8 and A.10 of their eAppendix. After some algebraic rearrangement, these bounds can be expressed as

RD_ED^obs − BF†_1 ≤ RD_ED^true ≤ RD_ED^obs + BF†_0,   (8)

where RD_ED^obs = p(D = 1|E = 1) − p(D = 1|E = 0), RD_ED^true = p{D(1) = 1} − p{D(0) = 1}, and the bounding factor BF†_e is an analogue of BF_e, obtained from propositions A.8 and A.10 of the eAppendix of [3]. All results derived in Sections 3-5 for the causal risk ratio apply with little or no modification to the causal risk difference. First, Theorem 1 applies to the sensitivity analysis parameters {RR_E0U, RR_E1U, RR_UD} regardless of how they are used, e.g. regardless of whether they are used to bound RD_ED^true instead of RR_ED^true. Second, just like the bounds for RR_ED^true in (1), the bounds for RD_ED^true in (8) are not always sharp. This can be seen by noting that, when BF_1 and BF_0 go to infinity, the lower and upper bounds in (8) go to minus and plus infinity, respectively. Since RD_ED^true is confined to the range [−1, 1], the bounds thus include values that are not logically possible. Just like the bounds in (1), the bounds in (8) can sometimes be improved by replacing them with the assumption-free bounds, whenever these are sharper. Arguing as in Section 4, the assumption-free bounds for RD_ED^true are given by

p(D = 1|E = 1)p(E = 1) − p(D = 1|E = 0)p(E = 0) − p(E = 1) ≤ RD_ED^true ≤ p(D = 1|E = 1)p(E = 1) − p(D = 1|E = 0)p(E = 0) + p(E = 0).   (9)
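The assumption-free bounds for the causal risk difference can be computed analogously to those for the risk ratio; a minimal sketch (the helper name is ours):

```python
def af_bounds_rd(p1, p0, pe1):
    """Assumption-free bounds for RD_true = p{D(1)=1} - p{D(0)=1},
    from p(D=1|E=1)=p1, p(D=1|E=0)=p0 and p(E=1)=pe1."""
    pe0 = 1 - pe1
    lower = p1 * pe1 - p0 * pe0 - pe1
    upper = p1 * pe1 - p0 * pe0 + pe0
    return lower, upper
```

Note that, in contrast to DV's bounds in (8), these bounds always lie within [−1, 1], and their width is exactly p(E = 1) + p(E = 0) = 1.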

Real data example
In a series of papers, Hammond and Horn [4][5][6] studied the association between smoking and mortality. They found an extremely strong association between smoking and lung cancer mortality (incidence rate ratio above 10), which Ding and VanderWeele [3] used in an illustration of their sensitivity analysis. We focus here on the association with total mortality, which is more moderate and thus provides a more realistic and general illustration. We provide R code for the analysis in Appendix D.
Hammond and Horn stratified their analyses on age; we re-analyze the data from their oldest stratum, comprising 29105 subjects who were between 65 and 69 years old at baseline. Among these, 6287 reported that they had never smoked. We consider these 6287 subjects as unexposed (E = 0) and the remaining 22818 subjects as exposed (E = 1). During a follow-up period of 44 months, the number of subjects who died (D = 1) were 613 and 2837 among the unexposed and exposed, respectively. Figure 2 shows, for these data, the combinations of RR_E1U and RR_UD at which DV's lower bound coincides with the assumption-free lower bound. To the right of the rightmost curve, the analyst should use the assumption-free lower bound rather than computing DV's lower bound, since this bound would then include values of RR_ED^true that are ruled out by the assumption-free lower bound. In other words, to the right of the rightmost curve DV's bound is not sharp. The dashed curve indicates where BF_1 = 1/p(D = 1|E = 0); everywhere to the left of this curve, DV's lower bound is guaranteed to be sharp by Theorem 2.
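From these counts, the observed quantities are readily computed (a small Python sketch; the paper's own analysis code, in R, is in Appendix D):

```python
import math

n_e1, n_e0 = 22818, 6287   # exposed and unexposed subjects
d_e1, d_e0 = 2837, 613     # deaths among exposed and unexposed
n = n_e1 + n_e0            # 29105 subjects in total

pe1 = n_e1 / n             # p(E=1), about 0.78
p1 = d_e1 / n_e1           # p(D=1|E=1), about 0.12
p0 = d_e0 / n_e0           # p(D=1|E=0), about 0.10
rr_obs = p1 / p0           # observed risk ratio, about 1.28

# E-value for this observed association
e_val = rr_obs + math.sqrt(rr_obs * (rr_obs - 1))   # about 1.87
```

The observed smoking-mortality association in this stratum is thus moderate, as stated above, and correspondingly modest confounding would suffice to explain it away.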
Even though the sample size in this example is quite large, the quantities p(E = 1), p(D = 1|E = 1), p(D = 1|E = 0) and RR_ED^obs are still estimated, and a proper sensitivity analysis needs to account for the sampling variability in the estimates. This is complicated by the fact that the desired lower bound is the maximal value of DV's lower bound and the assumption-free lower bound. Cai et al. [2] derived analytic variance estimates for bounds of such 'min/max' form, in the context of mediation analysis. Estimators of 'min/max' form also occur in the optimization of dynamic treatment regimes [e.g. 10,11,15]; see also Hirano and Porter [7] for a more general discussion of this problem. Since our focus is on large-sample properties of the bounds, we here illustrate a simpler solution based on nonparametric bootstrap resampling. The solid curve in Figure 3 shows the estimated maximal value of DV's lower bound and the assumption-free lower bound, as a function of RR_E1U and RR_UD when these are assumed equal, i.e. along a diagonal line with slope 1 through the origin in Figure 2. Beyond RR_E1U = RR_UD = 20.6 the curve is flat, since the estimated maximal value is equal to the assumption-free lower bound, which does not depend on RR_E1U and RR_UD. The dashed lower and upper curves indicate point-wise 95% lower and upper confidence limits, respectively. These were obtained by drawing 1000 bootstrap replicates from the sample, estimating the maximal bound in each replicate, and using the 0.025 and 0.975 quantiles as confidence limits at each value of RR_E1U = RR_UD. We observe that the confidence intervals are fairly narrow, as expected.
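The bootstrap procedure can be sketched as follows, in Python rather than the paper's R, using the cell counts from the data above and a reduced number of replicates to keep the sketch fast:

```python
import random

def dv_lower(rr_obs, rr_e1u, rr_ud):
    # DV's lower bound RR_obs / BF_1
    return rr_obs / (rr_e1u * rr_ud / (rr_e1u + rr_ud - 1))

def af_lower(p1, p0, pe1):
    # Assumption-free lower bound (7)
    return p1 * pe1 / (p0 * (1 - pe1) + pe1)

# (E, D) cell counts from Hammond and Horn's oldest stratum
counts = {(1, 1): 2837, (1, 0): 22818 - 2837, (0, 1): 613, (0, 0): 6287 - 613}
cells, weights = list(counts), list(counts.values())
n = sum(weights)

def max_lower_from_sample(sample, rr_e1u, rr_ud):
    # Maximum of DV's and the assumption-free lower bound in one resample
    n_e1 = sum(1 for e, _ in sample if e == 1)
    p1 = sum(1 for e, d in sample if e == 1 and d == 1) / n_e1
    p0 = sum(1 for e, d in sample if e == 0 and d == 1) / (n - n_e1)
    return max(dv_lower(p1 / p0, rr_e1u, rr_ud), af_lower(p1, p0, n_e1 / n))

random.seed(1)
reps = sorted(
    max_lower_from_sample(random.choices(cells, weights=weights, k=n), 10, 10)
    for _ in range(200)
)
ci_low, ci_high = reps[4], reps[194]  # approximate 95% pointwise limits
```

At RR_E1U = RR_UD = 10 the maximal lower bound is DV's bound (about 0.24), and the bootstrap limits are narrow, consistent with the large sample size.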
Simulation
We argued in Section 4 that DV's bounds may often, but not always, be sharper than the assumption-free bounds when there are only a few confounders. In this section we explore this issue further with a simulation study. We provide R code for the simulation in Appendix E.
In the simulation we assumed a single binary confounder U, and generated distributions p(D, E, U) from a model in which E given U and D given (E, U) follow logistic regressions, where expit(x) = 1/(1 + e^{−x}) is the inverse logit function and {β, δ, ψ} are the log odds ratio parameters for the U-E association, the U-D association and the causal E-D effect, respectively. The parameters {β, δ, ψ} were assumed to be independent random variables, distributed as N(0, σ²). The standard deviation σ determines the probability of extreme causal effects and confounding effects. For instance, with σ = 1 and σ = 3, the magnitude of these effects would be larger than 3, on the log odds ratio scale, with 0.27% and 31.7% probability, respectively. For each value of σ we generated 1000 distributions p(D, E, U), and for each generated distribution we computed the true values of {RR_E0U, RR_E1U, RR_UD}, DV's bounds and the assumption-free bounds. For the lower and upper bounds separately, we computed the proportion p of times that the assumption-free bound was sharper than DV's bound. We also computed the mean (over the 1000 distributions) absolute distance between the log of the bound and the log of the true causal risk ratio, which we denote with Δ for DV's bounds and Δ̃ for the assumption-free bounds.
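The following sketch illustrates one simulation replicate. The exact simulation model is given in Appendix E; here we assume, for illustration only, a simple version with U ~ Bernoulli(0.5) and zero intercepts in both logistic regressions, which is our own simplification. Since U is the only confounder, DV's bounds (evaluated at the true sensitivity parameters) and the assumption-free bounds are both guaranteed to contain the true causal risk ratio:

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
sigma = 1.0
beta, delta, psi = (random.gauss(0, sigma) for _ in range(3))

# Assumed model (our simplification): U ~ Bernoulli(0.5),
# E|U ~ Bernoulli(expit(beta*U)), D|E,U ~ Bernoulli(expit(psi*E + delta*U))
p_u = {0: 0.5, 1: 0.5}
p_e1_u = {u: expit(beta * u) for u in (0, 1)}
p_d1_eu = {(e, u): expit(psi * e + delta * u) for e in (0, 1) for u in (0, 1)}

def p_e(e):
    return sum(p_u[u] * (p_e1_u[u] if e == 1 else 1 - p_e1_u[u]) for u in (0, 1))

def p_u_e(u, e):
    return p_u[u] * (p_e1_u[u] if e == 1 else 1 - p_e1_u[u]) / p_e(e)

def p_d1_e(e):  # observed risk p(D=1|E=e)
    return sum(p_d1_eu[(e, u)] * p_u_e(u, e) for u in (0, 1))

def p_d1_do(e):  # counterfactual risk p{D(e)=1}, standardized over U
    return sum(p_d1_eu[(e, u)] * p_u[u] for u in (0, 1))

rr_obs = p_d1_e(1) / p_d1_e(0)
rr_true = p_d1_do(1) / p_d1_do(0)

# True sensitivity analysis parameters for this distribution
rr_ud = max(max(p_d1_eu[(e, 0)], p_d1_eu[(e, 1)]) /
            min(p_d1_eu[(e, 0)], p_d1_eu[(e, 1)]) for e in (0, 1))
rr_e1u = max(p_u_e(u, 1) / p_u_e(u, 0) for u in (0, 1))
rr_e0u = max(p_u_e(u, 0) / p_u_e(u, 1) for u in (0, 1))

bf1 = rr_e1u * rr_ud / (rr_e1u + rr_ud - 1)
bf0 = rr_e0u * rr_ud / (rr_e0u + rr_ud - 1)
```

Repeating this over many drawn parameter values, and recording which bound lies closer to rr_true, reproduces the structure of the simulation described above.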
Tables 1 and 2 display the results, for σ = 1 and σ = 3, respectively. For σ = 1, the assumption-free bounds were never sharper than DV's bounds; p = 0.00 in all scenarios. The assumption-free bounds were on average much more conservative than DV's bounds; Δ̃ and Δ were within the ranges (2.22, 3.68) and (0.12, 0.15) for the lower bounds, respectively, and within the ranges (1.71, 3.10) and (0.13, 0.17) for the upper bounds, respectively. For σ = 3 we observe a similar pattern, although less extreme. DV's bounds were usually sharper than the assumption-free bounds, but not always; p was within the range (0, 0.05) for the lower bounds and within the range (0.05, 0.25) for the upper bounds. Δ̃ and Δ were within the ranges (2.35, 3.93) and (0.49, 0.59) for the lower bounds, respectively, and within the ranges (2.13, 3.57) and (0.70, 0.82) for the upper bounds, respectively.

Table 1: Simulation results with σ = 1. p is the proportion of times that the assumption-free bound was sharper than DV's bound; Δ and Δ̃ are the mean absolute distance between the log of the bound and the log of the true causal risk ratio, for DV's bounds and the assumption-free bounds, respectively.

It is not surprising that the advantage of DV's bounds becomes less pronounced with increasing σ. When extreme confounding effects become more likely, the sensitivity analysis parameters RR_EeU and RR_UD tend to become more extreme as well, and it thus happens more often that DV's bounds include values of the causal risk ratio that are logically ruled out by the observed data distribution.
In practice, the true values of {RR_E0U, RR_E1U, RR_UD} would rarely be known to the analyst. Hence, by using the true values of these parameters in the computation of DV's bounds, the simulation may give an unrealistic advantage to DV's bounds over the assumption-free bounds. To enable a more realistic comparison we repeated the simulation, using values of {RR_E0U, RR_E1U, RR_UD} that were 15% larger than the true values. We present the results of these simulations in Appendix F. In summary, we observed that DV's bounds were slightly more conservative than before, but with no dramatic deterioration. This indicates that DV's bounds are not overly sensitive to a conservative guess of the sensitivity analysis parameters.

Conclusion
In this paper we have derived and clarified several important points related to the sensitivity analysis proposed by DV. We have shown that DV's sensitivity analysis parameters are confined to values equal to or above 1, and that the parameters are variation independent of each other and of the observed data distribution (Theorem 1). We have further shown that DV's bounds for the causal risk ratio are not always sharp, and we have established a region in which they are guaranteed to be sharp (Theorem 2). We have finally shown that this region includes the E-value, which implies that the E-value is not overly pessimistic.
A simple way to improve DV's bounds is to replace them with the assumption-free bounds whenever the latter are narrower. We have argued, and indicated by simulations, that this may rarely happen when there are only a few unmeasured confounders. In practice though, there may often be many unmeasured confounders with a large aggregated confounding effect, which may force DV's bounds to extremes. In such scenarios, the assumption-free bounds may provide a useful alternative.
We end with a technical remark. Theorem 2 shows that DV's bounds are sharp whenever the bounding factors {BF 0 , BF 1 } are within a certain region. However, the theorem does not show the converse, i.e. that the bounding factors are within this region whenever DV's bounds are sharp. In fact, we conjecture that DV's bounds may be sharp for some bounding factors outside this region as well. We recognize the proof (or disproof) of this conjecture as an important topic for future research.

B Proof of Theorem 2
To show that the lower bound in (1) is sharp when BF_1 ≤ 1/p(D = 1|E = 0), we show that it is possible to construct a distribution p(D, E, U) that marginalizes to any given set {RR*_E1U, RR*_UD, p*(D, E)} for which BF*_1 ≤ 1/p*(D = 1|E = 0), and for which RR_ED^true = RR_ED^obs/BF*_1. We construct the distribution p(D, E, U) in the following steps.

C Proof that Theorem 2 carries over to the causal risk difference
To show that the lower bound in (8) is sharp when BF_1 ≤ 1/p(D = 1|E = 0), we show that it is possible to construct a distribution p(D, E, U) that marginalizes to any given set {RR*_E1U, RR*_UD, p*(D, E)} for which BF*_1 ≤ 1/p*(D = 1|E = 0), and for which RD_ED^true = RD_ED^obs − BF†_1. We use the same distribution as in the proof of Theorem 2 for the lower bound in (1), in Appendix B. It thus remains to show that, for this distribution, RD_ED^true = RD_ED^obs − BF†_1. A direct calculation gives an expression for RD_ED^true that is identical to the lower bound for RD_ED^true given in proposition A.8 of the eAppendix of [3], which is algebraically equivalent to RD_ED^obs − BF†_1. To show that the upper bound in (8) is sharp when BF_0 ≤ 1/p(D = 1|E = 1), we show that it is possible to construct a distribution p(D, E, U) that marginalizes to any given set {RR*_E0U, RR*_UD, p*(D, E)} for which BF*_0 ≤ 1/p*(D = 1|E = 1), and for which RD_ED^true = RD_ED^obs + BF†_0. We use the same distribution as in the proof of Theorem 2 for the upper bound in (1), in Appendix B. It thus remains to show that, for this distribution, RD_ED^true = RD_ED^obs + BF†_0. A direct calculation gives an expression for RD_ED^true that is identical to the upper bound for RD_ED^true given in proposition A.10 of the eAppendix of [3], which is algebraically equivalent to RD_ED^obs + BF†_0.

F Additional simulation results
The setup for these simulations is identical to the simulation setup in the main text, with the exception that DV's bounds were computed using values of {RR_E0U, RR_E1U, RR_UD} that were 15% larger than the true values. The results are displayed in Tables F3 and F4.

Table F3: Simulation results with σ = 1. p is the proportion of times that the assumption-free bound was sharper than DV's bound; Δ and Δ̃ are the mean absolute distance between the log of the bound and the log of the true causal risk ratio, for DV's bounds and the assumption-free bounds, respectively.

Table F4: Simulation results with σ = 3. p is the proportion of times that the assumption-free bound was sharper than DV's bound; Δ and Δ̃ are the mean absolute distance between the log of the bound and the log of the true causal risk ratio, for DV's bounds and the assumption-free bounds, respectively.