Novel bounds for causal effects based on sensitivity parameters on the risk difference scale

Abstract: Unmeasured confounding is an important threat to the validity of observational studies. A common way to deal with unmeasured confounding is to compute bounds for the causal effect of interest, that is, a range of values that is guaranteed to include the true effect, given the observed data. Recently, bounds have been proposed that are based on sensitivity parameters, which quantify the degree of unmeasured confounding on the risk ratio scale. These bounds can be used to compute an E-value, that is, the degree of confounding required to explain away an observed association, on the risk ratio scale. We complement and extend this previous work by deriving analogous bounds, based on sensitivity parameters on the risk difference scale. We show that our bounds can also be used to compute an E-value, on the risk difference scale. We compare our novel bounds with previous bounds through a real data example and a simulation study.


Introduction
The estimation of causal effects in observational (non-randomized) studies is often hampered by unmeasured confounding. A common way to deal with unmeasured confounding is to compute bounds for the causal effect of interest, that is, a range of values that is guaranteed to include the true effect, given the observed data. Such bounds have, for instance, been derived for causal effects in randomized trials with non-compliance [1], causal effects in the presence of truncation by death [2], controlled and natural direct effects [3,4] and causal interactions [5,6].
A common feature of these bounds is that they are typically "assumption-free," in the sense that they make no parametric assumptions about the relations between observed and unobserved variables. As a consequence, the bounds are often relatively wide and may thus not be very informative. In contrast, Ding and VanderWeele (DV) [7] proposed to parametrize the degree of unmeasured confounding, using two sensitivity parameters that quantify the maximal strength of association between the exposure and the confounders, and between the outcome and the confounders. DV derived bounds for the causal exposure effect, as functions of the sensitivity parameters and the observed data distribution. By using subject matter knowledge to set the sensitivity parameters to plausible values, bounds are obtained that can be substantially narrower than the assumption-free bounds. Many other sensitivity analyses have been proposed in  the literature, but these typically require rather special conditions, such as a single binary confounder [e.g., refs 8-10] or no exposure-confounder interaction [e.g., refs 11-13]; we refer to ref. [7] for a thorough review.
The sensitivity parameters proposed by DV are defined on the risk ratio scale. However, if the target causal effect is a risk difference, then it may be natural to specify the sensitivity parameters on the risk difference scale as well. In their eAppendix, DV [7] provided bounds for the causal risk difference using sensitivity parameters on the risk difference scale. However, these bounds are restricted to a categorical confounder with a known number of levels. This is an important limitation, since in practice one would almost never know the dimension of the unknown confounders, and these may also contain a combination of categorical and continuous variables. Furthermore, some subject matter experts may find it more intuitive to speculate about the degree of unmeasured confounding on the risk difference scale, regardless of the chosen scale for the target causal effect.
In this article we address these limitations. We derive bounds for causal effects that can be written as a contrast between the counterfactual probability of the outcome if the exposure is present and absent, respectively, for everybody in the population. The bounds that we derive are functions of sensitivity parameters that quantify the degree of unmeasured confounding on the risk difference scale.
The article is organized as follows. In Section 2, we establish basic notation, definitions and assumptions and define the target causal effect. In Section 3, we derive assumption-free bounds in our setting; these serve as a benchmark to which other bounds can be compared. In Section 4, we briefly review the key results of ref. [7], and in Section 5 we present our novel bounds. Both DV's and our bounds are functions of several sensitivity parameters. In Section 6 we discuss how the parameter space can be reduced, to facilitate interpretation and communication. In Section 7, we illustrate the theory with a real data example on smoking and mortality, and in Section 8, we compare DV's and our bounds in a small simulation study. We provide concluding remarks in Section 9.

Notation, definitions and assumptions
We adopt the notation of ref. [7], with some modifications. Let E and D denote the binary exposure and outcome, respectively. Let p_e = p(E = e) and q_e = p(D = 1 | E = e); the observed data distribution p(D, E) then consists of three free parameters: p_1, q_0 and q_1. The statistical association between E and D is defined as some contrast between q_1 and q_0, e.g.,

ψ = g(q_1) − g(q_0),

where g is a monotone link function. The identity, log and logit links give the risk difference, log risk ratio and log odds ratio, respectively. Let D(e) be the potential outcome for a given subject, with the exposure set to level E = e for that subject [14,15]. The potential outcome D(e) is connected to the observed outcome D through the relation

D = D(E), (1)

which is often referred to as "consistency" [15]. Let q*_e = p{D(e) = 1} be the counterfactual probability of the outcome, with the exposure set to E = e for everyone [14,15]. The target parameter is the causal effect of the exposure, which is defined as some contrast between q*_1 and q*_0, e.g.,

ψ* = g(q*_1) − g(q*_0). (2)

Generally, ψ ≠ ψ* due to unmeasured confounding. Let U denote a set of unmeasured confounders, which (together with the measured covariates C) is assumed to be sufficient for confounding control. Formally, we assume that

D(e) ⊥ E | U, (3)

which is often referred to as "conditional (given U) exchangeability" [15]. In terms of the joint distribution p(D, E, U), the parameter q*_e is given by

q*_e = Σ_u p{D(e) = 1 | U = u} p(U = u) = Σ_u p{D(e) = 1 | E = e, U = u} p(U = u) = Σ_u p(D = 1 | E = e, U = u) p(U = u), (4)

where the first equality follows from the law of total probability, the second from conditional exchangeability (3) and the third from consistency (1).
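As a concrete illustration of the standardization formula in (4), the following sketch computes q*_e for a hypothetical three-level confounder U. All probabilities below are invented for illustration; they are assumptions, not data from the paper.

```python
# Sketch: computing the counterfactual probability q*_e via eq. (4),
# q*_e = sum_u p(D=1 | E=e, U=u) p(U=u), for a hypothetical three-level U.
# All probabilities are illustrative assumptions, not values from the paper.

p_U = [0.5, 0.3, 0.2]                       # p(U = u)
p_D1_given_E_U = {                          # p(D = 1 | E = e, U = u)
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.40,
    (1, 0): 0.15, (1, 1): 0.30, (1, 2): 0.50,
}

def q_star(e):
    """Standardize p(D = 1 | E = e, U = u) over the marginal distribution of U."""
    return sum(p_D1_given_E_U[(e, u)] * p_U[u] for u in range(len(p_U)))

rd_causal = q_star(1) - q_star(0)           # causal risk difference q*_1 - q*_0
```

With the identity link, rd_causal is exactly the causal risk difference ψ* of (2) for this hypothetical distribution.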
Assumption-free bounds

Manski [16] and Robins [17] derived assumption-free bounds for causal effects as follows. Using the law of total probability we can decompose q*_e into

q*_e = p{D(e) = 1 | E = e} p_e + p{D(e) = 1 | E = 1 − e}(1 − p_e) = q_e p_e + p{D(e) = 1 | E = 1 − e}(1 − p_e), (5)

where the second equality follows from consistency (1); cf. also ref. [18]. In this expression, only p{D(e) = 1 | E = 1 − e} is unknown. Setting this counterfactual probability to its lower and upper limits, 0 and 1, gives

q_e p_e ≤ q*_e ≤ q_e p_e + 1 − p_e. (6)
These bounds are assumption-free in the sense that they are valid regardless of the relations between U and (D, E); we refer to these bounds as the "AF bounds." Replacing q*_0 and q*_1 in (2) with their upper and lower bounds, respectively, gives a lower bound for ψ*. Similarly, replacing q*_0 and q*_1 in (2) with their lower and upper bounds, respectively, gives an upper bound for ψ*. For instance, the implied AF bounds for the causal risk difference q*_1 − q*_0 are

q_1 p_1 − q_0 p_0 − (1 − p_0) ≤ q*_1 − q*_0 ≤ q_1 p_1 − q_0 p_0 + (1 − p_1).

The width of the AF bounds for the causal risk difference is constant and equal to (1 − p_0) + (1 − p_1) = 1, regardless of the observed data distribution. This is not the case for other effect measures, e.g., the causal risk ratio.
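The AF bounds in (6), and the implied bounds for the causal risk difference, are simple enough to sketch directly. The inputs below are illustrative observed-data parameters, not values from any real study.

```python
# Sketch: assumption-free (AF) bounds of eq. (6),
#     q_e p_e <= q*_e <= q_e p_e + 1 - p_e,
# and the implied AF bounds for the causal risk difference q*_1 - q*_0.
# Inputs are illustrative, not values from the paper.

def af_bounds_q(q_e, p_e):
    """AF bounds for the counterfactual probability q*_e."""
    return q_e * p_e, q_e * p_e + 1 - p_e

def af_bounds_rd(q0, q1, p1):
    """AF bounds for q*_1 - q*_0: lower(q*_1) - upper(q*_0), upper(q*_1) - lower(q*_0)."""
    l1, u1 = af_bounds_q(q1, p1)
    l0, u0 = af_bounds_q(q0, 1 - p1)
    return l1 - u0, u1 - l0

lo, hi = af_bounds_rd(q0=0.3, q1=0.5, p1=0.6)
width = hi - lo   # (1 - p_1) + (1 - p_0) = 1, regardless of the inputs
```

The constant width of 1 on the risk difference scale can be checked for any choice of (q_0, q_1, p_1).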

DV's bounds
We notify the reader that several of the results reviewed below were not presented in the main text of ref. [7], but in their eAppendix. Some of the results were also not stated explicitly, but follow more or less implicitly from the derivations and arguments in that eAppendix. DV proposed to compute bounds for q*_0, q*_1 and ψ* by specifying sensitivity parameters that quantify the maximal strength of association between E and U, and between U and D, respectively. Formally, these parameters are defined as

RR_EUe = max_u p(U = u | E = e) / p(U = u | E = 1 − e) (7)

and

RR_UDe = max_{u,u′} p(D = 1 | E = e, U = u) / p(D = 1 | E = e, U = u′).

Sjölander [19] showed that, given p_1, q_0 and q_1, the parameters RR_EUe and RR_UDe may take any value in the range [1, ∞). Define the bounding factor

BF_e = RR_EUe RR_UDe / (RR_EUe + RR_UDe − 1). (8)

Ding and VanderWeele [7] showed (Section 5 of their eAppendix) that, given RR_EUe and RR_UDe, q*_e is bounded by

q_e p_e + q_e(1 − p_e)/BF_e ≤ q*_e ≤ q_e p_e + q_e BF_e(1 − p_e). (9)
We refer to the bounds in (9) as the "DV bounds." By providing guesses of RR_EUe and RR_UDe for e ∈ {0, 1}, the analyst can use the relation in (9) to compute bounds for q*_0 and q*_1. Notably, the bounds in (9) are not sharp. To see this, note that the upper bound is monotonically increasing in BF_e; as RR_EUe and RR_UDe approach infinity, BF_e and the upper bound for q*_e approach infinity as well. However, since q*_e is a probability, it is logically bounded above by 1. In fact, Sjölander [19] showed that the DV bounds in (9) may occasionally be wider than the AF bounds in (6), even when the DV bounds do not exceed the a priori possible range [0, 1]. As noted by Sjölander [19], these problems can easily be solved by replacing the DV bounds with the AF bounds whenever the latter are narrower. We thus obtain the modified bounds

max{q_e p_e, q_e p_e + q_e(1 − p_e)/BF_e} ≤ q*_e ≤ min{q_e p_e + 1 − p_e, q_e p_e + q_e BF_e(1 − p_e)}, (10)

which we refer to as the "DVS bounds." As before, replacing q*_0 and q*_1 in (2) with their upper and lower bounds, respectively, gives a lower bound for ψ*. Similarly, replacing q*_0 and q*_1 in (2) with their lower and upper bounds, respectively, gives an upper bound for ψ*; for instance, combining the bounds in this way gives the implied DVS bounds for the causal risk difference q*_1 − q*_0.

In many situations, one may observe a positive association between the exposure and the outcome (q_1 > q_0), and one may wonder how much confounding is required to fully "explain away" this association. VanderWeele and Ding [20] labeled this degree of confounding as the "E-value." Formally, VanderWeele and Ding [20] defined the E-value as the smallest common value RR_EU1 = RR_UD0 = RR_UD1 such that the lower bound for q*_1 in (9) is equal to the upper bound for q*_0 in (9). They showed that the E-value thus defined is equal to

RR_obs + √{RR_obs(RR_obs − 1)},

where RR_obs = q_1/q_0 is the observed risk ratio.

Ding and VanderWeele [7] also derived bounds that use sensitivity parameters on the risk difference scale (Section 6 of their eAppendix). However, these bounds are restricted to the causal risk difference, and they require that the confounder U is categorical with a known number of levels. In the special case when U is binary, the bounds are given in (11) and (12), where the sensitivity parameters RD_EU^binary and RD_UDe^binary are defined as

RD_EU^binary = p(U = 1 | E = 1) − p(U = 1 | E = 0)

and

RD_UDe^binary = p(D = 1 | E = e, U = 1) − p(D = 1 | E = e, U = 0).

We refer to the bounds in (11) and (12) as the "binary DV" bounds.
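The machinery reviewed in this section can be sketched as follows: the bounding factor in (8), the DV bounds in (9) with AF truncation (the DVS bounds), and the closed-form E-value of VanderWeele and Ding [20]. The numeric inputs are illustrative assumptions.

```python
import math

# Sketch of the DV machinery: bounding factor BF_e of eq. (8), DV bounds of
# eq. (9), their AF-truncated DVS version, and the closed-form E-value of
# VanderWeele and Ding. Inputs are illustrative, not values from the paper.

def bounding_factor(rr_eu, rr_ud):
    """BF_e = RR_EUe * RR_UDe / (RR_EUe + RR_UDe - 1)."""
    return rr_eu * rr_ud / (rr_eu + rr_ud - 1)

def dv_bounds_q(q_e, p_e, rr_eu, rr_ud):
    """DV bounds (9) for q*_e; the upper bound may exceed 1."""
    bf = bounding_factor(rr_eu, rr_ud)
    return q_e * p_e + q_e * (1 - p_e) / bf, q_e * p_e + q_e * bf * (1 - p_e)

def dvs_bounds_q(q_e, p_e, rr_eu, rr_ud):
    """DVS bounds: DV bounds truncated by the AF bounds whenever those are narrower."""
    lo, hi = dv_bounds_q(q_e, p_e, rr_eu, rr_ud)
    return max(lo, q_e * p_e), min(hi, q_e * p_e + 1 - p_e)

def e_value(q0, q1):
    """Closed-form E-value for an observed risk ratio RR_obs = q1/q0 > 1."""
    rr = q1 / q0
    return rr + math.sqrt(rr * (rr - 1))

# Example: bounds for q*_e under the guesses RR_EUe = 2, RR_UDe = 3
lo_q, hi_q = dvs_bounds_q(q_e=0.2, p_e=0.5, rr_eu=2.0, rr_ud=3.0)
```

Setting RR_EUe = RR_UDe = 1 (no confounding) gives BF_e = 1, so both DV bounds collapse to q_e, as they should.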

Novel bounds based on sensitivity parameters on the risk difference scale
Similar to DV, we propose to compute bounds for q*_0, q*_1 and ψ* by specifying sensitivity parameters that quantify the maximal strength of association between E and U and between U and D. However, our parameters measure these associations on the risk difference scale rather than the risk ratio scale, and our bounds are not restricted to the causal risk difference and do not require the confounder U to be categorical with a known number of levels. We define

RD_EU = max_u p(E = 1 | U = u) − min_u p(E = 1 | U = u)

and

RD_UDe = max_u p(D = 1 | E = e, U = u) − min_u p(D = 1 | E = e, U = u).

The larger RD_EU, the larger the potential for a strong association between E and U. In this sense, RD_EU measures the "maximal association" between E and U, on the risk difference scale. In a similar sense, RD_UDe measures the maximal conditional association between D and U, given E.
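To illustrate, the following sketch evaluates RD_EU and RD_UDe for a hypothetical three-level confounder, and checks numerically that RD_EU is the same whether computed from p(E = 1 | U) or p(E = 0 | U). All probabilities are invented for illustration.

```python
# Sketch: the proposed sensitivity parameters on the risk difference scale,
# computed from hypothetical conditional probabilities for a three-level U.
# RD_EU contrasts p(E = 1 | U = u) across levels of U; RD_UDe contrasts
# p(D = 1 | E = e, U = u). All numbers are illustrative assumptions.

p_E1_given_U = [0.30, 0.55, 0.70]           # p(E = 1 | U = u)
p_D1_given_E_U = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.40,
    (1, 0): 0.15, (1, 1): 0.30, (1, 2): 0.50,
}

rd_eu = max(p_E1_given_U) - min(p_E1_given_U)

def rd_ud(e):
    """Maximal conditional association between D and U, given E = e."""
    probs = [p_D1_given_E_U[(e, u)] for u in range(3)]
    return max(probs) - min(probs)

# rd_eu is the same whether computed from p(E=1 | U) or p(E=0 | U):
rd_eu_from_E0 = max(1 - p for p in p_E1_given_U) - min(1 - p for p in p_E1_given_U)
```

The last line mirrors the observation below that RD_EU is not a function of e.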
The parameter RD_UDe is an obvious analogue to RR_UDe and a natural generalization of RD_UDe^binary. However, the parameter RD_EU differs from RR_EUe and RD_EU^binary in that it compares conditional probabilities of the form p(E | U), rather than of the form p(U | E). This modification is mainly for technical reasons, to make the calculation of bounds feasible. However, specifying (functions of) p(E | U) may also be easier for a subject matter expert than specifying (functions of) p(U | E), since the direction of causality goes from U to E, not the other way around. We note that RD_EU is not a function of e, since

max_u p(E = 1 | U = u) − min_u p(E = 1 | U = u) = max_u p(E = 0 | U = u) − min_u p(E = 0 | U = u).

By definition, RD_EU and RD_UDe are restricted to the range [0, 1]. However, when speculating about plausible values for these parameters, it is also important to know whether the observed data further restrict the range of possible parameter values, or whether the parameters restrict each other. As a simple example of how such restrictions may arise, consider the multinomial distribution. Even though the individual cell probabilities in a multinomial distribution can each be anywhere between 0 and 1, they cannot simultaneously be anywhere between 0 and 1, since they sum to 1. For instance, knowing that one cell probability is equal to 0.9 restricts the other cell probabilities to the range [0, 0.1]. In contrast, the mean and the variance in the normal distribution do not impose any such restrictions on each other. The following theorem, which we prove in Appendix A, establishes that the observed data distribution does not restrict the sensitivity parameters, and that the sensitivity parameters do not restrict each other; this property is referred to as "variation independence" [21,22].

Theorem 1. RD_EU, RD_UD0 and RD_UD1 are variation independent of each other, and of the observed data distribution parameters p_1, q_0 and q_1.
The following theorem, which we prove in Appendix B, shows how our proposed sensitivity parameters can be used to construct bounds for q*_e.
We refer to the bounds in (13) as the SH bounds. Minimizing and maximizing the expressions in (13) over (b_e, d_e) is a constrained optimization problem, which can be solved with standard software, e.g., the optim function in R. However, since unconstrained optimization is computationally simpler than constrained optimization, it may be preferable to transform (b_e, d_e) into unconstrained parameters; one such transformation uses the expit function, the inverse of the logit function, together with the limits for (b_e, d_e) given in Theorem 2. As before, replacing q*_0 and q*_1 in (2) with their upper and lower bounds, respectively, gives a lower bound for ψ*. Similarly, replacing q*_0 and q*_1 in (2) with their lower and upper bounds, respectively, gives an upper bound for ψ*; for instance, combining the bounds in this way gives the implied SH bounds for the causal risk difference q*_1 − q*_0. When setting the sensitivity parameters RD_EU and RD_UDe to the extreme value 0 (minimal confounding), the upper and lower limits of b_e and d_e collapse into 0, so that a_e and c_e become equal to 0 as well. Hence, both the lower and upper bounds in (13) become equal to q_e, so that the statistical association ψ becomes equal to the causal effect ψ*, as expected. At the other extreme, when setting RD_EU and RD_UDe to 1 (maximal confounding), the lower and upper limits of b_e collapse into 1 − q_e and the lower and upper limits of d_e collapse into 1 − p_e, so that a_e and c_e become equal to −q_e and −p_e, respectively. After a little algebra, the bounds in (13) become equal to the AF bounds in (6). This is also expected, since the AF bounds can never be exceeded, regardless of the amount of confounding.
Using the bounds in (13), we may define the E-value analogously to ref. [20]. Specifically, for a positive association between E and D (q_1 > q_0), we define the E-value as the smallest common value RD_EU = RD_UD0 = RD_UD1 such that the lower bound for q*_1 in (13) is equal to the upper bound for q*_0 in (13). This E-value does not have a simple analytic expression, but it can easily be found numerically.
Reducing the sensitivity parameter space

The DVS bounds, the binary DV bounds and the SH bounds all require specification of certain sensitivity parameters, intended to quantify the degree of unmeasured confounding. Specifying such parameters becomes challenging, even for a subject matter expert, if the number of parameters is large. Furthermore, one may often want to vary the sensitivity parameters over plausible ranges, and present the bounds as functions of the parameters within these ranges, to convey the sensitivity of the observed associations to various degrees of confounding. In order for such a sensitivity analysis to be feasible and transparent, the number of sensitivity parameters has to be small.
Computing the DVS bounds for both q*_0 and q*_1 requires specification of four sensitivity parameters (RR_EU0, RR_EU1, RR_UD0 and RR_UD1), whereas the binary DV bounds and the SH bounds require only three sensitivity parameters (RD_EU^binary, RD_UD0^binary and RD_UD1^binary for the binary DV bounds, and RD_EU, RD_UD0 and RD_UD1 for the SH bounds). This may be viewed as a relative advantage of the binary DV bounds and the SH bounds. However, even three sensitivity parameters may be awkward to handle from a practical perspective. To further reduce the dimensionality of the sensitivity parameter space, one may replace RR_EU0 and RR_EU1 with RR_EU = max(RR_EU0, RR_EU1), RR_UD0 and RR_UD1 with RR_UD = max(RR_UD0, RR_UD1), RD_UD0^binary and RD_UD1^binary with RD_UD^binary = max(RD_UD0^binary, RD_UD1^binary), and RD_UD0 and RD_UD1 with RD_UD = max(RD_UD0, RD_UD1). In fact, even though Ding and VanderWeele [7] used the parameters RR_UD0, RR_UD1, RD_UD0^binary and RD_UD1^binary in their derivations (e.g., in Sections 2.2 and 6 of their eAppendix), they replaced these with RR_UD and RD_UD^binary when presenting their final results. The replacements above reduce the number of sensitivity parameters to two, for the DVS bounds, the binary DV bounds and the SH bounds alike. It is easy to show that the bounds remain valid under these replacements, but that they may become wider.

Real data example
We use the same example as in ref. [19] to illustrate the theory. The data for this example are borrowed from the studies of Hammond and Horn [23-25], who studied the association between smoking and mortality. These authors carried out several different analyses; in particular, they found an extremely strong association between smoking and lung cancer. We focus here on the association between smoking and total mortality, which is more moderate. We provide R code for the analysis in Appendix C.
Hammond and Horn stratified all their analyses on age. We here re-analyze the data from their oldest stratum, which consists of 29,105 subjects who were between 65 and 69 years old at enrollment into the study. Among these, 6,287 subjects reported that they had never smoked. We consider these 6,287 subjects as being unexposed (E = 0) and the remaining 22,818 subjects as being exposed (E = 1). During a total follow-up period of 44 months, the numbers of subjects who died (D = 1) were 613 and 2,837 among the unexposed and exposed subjects, respectively. The observed data distribution is thus given by p_1 = 22,818/29,105 = 0.784, q_0 = 613/6,287 = 0.098 and q_1 = 2,837/22,818 = 0.124, so that the observed risk difference is q_1 − q_0 = 0.027; hence, smoking appears to elevate the risk of death during follow-up by 2.7 percentage points. However, this statistical association is only adjusted for (i.e., stratified by) age and may thus be partly or fully explained by unmeasured confounding. To appreciate the maximum possible impact of confounding, we compute the AF bounds for the causal risk difference q*_1 − q*_0, which are equal to (−0.708, 0.292). Hence, without making any assumptions about the degree of unmeasured confounding, the causal risk difference can be as small as −0.708 and as large as 0.292.
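The numbers quoted above can be reproduced with a few lines of code. The paper's own analysis is in R (Appendix C); this is an equivalent sketch in Python.

```python
# Reproducing the observed-data parameters and AF bounds for the
# Hammond and Horn example quoted in the text.

n_unexposed, n_exposed = 6287, 22818
d_unexposed, d_exposed = 613, 2837
n_total = n_unexposed + n_exposed           # 29,105 subjects in the stratum

p1 = n_exposed / n_total                    # p(E = 1), about 0.784
q0 = d_unexposed / n_unexposed              # p(D = 1 | E = 0), about 0.098
q1 = d_exposed / n_exposed                  # p(D = 1 | E = 1), about 0.124
rd_obs = q1 - q0                            # observed risk difference, about 0.027

# AF bounds (6) for q*_1 - q*_0, about (-0.708, 0.292):
af_lower = q1 * p1 - (q0 * (1 - p1) + p1)
af_upper = q1 * p1 + 1 - p1 - q0 * (1 - p1)
```

The width af_upper − af_lower equals 1, as it must for the causal risk difference.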
The AF bounds are very wide. To narrow them down, we may consider a range of plausible values for the degree of unmeasured confounding. The contour plots in the top row of Figure 1 illustrate the DVS lower (left panel) and upper (right panel) bounds for the causal risk difference, as functions of RR_EU and RR_UD. The E-value, in terms of RR_EU and RR_UD, is equal to 1.87. The contour plots in the bottom row of Figure 1 illustrate the SH lower (left panel) and upper (right panel) bounds for the causal risk difference, as functions of RD_EU and RD_UD. The E-value, in terms of RD_EU and RD_UD, is equal to 0.13; up to this degree of unmeasured confounding the observed data imply the presence of a positive causal effect.

Simulation
We carried out a small simulation study to compare the AF, DVS, binary DV and SH bounds. We considered a categorical confounder U with K levels. We generated distributions p(D, E, U) by drawing p(U = u) from a Dirichlet distribution with K parameters all equal to 1, and by drawing the conditional probabilities p(E = 1 | U = u) and p(D = 1 | E = e, U = u) independently and uniformly over the interval (0, 1). In this way, the distribution {p(U = u): u = 1, …, K} is drawn uniformly from the K-dimensional unit simplex [26]. We initially considered a binary confounder, i.e., K = 2; in this special case, p(U = 1) is drawn uniformly over the interval (0, 1). We generated 10,000 distributions p(D, E, U). For each generated distribution we computed the AF, DVS, binary DV and SH bounds for the causal risk difference q*_1 − q*_0, and the AF, DVS and SH bounds for the causal risk ratio q*_1/q*_0. We used the true values of all sensitivity parameters in the computation of the bounds. The simulation was carried out twice, without and with the parameter reduction described in Section 6.
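A minimal sketch of one replication of this simulation design, using only the standard library (the Dirichlet draw is implemented via normalized exponentials, which is equivalent when all parameters equal 1; numpy.random.dirichlet would be the natural alternative). The seed and K are arbitrary, and the paper itself uses 10,000 replications rather than one.

```python
import random

# Sketch of one replication: draw p(U = u) uniformly from the K-dimensional unit
# simplex (Dirichlet with all parameters 1), draw p(E = 1 | U = u) and
# p(D = 1 | E = e, U = u) uniformly on (0, 1), then derive both the observed-data
# parameters and the true counterfactual probabilities.

def draw_distribution(K, rng):
    # Normalized i.i.d. Exponential(1) draws are Dirichlet(1, ..., 1),
    # i.e., uniform on the K-dimensional unit simplex.
    w = [rng.expovariate(1.0) for _ in range(K)]
    total = sum(w)
    p_U = [x / total for x in w]
    p_E1_U = [rng.random() for _ in range(K)]                  # p(E = 1 | U = u)
    p_D1_EU = {(e, u): rng.random() for e in (0, 1) for u in range(K)}
    return p_U, p_E1_U, p_D1_EU

def observed_and_counterfactual(p_U, p_E1_U, p_D1_EU):
    K = len(p_U)
    p1 = sum(p_E1_U[u] * p_U[u] for u in range(K))             # p(E = 1)
    p_e = {1: p1, 0: 1 - p1}
    # q_e = p(D = 1 | E = e) by Bayes' rule; q*_e by standardization, eq. (4)
    q = {e: sum(p_D1_EU[(e, u)]
                * (p_E1_U[u] if e == 1 else 1 - p_E1_U[u])
                * p_U[u] for u in range(K)) / p_e[e]
         for e in (0, 1)}
    q_star = {e: sum(p_D1_EU[(e, u)] * p_U[u] for u in range(K)) for e in (0, 1)}
    return p1, q, q_star

rng = random.Random(1)
p1, q, q_star = observed_and_counterfactual(*draw_distribution(2, rng))

# The true causal risk difference always lies inside the AF bounds (6):
af_lower = q[1] * p1 - (q[0] * (1 - p1) + p1)
af_upper = q[1] * p1 + 1 - p1 - q[0] * (1 - p1)
inside = af_lower <= q_star[1] - q_star[0] <= af_upper
```

Repeating the draw and adding the DVS, binary DV and SH bound computations gives the width distributions summarized in Figure 2.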
The top row of Figure 2 shows the width distribution of the bounds for the causal risk difference (left panel) and risk ratio (right panel), without parameter reduction. We observe that the DVS, binary DV and SH bounds are generally much narrower than the AF bounds. We further observe that the width distributions of the DVS and SH bounds are quite similar. Thus, with correct specification of their respective sensitivity parameters, the DVS and SH bounds seem to be equally informative. The binary DV bounds are generally narrower than both the DVS bounds and the SH bounds for the causal risk difference. This is not surprising, given that the binary DV bounds use the strong additional information/assumption that U is binary, and are only valid under this assumption.
The bottom row of Figure 2 shows the results with parameter reduction. We observe that the DVS, binary DV and SH bounds are now wider, but still much narrower than the AF bounds, in general. We further observe that the SH bounds tend to be narrower than the DVS bounds. This is expected, since the parameter reduction eliminates two sensitivity parameters from the DVS bounds, but only one sensitivity parameter from the SH bounds. Thus, one would expect that the parameter reduction generally has a stronger influence on the DVS bounds than on the SH bounds.

We finally repeated the simulation for each K in 2, …, 10. For this part of the simulation we omitted the binary DV bounds, since these only apply when the confounder U is binary. Figure 3 shows the median width of the AF bounds (solid lines), DVS bounds (dashed lines) and SH bounds (dotted lines) for the causal risk difference (left panel) and the causal risk ratio (right panel), when not using the parameter reduction (top row) and using the parameter reduction (bottom row). We observe that the width of the AF bounds is constant for the causal risk difference (=1) and appears fairly constant for the causal risk ratio as the number of confounder levels increases. However, the width of the DVS and SH bounds tends to increase with the number of confounder levels, both for the causal risk difference and risk ratio. This is not surprising, given that the sensitivity parameters for these bounds are defined by contrasting the confounder levels that are most extreme, in the sense that these levels maximize/minimize the exposure and outcome prevalences. The more levels there are, the larger the discrepancy between the most extreme values (in a probabilistic sense), and the larger the sensitivity parameters and the wider the bounds.

Conclusions
In this article, we have derived bounds for causal effects, using sensitivity parameters that quantify the degree of unmeasured confounding on the risk difference scale. These bounds can subsequently be used to compute an "E-value," that is, a minimal amount of confounding required to explain away an observed statistical association. Thus, our work complements and extends the work of Ding and VanderWeele [7], who derived similar bounds using sensitivity parameters on the risk ratio scale.
Our work has important practical implications. When the target causal effect is a risk difference, it may be more natural to use sensitivity parameters on the risk difference scale than on the risk ratio scale. Furthermore, some subject matter experts may find it more intuitive to speculate about the degree of unmeasured confounding on the risk difference scale, regardless of the chosen scale for the target causal effect. Our proposed bounds are functions of the observed data distribution parameters (p_1, q_0, q_1). In practice, these are not known but have to be estimated from data, which gives estimated bounds. An important extension would thus be to develop methods that take the sampling variability of the estimated bounds into account. This could either be done with analytic arguments, as in ref. [3], or with bootstrap simulation, as in refs [5,6].
Appendix

The proofs of Theorems 1 and 2 proceed as follows. From (4), minimizing/maximizing q*_e is equivalent to minimizing/maximizing E(Δ_e U), which, using Bayes' rule together with Theorem 1 of ref. [27], is in turn equivalent to maximizing/minimizing the covariance cov(Δ_e, δ) subject to the constraints imposed by the sensitivity parameters; the full derivations are given in Appendices A and B.