# The Mechanics of Omitted Variable Bias: Bias Amplification and Cancellation of Offsetting Biases

Peter M. Steiner and Yongnam Kim

# Abstract

Causal inference with observational data frequently requires researchers to estimate treatment effects conditional on a set of observed covariates, hoping that they remove or at least reduce the confounding bias. Using a simple linear (regression) setting with two confounders – one observed (X), the other unobserved (U) – we demonstrate that conditioning on the observed confounder X does not necessarily imply that the confounding bias decreases, even if X is highly correlated with U. That is, adjusting for X may increase instead of reduce the omitted variable bias (OVB). Two phenomena can cause an increasing OVB: (i) bias amplification and (ii) cancellation of offsetting biases. Bias amplification occurs because conditioning on X amplifies any remaining bias due to the omitted confounder U. Cancellation of offsetting biases is an issue whenever X and U induce biases in opposite directions such that they perfectly or partially offset each other, in which case adjusting for X inadvertently cancels the bias-offsetting effect. In this article we discuss the conditions under which adjusting for X increases OVB, and demonstrate that conditioning on X increases the imbalance in U, which turns U into an even stronger confounder. We also show that conditioning on an unreliably measured confounder can remove more bias than the corresponding reliable measure. Practical implications for causal inference will be discussed.

## Introduction

Causal inference with observational studies frequently requires researchers to estimate treatment effects conditional on a set of observed baseline covariates in order to remove confounding bias. Covariate-adjusted effect estimates can be obtained by controlling for the observed covariates in a regression analysis, or by matching cases on the observed covariates or the corresponding propensity score. It is well known that the confounding bias can be removed if all the confounding covariates that simultaneously determine treatment selection and the outcome are observed. This condition is frequently referred to as the conditional independence assumption, selection on observables, strong ignorability, unconfoundedness, or the backdoor or adjustment criterion [1, 2, 3, 4]. If one fails to reliably measure all the confounding covariates, the causal effect is not identified and the covariate-adjusted treatment effect will usually remain biased. In the linear regression context, the bias due to an omitted variable is formalized in the omitted variable bias (OVB) formula [2, 5, 6, 7]. [1]

Though OVB is well known and has been discussed for decades, the mechanics of OVB are not yet fully understood, which regularly leads to misguided advice about reducing confounding bias in practice. Applied and methodological articles and textbooks regularly suggest that including more variables in a regression model makes the conditional independence assumption more plausible and thus reduces, or at least does not increase, confounding bias (e.g., [8, 9, 10]; see [7] for a brief discussion of this ill-advised rationale for including more rather than fewer covariates). Similarly, there is a strong belief that adjusting for an observed variable that is correlated with unobserved confounders necessarily removes a part of the bias induced by the unobserved confounders and thus further reduces bias. The matching literature in particular suggests that matching on variables that are correlated with unobserved confounders reduces the imbalance in, and the bias due to, unobserved confounders (e.g., [11, 12, 13]). We will show that even a high correlation guarantees neither a decrease in the imbalance in the unobserved confounders nor a decreasing bias. We will also show that measurement error in covariates (unreliability) does not imply that less bias is removed.

Recently, researchers have started looking at the mechanics of OVB in more detail. In particular, they have been investigating what happens if one conditions on covariates that have the potential to induce or amplify bias. Such covariates are collider variables, which induce their own bias in addition to any OVB [14, 15, 16], or instrumental variables (IVs), which amplify any bias left after conditioning on a set of observed covariates [17, 18]. Another class of bias-amplifying covariates consists of near-IVs, which strongly determine treatment selection but affect the outcome only weakly (the weak instead of absent association with the outcome turns them into a near-IV). Pearl [17, 19] (see also [20, 21]) formally showed that adjusting for a near-IV removes the near-IV's own confounding bias but also amplifies any bias left due to omitted confounders. Simulation studies have also been used to demonstrate that the inclusion of additional variables can actually increase OVB [7, 21, 22].

In this article we give a thorough formal characterization of the mechanics that lead to OVB. In particular, we discuss conditions under which adjusting for a confounder actually increases instead of reduces OVB. We use a linear setting with only two continuous confounders, X and U, that confound the relationship between a continuous treatment Z and a continuous outcome variable Y. This allows us to keep the complexity of the OVB formulas low, and thus to better understand the OVB mechanics.

In the following we first review and explain the phenomenon of bias amplification when one conditions on an IV in the presence of an omitted variable. Then we focus on the case of two uncorrelated confounders (one observed, the other unobserved), followed by the more general case with two correlated confounders. Slowly increasing the complexity of the confounding structure – from the IV case to two correlated confounders – allows us to clearly disentangle the effects of bias amplification, cancellation of offsetting biases, correlated confounders, and unreliable covariate measurement. We conclude with a discussion of practical implications. The appendices contain (a) an explanation of bias amplification in the context of matching or stratifying on an IV (Appendix A), (b) OVB formulas for a dichotomous treatment variable (Appendix B), and (c) proofs of results discussed in this article (Appendix C).

## Amplification of bias and imbalance: the instrumental variable case

Several publications [17, 18, 19, 20] demonstrated that conditioning on an instrumental variable (IV) amplifies any remaining bias due to an omitted variable. [2] The causal graph in Figure 1 represents a simple data generating model (DGM) for the outcome Y and treatment Z with one confounder U and an instrumental variable IV (which is a variable that has no effect on the outcome Y except for the indirect effect via treatment Z). The corresponding linear structural causal model (SCM) is given by

$$IV = \varepsilon_{IV}, \qquad U = \varepsilon_U, \qquad Z = \alpha_{IV} IV + \alpha_U U + \varepsilon_Z, \qquad Y = \tau Z + \beta_U U + \varepsilon_Y,$$

### Figure 1:

Causal graph with an instrumental variable (IV). Z is the treatment, Y the outcome, and U an unobserved confounder (represented by the vacant node).

where $\alpha_{IV}$, $\alpha_U$, $\beta_U$, and $\tau$ are standardized parameters and $\varepsilon_{IV}$, $\varepsilon_U$, $\varepsilon_Z$, and $\varepsilon_Y$ are mutually independent error terms (representing unknown factors or measurement error) with variances that ensure that

$$\mathrm{Var}(IV) = \mathrm{Var}(U) = \mathrm{Var}(Z) = \mathrm{Var}(Y) = 1.$$

Conducting a linear regression analysis that neither conditions on U nor on IV, $\hat{Y} = \hat{\gamma} + \hat{\tau} Z$, results in a biased regression estimator $\hat{\tau}$ for the treatment effect with $E(\hat{\tau}) = \tau + \alpha_U \beta_U$. Thus, the initial OVB, that is, the bias before conditioning on IV, is given by $\mathrm{OVB}(\hat{\tau} \mid \{\}) = E(\hat{\tau}) - \tau = \alpha_U \beta_U$. The empty set in $\mathrm{OVB}(\hat{\tau} \mid \{\})$ indicates that we did not adjust for any covariates. Note that the initial OVB, $\alpha_U \beta_U$, represents the confounding bias due to the unblocked (open) backdoor path $Z \leftarrow U \rightarrow Y$. [3]

### Bias amplification

Omitting U but including IV in the regression model, $\hat{Y} = \hat{\gamma} + \hat{\tau} Z + \hat{\alpha}_{IV} IV$, also results in bias [17]:

(1) $\mathrm{OVB}(\hat{\tau} \mid IV) = \dfrac{\alpha_U \beta_U}{1 - \alpha_{IV}^2}.$

However, conditioning on IV amplifies any bias left due to an unblocked backdoor path because $0 < 1 - \alpha_{IV}^2 < 1$. Thus, the absolute OVB after adjusting for IV is always greater than the absolute initial OVB: $\left|\frac{\alpha_U \beta_U}{1 - \alpha_{IV}^2}\right| > |\alpha_U \beta_U|$. If we were to condition on U in addition to IV (if U were observed), no OVB would be left because U blocks the backdoor path $Z \leftarrow U \rightarrow Y$. Thus, if all confounders (or at least a set of variables that blocks all backdoor paths) are reliably measured, conditioning on an IV does not result in any OVB because there is no bias left to be amplified (provided the functional form of the regression is correctly specified). However, adjusting for the IV still reduces the efficiency of the treatment effect estimate [21, 23].
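The amplification in eq. (1) can be verified numerically at the population level from the SCM's implied covariance matrix. A minimal sketch (the parameter values are illustrative choices, not taken from the article):

```python
import numpy as np

# Population check of eq. (1): OVB(tau_hat | IV) = alpha_U*beta_U / (1 - alpha_IV^2).
# Illustrative standardized parameters (feasible under the unit-variance constraint).
tau, a_iv, a_u, b_u = 0.3, 0.6, 0.5, 0.4

# Implied covariances among (IV, U, Z, Y) under the standardized SCM.
c_z_iv, c_z_u = a_iv, a_u
c_y_iv, c_y_u = tau * a_iv, tau * a_u + b_u
c_y_z = tau + a_u * b_u
S = np.array([[1.0,    0.0,    c_z_iv, c_y_iv],
              [0.0,    1.0,    c_z_u,  c_y_u ],
              [c_z_iv, c_z_u,  1.0,    c_y_z ],
              [c_y_iv, c_y_u,  c_y_z,  1.0  ]])

def slope_of_z(S, regressors):
    """Population OLS coefficient of Z (index 2) when Y (index 3) is regressed on `regressors`."""
    Sxx = S[np.ix_(regressors, regressors)]
    Sxy = S[np.ix_(regressors, [3])]
    return np.linalg.solve(Sxx, Sxy).ravel()[regressors.index(2)]

ovb_initial = slope_of_z(S, [2]) - tau       # Y ~ Z      -> a_u*b_u = 0.20
ovb_adjusted = slope_of_z(S, [2, 0]) - tau   # Y ~ Z + IV -> 0.20/(1 - 0.36) = 0.3125
print(ovb_initial, ovb_adjusted)
```

Adjusting for the IV leaves the bias larger (0.3125) than the initial bias (0.20), exactly as eq. (1) predicts.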

### Imbalance in the unobserved confounder U

Bias amplification occurs because conditioning on the IV increases the imbalance in the unobserved confounder U. For our linear framework, we define imbalance as the difference in the expected value of U for subpopulations with $Z = z$ and $Z = z + 1$ (if Z were dichotomous, the imbalance would measure the mean difference between the two groups). That is, without conditioning on the IV or any other covariate, the imbalance in U is obtained by regressing U on Z: $\mathrm{Imbalance}(U \mid \{\}) = E(U \mid Z = z + 1) - E(U \mid Z = z) = \alpha_U$. After conditioning on IV, we get $\mathrm{Imbalance}(U \mid IV) = E(U \mid Z = z + 1, IV) - E(U \mid Z = z, IV) = \frac{\alpha_U}{1 - \alpha_{IV}^2}$ (Proof 1 in Appendix C). The comparison of the two imbalance formulas reveals that conditioning on the IV amplifies U's imbalance by the factor $1/(1 - \alpha_{IV}^2)$. Thus, we can write the OVB as the product of the amplified imbalance in U and U's direct effect on the outcome: $\mathrm{OVB}(\hat{\tau} \mid IV) = \frac{\alpha_U}{1 - \alpha_{IV}^2} \times \beta_U$. This formula highlights that conditioning on IV turns U into a relatively stronger confounder.

The increased imbalance in U can be explained as follows (similar explanations can be found in [21] and [24]): Since $Z = \alpha_{IV} IV + \alpha_U U + \varepsilon_Z$ is a function of IV, U, and the error term $\varepsilon_Z$, conditioning on the IV removes IV's effect on Z such that the remaining variation in Z is determined by U and the error term alone. With only two sources of variation left (U and $\varepsilon_Z$), U now explains a larger portion of the variance in Z. Hence, the association between U and Z for a given $IV = v$ is necessarily greater than before conditioning on IV. The increased association between U and Z implies an increase in U's absolute imbalance: $|\mathrm{Imbalance}(U \mid IV)| = \frac{|\alpha_U|}{1 - \alpha_{IV}^2} > |\mathrm{Imbalance}(U \mid \{\})| = |\alpha_U|$. Appendix A contains a more intuitive explanation within the context of matching or stratifying treatment and control cases on an IV.
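A Monte Carlo sketch of this imbalance amplification (illustrative parameter values, not taken from the article): the slope of U on Z jumps from $\alpha_U$ to $\alpha_U/(1 - \alpha_{IV}^2)$ once IV is held fixed.

```python
import numpy as np

# Simulate the IV SCM and compare Imbalance(U | {}) with Imbalance(U | IV).
rng = np.random.default_rng(0)
n = 1_000_000
a_iv, a_u = 0.6, 0.5                      # illustrative standardized coefficients

iv = rng.standard_normal(n)
u = rng.standard_normal(n)
z = a_iv*iv + a_u*u + np.sqrt(1 - a_iv**2 - a_u**2) * rng.standard_normal(n)

# Imbalance(U | {}): slope of U on Z, expected to be alpha_U = 0.5.
b0 = np.linalg.lstsq(np.column_stack([np.ones(n), z]), u, rcond=None)[0][1]
# Imbalance(U | IV): slope of U on Z holding IV fixed, expected 0.5/(1 - 0.36) ~ 0.78.
b1 = np.linalg.lstsq(np.column_stack([np.ones(n), z, iv]), u, rcond=None)[0][1]

print(round(b0, 2), round(b1, 2))   # ~0.50 vs ~0.78
```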

## OVB and imbalance due to conditioning on an uncorrelated confounder

Bias amplification occurs not only when one conditions on an IV but also when one conditions on a confounder. For an unobserved confounder U and an uncorrelated confounder X that both induce bias in the same direction (i.e., either positive or negative selection bias), prior studies have shown that conditioning on X, particularly when X is a near-IV that is highly predictive of treatment Z but only weakly predictive of the outcome, has two effects: it removes X's own confounding bias and amplifies any remaining bias due to the omitted confounder [17, 19, 21]. The bias-amplifying effect may actually dominate the bias-reducing effect such that conditioning on a confounder X may increase instead of reduce the OVB in the treatment effect. In order to fully characterize the mechanics of OVB, we discuss the more general case where X and U (a) are correlated or uncorrelated, (b) induce biases in different directions, and (c) where X is unreliably measured. We first discuss the case of uncorrelated confounders and then the case where X and U are correlated.

The left graph in Figure 2 shows the DGM with two uncorrelated confounders, an observed confounder X and an unobserved confounder U. The corresponding linear SCM is given by

(2) $X = \varepsilon_X, \qquad U = \varepsilon_U, \qquad Z = \alpha_X X + \alpha_U U + \varepsilon_Z, \qquad Y = \tau Z + \beta_X X + \beta_U U + \varepsilon_Y,$

### Figure 2:

Causal graphs with two uncorrelated confounders X and U, with X reliably measured in the left graph, and X measured with error in the right graph.

with the same constraints as before such that the parameters represent standardized coefficients. For this linear SCM, the initial OVB due to the omitted confounders X and U is $\mathrm{OVB}(\hat{\tau} \mid \{\}) = \alpha_X \beta_X + \alpha_U \beta_U$, which represents the biases induced by the two open backdoor paths $Z \leftarrow X \rightarrow Y$ and $Z \leftarrow U \rightarrow Y$. It is important to note that the two bias terms add up if both terms are either positive or negative, but partially or fully offset each other if one term is positive and the other negative.

### Reliably measured confounder X

Adjusting for a reliably measured confounder X results in a biased regression estimator with

(3) $\mathrm{OVB}(\hat{\tau} \mid X) = \alpha_U \beta_U \times \dfrac{1}{1 - \alpha_X^2}.$

A comparison of this bias formula (Proof 3 in Appendix C) with the initial OVB indicates that conditioning on X has two effects: First, a bias-reducing effect because X blocks the backdoor path $Z \leftarrow X \rightarrow Y$ and thus eliminates its own confounding bias ($\alpha_X \beta_X$). Second, a bias-increasing effect because the bias due to the unblocked backdoor path $Z \leftarrow U \rightarrow Y$ ($\alpha_U \beta_U$) is amplified by the factor $1/(1 - \alpha_X^2)$.

If the bias-increasing effect dominates the bias-reducing effect, then conditioning on X leads to an increase in the absolute OVB; that is, the OVB after conditioning on the confounder X is greater than without conditioning on X: $\left|\frac{\alpha_U \beta_U}{1 - \alpha_X^2}\right| > |\alpha_X \beta_X + \alpha_U \beta_U|$. The discussion of the conditions under which the absolute OVB actually increases requires a distinction between the case where X and U induce bias in the same direction (no offsetting biases) and the case where they induce bias in different directions such that their respective confounding biases partially or fully offset each other.
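A minimal numeric sketch of eq. (3) with a near-IV (illustrative parameter values, not taken from the article) shows the bias-increasing effect dominating:

```python
# Near-IV example for eq. (3): alpha_X close to 1, beta_X close to 0, X and U uncorrelated.
tau, a_x, b_x, a_u, b_u = 0.2, 0.9, 0.1, 0.2, 0.3

# Population OLS slope of Z: unadjusted (Y ~ Z) vs. adjusted (Y ~ Z + X).
cov_zy = tau + a_x*b_x + a_u*b_u
cov_zx, cov_xy = a_x, tau*a_x + b_x
tau_unadj = cov_zy
tau_adj = (cov_zy - cov_xy*cov_zx) / (1 - cov_zx**2)

ovb0 = tau_unadj - tau   # initial OVB: a_x*b_x + a_u*b_u = 0.15
ovb1 = tau_adj - tau     # adjusted OVB: a_u*b_u/(1 - a_x^2) ~ 0.316, i.e. larger
print(ovb0, ovb1)
```

Here, adjusting for the confounder X more than doubles the absolute OVB.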

Biases in the Same Direction. If both confounders induce bias in the same direction, $\mathrm{sgn}(\alpha_X \beta_X) = \mathrm{sgn}(\alpha_U \beta_U)$, then conditioning on X results in an increasing OVB only if the bias-amplifying effect dominates the bias-reducing effect, which is the case if [4]

(4) $\left|\dfrac{\alpha_U \beta_U}{\alpha_X \beta_X}\right| > \dfrac{1 - \alpha_X^2}{\alpha_X^2}.$

Conditioning on X very likely increases the absolute OVB in two situations. First, if the bias induced by U ($\alpha_U \beta_U$) is much larger than the bias induced by X ($\alpha_X \beta_X$), implying that the bias ratio on the left-hand side of (4) is large. And second, if X strongly determines Z ($\alpha_X$ close to 1) such that the right-hand side of (4) is close to zero. Thus, adjusting for a confounder with $\alpha_X$ close to 1 and $\beta_X$ close to zero (i.e., a near-IV) very likely increases the absolute bias.

In the upper left plot of Figure 3 the two dark grey areas show combinations of $\alpha_X$ values and bias ratios $\left|\frac{\alpha_U \beta_U}{\alpha_X \beta_X}\right|$ for which the absolute OVB increases. The two light grey areas indicate areas of decreasing OVB. The line separating the dark and light grey areas represents the 100% bias contour line, where conditioning on X neither reduces nor increases OVB (i.e., 100% of the initial OVB is left). The darker shade of the two dark grey areas indicates the region where conditioning on X leads to a bias that is at least twice as large as the initial bias. Thus, the contour line that separates the two dark grey areas represents the 200% bias contour line. Similarly, the very light grey area indicates that less than 50% of the initial bias remains. The contour line separating the two light grey areas represents the 50% bias contour line. For example, conditioning on a confounder with $\alpha_X = .1$ results in an increasing OVB only if the bias ratio $\left|\frac{\alpha_U \beta_U}{\alpha_X \beta_X}\right|$ is greater than $\frac{1 - .1^2}{.1^2} = 99$, that is, if the bias induced by the unobserved confounder U is at least 99 times greater than the bias induced by X. However, if X is strongly related to treatment, $\alpha_X = .9$, conditioning on X results in an increasing OVB if the bias induced by U is at least about one fourth ($\frac{1 - .9^2}{.9^2} = .23$) of X's bias. In this case, bias amplification dominates bias reduction: though conditioning on X removes its own bias $\alpha_X \beta_X$, which amounts to 81% (= 1/(1 + .23)) of the total confounding bias, [5] the amplification of the remaining 19% (= .23/(1 + .23)) due to omitting U ($\alpha_U \beta_U$) is strong enough to offset the bias-reducing effect because the bias amplification factor is $1/(1 - .9^2) = 5.26$.
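The arithmetic of this worked example can be reproduced in a few lines (the threshold function simply evaluates the right-hand side of condition (4)):

```python
# Threshold on the bias ratio |alpha_U*beta_U / (alpha_X*beta_X)| from condition (4),
# above which adjusting for X increases the absolute OVB (same-direction biases).
def ratio_threshold(a_x):
    return (1 - a_x**2) / a_x**2

print(round(ratio_threshold(0.1), 2))   # 99.0: U's bias must be ~99 times X's bias
print(round(ratio_threshold(0.9), 2))   # 0.23: about one fourth of X's bias suffices

r = ratio_threshold(0.9)
print(round(1 / (1 + r), 2))            # share of total bias removed by X: ~0.81
print(round(r / (1 + r), 2))            # share left to be amplified: ~0.19
print(round(1 / (1 - 0.9**2), 2))       # amplification factor: 5.26
```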

### Figure 3:

Increasing and decreasing OVB due to conditioning on an uncorrelated confounder X. The two dark grey areas indicate an increasing OVB, with 100%-200% (lighter shade) and 200% or more (darker shade) remaining bias. The two light grey areas indicate a decreasing OVB, with 50%-100% (darker shade) and 50% or less (lighter shade) remaining bias.

Offsetting Biases. For $\mathrm{sgn}(\alpha_X \beta_X) \neq \mathrm{sgn}(\alpha_U \beta_U)$, the confounding biases induced by X and U partially or even completely offset each other such that $|\alpha_X \beta_X + \alpha_U \beta_U| < \max(|\alpha_X \beta_X|, |\alpha_U \beta_U|)$. If U induces less bias than X, $|\alpha_U \beta_U| \leq |\alpha_X \beta_X|$, adjusting for the observed confounder X increases rather than reduces OVB only if

(5) $\left|\dfrac{\alpha_U \beta_U}{\alpha_X \beta_X}\right| \geq \dfrac{1 - \alpha_X^2}{2 - \alpha_X^2}$ (Proof 4 in Appendix C).

But if U induces more bias than X, $|\alpha_U \beta_U| > |\alpha_X \beta_X|$, then conditioning on X always increases OVB because the remaining bias due to the unblocked backdoor path $Z \leftarrow U \rightarrow Y$ is necessarily greater than the initial bias: $|\alpha_U \beta_U| > |\alpha_X \beta_X + \alpha_U \beta_U|$.

The upper right plot in Figure 3 shows areas of increasing and decreasing absolute OVB when biases (partially) offset each other. For $\alpha_X \rightarrow 0$, OVB increases as long as the bias induced by U is at least half of X's bias: $\lim_{\alpha_X \rightarrow 0} \frac{1 - \alpha_X^2}{2 - \alpha_X^2} = \frac{1}{2}$. For $\alpha_X = .5$, OVB increases if the bias ratio exceeds $\frac{1 - .5^2}{2 - .5^2} = .43$. If $\alpha_X$ is close to 1, say .95, then OVB increases as long as the bias induced by U is at least about one tenth of X's bias ($\frac{1 - .95^2}{2 - .95^2} = .09$).
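These thresholds follow directly from the right-hand side of eq. (5):

```python
# Offsetting-bias threshold (1 - a_x^2)/(2 - a_x^2) from eq. (5), evaluated at the
# values discussed in the text.
def offsetting_threshold(a_x):
    return (1 - a_x**2) / (2 - a_x**2)

for a_x in (0.0, 0.5, 0.95):
    print(a_x, round(offsetting_threshold(a_x), 2))   # 0.5, 0.43, 0.09
```

Note that the threshold never exceeds one half, which is why U's bias reaching half of X's bias always suffices for an increasing OVB.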

To summarize, for offsetting biases, the absolute OVB increases in two situations: First, if the confounding biases induced by X and U nearly offset each other ($\alpha_X \beta_X \approx -\alpha_U \beta_U$). In fact, independent of the value of $\alpha_X$, OVB always increases if the bias induced by the unobserved confounder U is at least half of X's bias ($|\alpha_U \beta_U| > |\alpha_X \beta_X|/2$). And second, if X strongly determines Z such that $\alpha_X$ is close to 1, then the absolute OVB increases even when $|\alpha_X \beta_X| \gg |\alpha_U \beta_U|$. The increase in the absolute OVB is mostly a result of the cancellation of the bias-offsetting effect, but the amplification of the remaining bias adds to the increase. Also note that the signs of the initial and adjusted OVB may differ. For instance, the initial OVB might be positive, but adjusting for X might turn the positive OVB into a negative OVB.

### Unreliably measured confounder X

The OVB formula in (3) only holds for a reliably measured uncorrelated confounder X. The right graph in Figure 2 shows the case with a fallibly measured X. The node of X now turns into a vacant node (open circle) indicating that X is not directly observed. Instead, we only have an unreliable measure X*, which is given by $X^* = X + e$, where $e$ is an independent error with mean zero and variance $\sigma_e^2$. [6] Since $\mathrm{Var}(X) = 1$, the reliability of X* is given by $\gamma = 1/(1 + \sigma_e^2)$. Measurement error in X* has no influence on the initial OVB, $\mathrm{OVB}(\hat{\tau} \mid \{\}) = \alpha_X \beta_X + \alpha_U \beta_U$, but affects the OVB after adjusting for the fallible X* (Proof 3 in Appendix C):

(6) $\mathrm{OVB}(\hat{\tau} \mid X^*) = \{\alpha_U \beta_U + \alpha_X \beta_X (1 - \gamma)\} \times \dfrac{1}{1 - \alpha_X^2 \gamma}.$

In comparison to the OVB for a reliably measured confounder X in (3), measurement error has two effects: First, the bias left due to (partially) unblocked backdoor paths now consists of two components, $\alpha_U \beta_U$ and $\alpha_X \beta_X (1 - \gamma)$. Besides the open backdoor path $Z \leftarrow U \rightarrow Y$ (due to omitting U), adjusting for X* no longer fully blocks the backdoor path $Z \leftarrow X \rightarrow Y$, such that a proportion $(1 - \gamma)$ of X's bias is left. That is, X* removes the bias induced by X only to the degree of its reliability ($\gamma$). The less reliable the measurement, the more of X's bias will remain. Second, measurement error attenuates the bias amplification factor, since $1/(1 - \alpha_X^2 \gamma)$ is never greater than $1/(1 - \alpha_X^2)$ because $0 \leq \gamma \leq 1$. A completely unreliable measure X* with $\gamma \rightarrow 0$ neither removes nor amplifies any bias, such that the initial OVB remains: $\lim_{\gamma \rightarrow 0} \mathrm{OVB}(\hat{\tau} \mid X^*) = \alpha_X \beta_X + \alpha_U \beta_U$ (also see [25]). At the other extreme, with a perfectly reliable measure X ($\gamma = 1$), the OVB formula in (6) reduces to the OVB formula in (3).
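Eq. (6) also makes concrete the counterintuitive claim from the abstract: with offsetting biases, a less reliable X* can leave less OVB than a perfectly reliable X. A sketch with illustrative parameter values (not taken from the article):

```python
# Eq. (6) as a function of the reliability. With offsetting biases (a_x*b_x = .20,
# a_u*b_u = -.15), perfect reliability undoes the cancellation and amplifies the rest.
def ovb_adjusted(a_x, b_x, a_u, b_u, rel):
    return (a_u*b_u + a_x*b_x*(1 - rel)) / (1 - a_x**2 * rel)

a_x, b_x, a_u, b_u = 0.5, 0.4, 0.3, -0.5
ovb_initial = a_x*b_x + a_u*b_u          # 0.05

for rel in (1.0, 0.5, 0.3, 0.0):
    print(rel, round(ovb_adjusted(a_x, b_x, a_u, b_u, rel), 3))
# reliability 1.0 leaves OVB -0.2 (four times the initial bias, with flipped sign),
# reliability 0.3 leaves only ~ -0.011; at rel = 0 the initial OVB 0.05 remains.
```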

Biases in the Same Direction. The second and third rows of plots in Figure 3 show the areas of increasing OVB (the two dark grey areas) and decreasing OVB (the two light grey areas) for an unreliably measured confounder X ($\gamma = .75$ in the second row and $\gamma = .5$ in the third row). In the left column of plots, for $\mathrm{sgn}(\alpha_X \beta_X) = \mathrm{sgn}(\alpha_U \beta_U)$, the 100% bias contour lines are the same as for the reliably measured confounder (upper left plot), but the 200% and 50% bias contour lines change. Unreliability in X does not change the 100% contour line because measurement error always results in an attenuation of the OVB toward the initial OVB [26] (see Proof 5 in Appendix C, which also contains a more detailed discussion). Since the 100% contour line represents situations where conditioning on X does not alter the initial OVB (i.e., bias reduction is exactly offset by bias amplification), measurement error has no effect. But if adjusting for the reliable X increases OVB, then measurement error attenuates the increase, as shown by the retreating 200% contour line (as one moves from the plot in the first row to the plots in the second and third rows). If conditioning on the reliable X reduces OVB, then measurement error attenuates the bias reduction, as indicated by the retreating 50% contour line.

Offsetting Biases. For offsetting biases ($\mathrm{sgn}(\alpha_X \beta_X) \neq \mathrm{sgn}(\alpha_U \beta_U)$, shown in the right column of Figure 3), all bias contour lines depend on the extent of measurement error. In comparison to the reliably measured confounder (upper right plot), more measurement error in X* results in an expansion of the light grey areas of diminishing OVB; that is, measurement error makes an increasing OVB less likely because the cancellation of the offsetting biases is attenuated. Though unreliability decreases the chances of an increasing OVB, it does not imply that the fallible X* necessarily removes more bias than the corresponding reliable measure. A comparison of the 50% bias contour lines (or the very light grey areas) across the three plots reveals that the fallible X* can also remove less OVB than the reliable X.

### Imbalance in confounders U and X

For both reliably and unreliably measured confounders X, bias amplification operates via an increasing imbalance in U and X. For an unreliably measured confounder X*, the initial imbalance in U ($\alpha_U$) and the remaining imbalance in X ($\alpha_X(1 - \gamma)$) are inflated by the factor $1/(1 - \alpha_X^2 \gamma)$: $\mathrm{Imbalance}(U \mid X^*) = \frac{\alpha_U}{1 - \alpha_X^2 \gamma}$ and $\mathrm{Imbalance}(X \mid X^*) = \frac{\alpha_X (1 - \gamma)}{1 - \alpha_X^2 \gamma}$ (Proof 1 in Appendix C). The imbalance formula for U indicates that adjusting for X* always increases the absolute imbalance in U because the amplification factor $1/(1 - \alpha_X^2 \gamma)$ is greater than one (but note that measurement error attenuates bias amplification and thus the increase in U's absolute imbalance). Regarding the imbalance in X, conditioning on X* cannot fully balance X because the unreliable X* fails to completely remove the association between Z and X. However, the unreliable measure X* itself will be balanced: $\mathrm{Imbalance}(X^* \mid X^*) = 0$. Thus, balance in a fallible covariate X* does not imply that the underlying data-generating confounder X will be balanced. Particularly if $\alpha_X \gg 0$ or $\gamma < .75$, the absolute imbalance in X after adjusting for X* may still be large, but it will never exceed the absolute initial imbalance, $\mathrm{Imbalance}(X \mid \{\}) = \alpha_X$ (Proof 2 in Appendix C). This result does not generalize to the more general case with multiple observed confounders. If one conditions not only on a single unreliable confounder but on multiple, possibly uncorrelated confounders simultaneously, the resulting imbalance in the latent X might exceed the initial imbalance. This is so because the remaining imbalance in X after conditioning on X*, $\mathrm{Imbalance}(X \mid X^*)$, is further amplified by any other confounder we condition on (just like the imbalance in U).
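Both imbalance formulas can be checked by simulation (illustrative parameter values, not taken from the article; reliability 0.5 corresponds to $\sigma_e^2 = 1$):

```python
import numpy as np

# Monte Carlo check of Imbalance(U | X*) = a_u/(1 - a_x^2*rel) and
# Imbalance(X | X*) = a_x*(1 - rel)/(1 - a_x^2*rel) for an unreliable measure X*.
rng = np.random.default_rng(1)
n = 1_000_000
a_x, a_u, rel = 0.8, 0.3, 0.5

x = rng.standard_normal(n)
u = rng.standard_normal(n)
z = a_x*x + a_u*u + np.sqrt(1 - a_x**2 - a_u**2) * rng.standard_normal(n)
x_star = x + rng.standard_normal(n)        # sigma_e^2 = 1 gives reliability 1/(1+1) = 0.5

D = np.column_stack([np.ones(n), z, x_star])
imb_u = np.linalg.lstsq(D, u, rcond=None)[0][1]   # slope of U on Z given X*
imb_x = np.linalg.lstsq(D, x, rcond=None)[0][1]   # slope of X on Z given X*

print(round(imb_u, 2))   # ~0.44 = 0.3/0.68: initial imbalance 0.3 has increased
print(round(imb_x, 2))   # ~0.59 = 0.4/0.68: initial imbalance 0.8 is reduced, not zero
```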

## OVB and imbalance due to conditioning on a correlated confounder

The mechanics of OVB become slightly more complex when confounders are correlated. Intuitively, one might think that the correlation between an observed (X) and an unobserved confounder (U) always helps in reducing OVB when conditioning on X. But this is not necessarily true, because the correlation also triggers the bias-amplifying potential of the hidden confounder or might result in a cancellation of offsetting biases (e.g., if both X and U induce positive bias on their own, a negative correlation would partially offset their biases). These bias-increasing effects can actually dominate the bias-reducing effects. Since bias amplification, cancellation of offsetting biases, and measurement error operate as before, we only highlight the changes due to the correlation of confounders.

The left graph in Figure 4 shows the DGM with correlated confounders X and U. The linear SCM is the same as for the uncorrelated case in Eq. (2), except that X and U are correlated with $\mathrm{Cor}(X, U) = \rho$. The correlation between X and U might be due to a common cause C ($X \leftarrow C \rightarrow U$), a causal effect of X on U ($X \rightarrow U$), or a causal effect of U on X ($U \rightarrow X$). The initial OVB is then given by $\mathrm{OVB}(\hat{\tau} \mid \{\}) = \alpha_X \beta_X + \alpha_U \beta_U + \alpha_X \rho \beta_U + \alpha_U \rho \beta_X$, which reflects the biases due to all four backdoor paths between Z and Y in Figure 4: $Z \leftarrow X \rightarrow Y$, $Z \leftarrow U \rightarrow Y$, $Z \leftarrow X \leftrightarrow U \rightarrow Y$, and $Z \leftarrow U \leftrightarrow X \rightarrow Y$.

### Figure 4:

Causal graphs with two correlated confounders X and U, with X reliably measured in the left graph and X measured with error in the right graph.

### Reliably measured confounder X

Adjusting for the reliably measured confounder X but omitting U results in [7]

(7) $\mathrm{OVB}(\hat{\tau} \mid X) = \alpha_U \beta_U (1 - \rho^2) \times \dfrac{1}{1 - (\alpha_X + \alpha_U \rho)^2}.$

The OVB formula indicates that conditioning on a correlated confounder X has three effects. First, it eliminates its own confounding bias ($\alpha_X \beta_X$) but also the entire confounding bias induced by X's correlation with U ($\alpha_X \rho \beta_U + \alpha_U \rho \beta_X$). That is, conditioning on X blocks all backdoor paths going through X (i.e., $Z \leftarrow X \rightarrow Y$, $Z \leftarrow X \leftrightarrow U \rightarrow Y$, and $Z \leftarrow U \leftrightarrow X \rightarrow Y$). Second, because of X and U's correlation, X partially blocks the backdoor path $Z \leftarrow U \rightarrow Y$ to the extent of the squared correlation $\rho^2$; thus the bias due to the unobserved U reduces to $\alpha_U \beta_U (1 - \rho^2)$. And third, the correlation also affects the bias amplification factor $1/(1 - (\alpha_X + \alpha_U \rho)^2)$ because conditioning on X triggers U's bias-amplifying potential to the extent of their correlation, as reflected by the additional term $\alpha_U \rho$ in the denominator.

Depending on the sign of $\alpha_U \rho$, the correlation can strengthen, weaken, or even neutralize the bias amplification factor. If $\mathrm{sgn}(\alpha_U \rho) = \mathrm{sgn}(\alpha_X)$, then the correlation boosts bias amplification in comparison to the uncorrelated case because $|\alpha_X + \alpha_U \rho| > |\alpha_X|$. The stronger the correlation and the larger $\alpha_U$, the stronger the bias-amplifying effect. If $\mathrm{sgn}(\alpha_U \rho) \neq \mathrm{sgn}(\alpha_X)$, the correlation can strengthen (if $|\alpha_X + \alpha_U \rho| > |\alpha_X|$), weaken (if $|\alpha_X + \alpha_U \rho| < |\alpha_X|$), or completely cancel bias amplification (if $\alpha_X = -\alpha_U \rho$). Thus, even with highly correlated confounders X and U, there is no guarantee that conditioning on a correlated X reduces OVB (examples are briefly discussed at the end of the following subsection).
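A minimal numeric sketch of eq. (7) (illustrative parameter values that satisfy the standardization constraint; not taken from the article) shows that even a substantial correlation does not prevent an increasing OVB:

```python
# Eq. (7): X is a near-IV (a_x = .9) that is strongly correlated with U (rho = .85),
# yet adjusting for X increases the absolute OVB.
a_x, b_x, a_u, b_u, rho = 0.9, 0.1, 0.11, 0.1, 0.85

# Initial OVB: sum of the biases transmitted along all four backdoor paths.
ovb_initial = a_x*b_x + a_u*b_u + a_x*rho*b_u + a_u*rho*b_x
# Adjusted OVB, eq. (7): reduced numerator, but a strongly boosted amplification factor.
ovb_adjusted = a_u*b_u*(1 - rho**2) / (1 - (a_x + a_u*rho)**2)

print(round(ovb_initial, 3), round(ovb_adjusted, 3))   # ~0.187 vs ~0.236
```

Here the correlation shrinks U's remaining bias, but the term $\alpha_U \rho$ pushes $\alpha_X + \alpha_U \rho$ close to 1, so amplification wins.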

### Unreliably measured confounder X

The right graph in Figure 4 shows the same causal diagram as before but with the fallible covariate X*. In this case, one can show (Proof 3 in Appendix C) that conditioning on X* results in an OVB of

(8) $\mathrm{OVB}(\hat{\tau} \mid X^*) = \{\alpha_U \beta_U (1 - \tilde{\rho}^2) + (\alpha_X \beta_X + \alpha_X \rho \beta_U + \alpha_U \rho \beta_X)(1 - \gamma)\} \times \dfrac{1}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}.$

All four terms of the initial bias appear in the OVB formula, but the biases induced by the four backdoor paths are not fully effective. First, the correlation of the unreliable X* with the unobserved confounder U, $\mathrm{Cor}(X^*, U) = \tilde{\rho} = \rho\sqrt{\gamma}$, reduces the bias induced by U to the extent of the squared correlation $\tilde{\rho}^2$, leaving a bias of $\alpha_U \beta_U (1 - \tilde{\rho}^2)$. Second, the unreliable X* blocks the three backdoor paths via X only to the extent of its reliability ($\gamma$) and thus leaves a bias of $(\alpha_X \beta_X + \alpha_X \rho \beta_U + \alpha_U \rho \beta_X)(1 - \gamma)$. Finally, the remaining bias due to the four partially unblocked backdoor paths is amplified, but the bias amplification factor is attenuated by the reliability $\gamma$.
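As a consistency check (illustrative parameter values, not taken from the article), eq. (8) must reduce to eq. (7) for $\gamma = 1$ and to eq. (6) for $\rho = 0$:

```python
import math

# Eq. (8) as a function; rho_t = Cor(X*, U) = rho * sqrt(reliability).
def ovb_eq8(a_x, b_x, a_u, b_u, rho, rel):
    rho_t = rho * math.sqrt(rel)
    num = a_u*b_u*(1 - rho_t**2) + (a_x*b_x + a_x*rho*b_u + a_u*rho*b_x)*(1 - rel)
    return num / (1 - (a_x + a_u*rho)**2 * rel)

a_x, b_x, a_u, b_u, rho = 0.5, 0.3, 0.4, 0.2, 0.6

eq7 = a_u*b_u*(1 - rho**2) / (1 - (a_x + a_u*rho)**2)       # reliable, correlated
eq6 = (a_u*b_u + a_x*b_x*(1 - 0.7)) / (1 - a_x**2 * 0.7)    # unreliable, uncorrelated
print(abs(ovb_eq8(a_x, b_x, a_u, b_u, rho, 1.0) - eq7) < 1e-12)   # True
print(abs(ovb_eq8(a_x, b_x, a_u, b_u, 0.0, 0.7) - eq6) < 1e-12)   # True
```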

Due to the increased complexity of the OVB formulas, an easily interpretable inequality as in the uncorrelated confounder case is not derivable. Thus, we illustrate the effect of correlated confounders with two examples. The first row of plots in Figure 5 shows, for two different parameter settings, the areas of increasing (dark grey) and decreasing OVB (light grey) as a function of the correlation $\rho$ (abscissa) and the unobserved confounder's coefficient $\alpha_U$ (ordinate). For both plots we set $\beta_X = \beta_U = .1$, but $\alpha_X = .3$ in the left plot and $\alpha_X = .9$ in the right plot (making X a near-IV in the latter case). In each plot, quadrant I (with $\rho \geq 0$ and $\alpha_U \geq 0$) represents the situation where all biases induced by X and U go in the same direction because all five data-generating parameters are positive. Quadrants II, III, and IV show the results for partially or completely offsetting biases (because the signs of the parameters differ).

### Figure 5:

Increasing and decreasing OVB due to conditioning on a correlated confounder X. The two dark grey areas indicate an increasing OVB, with 100%-200% (lighter shade) and 200% or more (darker shade) remaining bias. The two light grey areas indicate a decreasing OVB, with 50%-100% (darker shade) and 50% or less (lighter shade) remaining bias. The white areas indicate parameter combinations that are impossible for standardized path coefficients.

Consider quadrant I of the top right plot in Figure 5, where the confounder X strongly affects Z ($\alpha_X = .9$): OVB can exceed the initial bias even if one conditions on a confounder X that is almost perfectly correlated with U. In general, it is hard to derive a generalizable pattern from the two example plots. Without knowing the sign and magnitude of the five parameters, it is impossible to predict whether conditioning on a correlated X or X* reduces or increases OVB, even if X is highly correlated with U. The second and third rows in Figure 5 show the effect of measurement error, which is the same as for the uncorrelated case (i.e., attenuation toward the initial bias; Proof 5 in Appendix C).

### Imbalance in confounders U and X

As for the case of uncorrelated confounders, the bias-amplifying effect of conditioning on a reliably or unreliably measured confounder X can be explained by the amplified imbalance in U and X. The absolute initial imbalance in U, $\mathrm{Imbalance}(U \mid \{\}) = \alpha_U + \alpha_X \rho$, might increase or decrease once one conditions on X*, even when U is correlated with X*. Adjusting for the correlated X* changes the initial imbalance in U to $\alpha_U (1 - \tilde{\rho}^2) + \alpha_X \rho (1 - \gamma)$, which is then amplified by the factor $1/(1 - (\alpha_X + \alpha_U \rho)^2 \gamma)$ such that we obtain $\mathrm{Imbalance}(U \mid X^*) = \frac{\alpha_U (1 - \tilde{\rho}^2) + \alpha_X \rho (1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}$. Compared to the absolute value of the initial imbalance (before adjusting for X*), the absolute imbalance in U after adjusting for X* might be smaller or larger (Proof 2 in Appendix C). Despite the correlation, conditioning on X* can increase the imbalance in U because the term $\alpha_U \rho$ may strengthen the bias amplification factor.

Correspondingly, conditioning on X* first reduces the absolute initial imbalance in X from $|Imbalance(X \mid \{\})| = |\alpha_X + \alpha_U \rho|$ to $|(\alpha_X + \alpha_U \rho)(1 - \gamma)|$, which again is amplified such that

$$Imbalance(X \mid X^*) = \frac{(\alpha_X + \alpha_U \rho)(1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}.$$

Multiplying U's imbalance by $\beta_U$ and X's imbalance by $\beta_X$, and then adding the two terms, results in the OVB formula (8). As for the uncorrelated confounder case, the absolute imbalance in X after adjusting for X* will always be smaller than before the adjustment, $|Imbalance(X \mid X^*)| \le |Imbalance(X \mid \{\})|$ (Proof 2 in Appendix C). Again, this only holds for the case with a single observed confounder X. Conditioning on multiple confounders, including X*, can actually increase the imbalance in X (but, as for the imbalance in U, whether the imbalance in X decreases or increases depends on the correlation among the observed confounders).

With a perfectly reliably measured X ($\gamma = 1$), X will be fully balanced but U remains imbalanced with $Imbalance(U \mid X) = \frac{\alpha_U (1 - \rho^2)}{1 - (\alpha_X + \alpha_U \rho)^2}$. Note that neither the imbalance in U nor in X (given it is unreliably measured) can be tested empirically, since both are unobserved.
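These imbalance results are easy to verify numerically. The following sketch (illustrative standardized parameter values of our choosing; numpy only) simulates the correlated-confounder model and compares the regression-based imbalance in U after adjusting for X* with the closed-form expression:

```python
import numpy as np

# Simulate the correlated-confounder model and compare the simulated
# imbalance in U after adjusting for X* with the closed-form expression.
rng = np.random.default_rng(0)
n = 1_000_000
a_x, a_u, rho, gamma = 0.5, 0.4, 0.3, 0.8   # illustrative standardized values

L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
X, U = L @ rng.standard_normal((2, n))               # cor(X, U) = rho
Xstar = X + rng.normal(0, np.sqrt(1/gamma - 1), n)   # reliability gamma
var_eps = 1 - (a_x**2 + a_u**2 + 2*a_x*a_u*rho)      # so that Var(Z) = 1
Z = a_x*X + a_u*U + rng.normal(0, np.sqrt(var_eps), n)

# Imbalance(U | X*): coefficient of Z in the regression of U on Z and X*
D = np.column_stack([np.ones(n), Z, Xstar])
b_z = np.linalg.lstsq(D, U, rcond=None)[0][1]

closed_form = (a_u*(1 - rho**2*gamma) + a_x*rho*(1 - gamma)) \
              / (1 - (a_x + a_u*rho)**2 * gamma)
print(b_z, closed_form)   # both ≈ 0.579, larger than the initial imbalance 0.55
```

For these values the initial imbalance in U is $\alpha_U + \alpha_X \rho = 0.55$; conditioning on the correlated X* raises it to about 0.58, so the adjustment makes U a stronger confounder.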

## Discussion

The investigation of the OVB mechanics revealed that conditioning on a confounder provokes two opposing effects, a bias-removing effect and a bias-increasing effect. If the bias-increasing effect dominates the bias-removing effect, then OVB increases. The increase in OVB can be caused by the amplification of any bias left due to unblocked backdoor paths, by the cancellation of offsetting biases, or by both together. The overall extent of bias amplification is driven by two factors: (i) the bias left due to unblocked backdoor paths and (ii) the size of the multiplicative bias amplification factor. Both factors depend on the strength of the correlation between the observed and unobserved confounder and on the degree of measurement error in the observed confounder. Though the correlation helps in partially removing the bias induced by the unobserved confounder, it also picks up the bias-amplifying potential of the unobserved confounder, and thus can further boost bias amplification. Therefore, even a high correlation between the observed and unobserved confounder does not guarantee that OVB will decrease. Though measurement error attenuates the bias amplification factor, it also attenuates the confounder's potential to remove bias, such that measurement error may have a positive or negative effect on OVB. Bias amplification is not an issue if conditioning on a set of confounders removes all the bias (i.e., no bias is left to be amplified) or if the amplification factor is one (i.e., $\alpha_X = -\alpha_U \rho$). Table 1 and Table 2 summarize the formulas and results for uncorrelated and correlated confounders, respectively. Appendix B shows that the very same OVB mechanics operate with dichotomous instead of continuous treatment variables (though the formulas are slightly different).

Table 1:

Uncorrelated confounders X and U: Omitted variable bias (OVB) and imbalance before and after adjusting for X*.

| | Initial OVB and Imbalance | OVB and Imbalance after adjusting for X* |
| --- | --- | --- |
| Omitted variable bias | $OVB(\hat{\tau} \mid \{\}) = \alpha_X \beta_X + \alpha_U \beta_U$ | $OVB(\hat{\tau} \mid X^*) = \dfrac{\alpha_U \beta_U + \alpha_X \beta_X (1 - \gamma)}{1 - \alpha_X^2 \gamma}$ |
| Imbalance in U | $Imbalance(U \mid \{\}) = \alpha_U$ | $Imbalance(U \mid X^*) = \dfrac{\alpha_U}{1 - \alpha_X^2 \gamma}$ |
| Imbalance in X | $Imbalance(X \mid \{\}) = \alpha_X$ | $Imbalance(X \mid X^*) = \dfrac{\alpha_X (1 - \gamma)}{1 - \alpha_X^2 \gamma}$ |

| Effect of conditioning on X* when … | biases are in the same direction | biases offset each other |
| --- | --- | --- |
| Absolute omitted variable bias | Increase in OVB is most likely if (a) the bias induced by the unobserved confounder U is much larger than the bias induced by confounder X, or (b) confounder X strongly affects Z. | If the bias induced by the unobserved confounder U exceeds half of the bias induced by X, OVB always increases (this case also includes almost perfectly offsetting biases). If the bias induced by U is less than half of the bias induced by X, OVB most likely increases if X strongly affects Z (provided X is reliably measured). |
| Absolute imbalance | Imbalance in U always increases. Imbalance in X always decreases. | Imbalance in U always increases. Imbalance in X always decreases. |
| Effect of measurement error | Attenuates any increase in OVB and attenuates any decrease in OVB. | If the bias induced by U exceeds half of the bias induced by X, measurement error attenuates any increase in OVB. If the bias induced by U is less than half of the bias induced by X, measurement error attenuates any increase in OVB (and might even turn an increase into a decrease) but may attenuate or strengthen any decrease in OVB. |
Table 2:

Correlated confounders X and U: Omitted variable bias (OVB) and imbalance before and after adjusting for X*.

| | Initial OVB and Imbalance | OVB and Imbalance after adjusting for X* |
| --- | --- | --- |
| Omitted variable bias | $OVB(\hat{\tau} \mid \{\}) = \alpha_X \beta_X + \alpha_U \beta_U + \alpha_X \rho \beta_U + \alpha_U \rho \beta_X$ | $OVB(\hat{\tau} \mid X^*) = \dfrac{\alpha_U \beta_U (1 - \tilde{\rho}^2) + (\alpha_X \beta_X + \alpha_X \rho \beta_U + \alpha_U \rho \beta_X)(1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}$ |
| Imbalance in U | $Imbalance(U \mid \{\}) = \alpha_U + \alpha_X \rho$ | $Imbalance(U \mid X^*) = \dfrac{\alpha_U (1 - \tilde{\rho}^2) + \alpha_X \rho (1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}$ |
| Imbalance in X | $Imbalance(X \mid \{\}) = \alpha_X + \alpha_U \rho$ | $Imbalance(X \mid X^*) = \dfrac{(\alpha_X + \alpha_U \rho)(1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}$ |

| Effect of conditioning on X* when … | biases are in the same direction | biases offset each other |
| --- | --- | --- |
| Absolute omitted variable bias | Increase in OVB is most likely if (a) the bias induced by the unobserved confounder U is much larger than the bias induced by confounder X and the correlation between X and U is low, or (b) confounder X strongly affects Z (a high correlation between X and U strongly boosts bias amplification). | Whether OVB increases strongly depends on the signs and magnitudes of all five parameters. If the biases induced by X and U strongly offset each other, an increase in OVB almost surely results, unless the correlation between X and U is close to 1. |
| Absolute imbalance | Imbalance in U may increase or decrease. Imbalance in X always decreases. | Imbalance in U may increase or decrease. Imbalance in X always decreases. |
| Effect of measurement error | Attenuates any increase in OVB and attenuates any decrease in OVB. | Attenuates any increase in OVB (and might even turn an increase into a decrease) but may attenuate or strengthen any decrease in OVB. |
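As a numerical illustration of the Table 2 formulas (parameter values are illustrative and chosen so that X acts like a near-instrument: a strong effect on Z but only a weak effect on Y), the following sketch shows conditioning on a perfectly reliable X more than doubling the OVB:

```python
# Closed-form OVB from Table 2 for illustrative standardized coefficients
# where X is a near-instrument: strong effect on Z, weak effect on Y.
def ovb_initial(a_x, a_u, b_x, b_u, rho):
    return a_x*b_x + a_u*b_u + a_x*rho*b_u + a_u*rho*b_x

def ovb_adjusted(a_x, a_u, b_x, b_u, rho, gamma):
    num = a_u*b_u*(1 - rho**2*gamma) \
          + (a_x*b_x + a_x*rho*b_u + a_u*rho*b_x)*(1 - gamma)
    return num / (1 - (a_x + a_u*rho)**2 * gamma)

pars = dict(a_x=0.85, a_u=0.25, b_x=0.05, b_u=0.4, rho=0.2)
before = ovb_initial(**pars)
after = ovb_adjusted(**pars, gamma=1.0)   # perfectly reliable X
print(before, after)   # ≈ 0.213 vs ≈ 0.505: adjusting for X more than doubles OVB
```

Here the strong $\alpha_X$ makes the amplification factor $1/\{1-(\alpha_X+\alpha_U\rho)^2\}$ large, so the bias left due to U is amplified well beyond the initial bias.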

Though we restricted our discussion of OVB to the case with a single observed and a single unobserved confounder, the principles of the OVB mechanics also apply to the multiple-confounder case, where X and U represent sets of observed and unobserved confounders. However, the OVB formulas would be far more complex because the correlation structure within and between the two sets of confounders also needs to be considered (for an OVB formula in matrix notation, see [27]). Moreover, cancellation of offsetting biases and bias amplification are not restricted to the linear case; they also occur in nonlinear settings [17], but it is much harder to derive closed-form OVB formulas that are informative about the OVB mechanics.

We also showed that bias amplification operates via increasing the imbalance in unobserved confounders. That is, conditioning on an observed confounder can significantly increase the unobserved confounders’ imbalance and, thus, turn them into even stronger confounders. If the observed and unobserved confounders are uncorrelated, the imbalance in the unobserved confounders always increases. Thus, balancing a large set of observed covariates via matching or regression adjustment does not imply that the imbalance in unobserved confounders decreases.

In the presence of omitted or unobserved variables, is it possible to select a subset of observed covariates that minimizes OVB? Or is it at least possible to make sure that the selected covariates do not increase OVB? With almost perfect knowledge about the data-generating selection and outcome models one could actually select the set of covariates that minimizes OVB. But such knowledge is rarely available. Without reliable knowledge about the true data-generating model it seems impossible to know whether conditioning on a set of covariates minimizes or even reduces the confounding bias. While empirical covariate selection strategies that rely on observed relations between the covariates and the treatment or outcome can be very successful when all confounding covariates are reliably measured, it is not clear how well these strategies perform in the presence of unobserved or unreliably measured confounders. However, partial knowledge might occasionally allow an informed assessment of whether adjusting for a set of covariates brings us at least closer to a causal effect estimate (for instance, we might know that only positive selection took place and that the observed covariates cover the most important confounders but no near-IVs).

The OVB mechanics discussed in this article have far-reaching implications for practice. Given unobserved confounders, neither conditioning on all or a large set of observed pre-treatment covariates (as advocated in [28] or [9]), nor conditioning on a small set of covariates that has been selected on subject-matter or empirical grounds [21], can guarantee that OVB will decrease. For matching designs like propensity score matching this means that achieving balance on all observed pre-treatment covariates neither implies that the confounding bias has been minimized or even reduced, nor that the imbalance in unobserved covariates, including the latent constructs of fallible measures, has diminished. The same holds for all methods dealing with bias due to nonresponse or attrition: conditioning on a large set of covariates does not imply that nonresponse or attrition bias in the statistic of interest is successfully addressed [22]. Similarly, for two-stage least-squares (2SLS) analyses of conditional IV designs, conditioning on a set of observed covariates does not guarantee that the bias due to a potential violation of the exclusion restriction is minimized. Whenever covariate adjustments are made in the hope of reducing different types of confounding bias, a thoughtless or automated selection of covariates may increase instead of reduce the bias.

Since we used a very simple data-generating model to explain the mechanics of OVB, one needs to be careful in deriving practical guidelines about when to condition on an observed covariate and when not. The decision about adjusting for a given covariate strongly depends on the presumed real-world data-generating model. For instance, if there were only a single confounder X, but one that has been unreliably measured, then conditioning on X* would always reduce selection bias. But when there are one or more unobserved confounders, it is already less clear whether conditioning on X* actually reduces OVB. In practice, the situation is usually even more complex because a confounding path might be blocked in more than one way. For instance, if we observed an intermediate covariate W on U's confounding path, Z ← W ← U → Y, then conditioning on W would not result in any OVB despite the omission of confounder U (provided there are no other unobserved confounders). But if one conditions neither on U nor on W, the OVB mechanics are in place again.
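The intermediate-covariate example can be illustrated with a minimal simulation (hypothetical, unstandardized coefficients of our choosing; true treatment effect set to zero):

```python
import numpy as np

# Hypothetical DGM with the confounding path Z <- W <- U -> Y and a zero
# treatment effect (coefficients illustrative, not standardized).
rng = np.random.default_rng(4)
n = 1_000_000
U = rng.standard_normal(n)
W = 0.8*U + rng.normal(0, 0.6, n)
Z = 0.7*W + rng.standard_normal(n)
Y = 0.5*U + rng.standard_normal(n)

def coef_on_Z(*covs):
    D = np.column_stack([np.ones(n), Z, *covs])
    return np.linalg.lstsq(D, Y, rcond=None)[0][1]

print(coef_on_Z())    # biased: the backdoor path through W and U is open
print(coef_on_Z(W))   # ~0: conditioning on W blocks the path although U is omitted
```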

Sometimes it is also possible to circumvent unobserved confounding by using designs that exploit other observed covariates. For instance, if the observed set of covariates contains an instrumental variable then we could use an instrumental variable design to identify the complier average treatment effect. Or, if data contain a pretest measure of the outcome then a gain score or difference-in-differences design can deal with unobserved time-invariant confounding [29]. However, the assumptions underlying these designs might be less credible than the conditional independence assumption such that covariate adjustments via regression or matching methods might be preferable. But given the uncertainty about the magnitude of OVB left after adjusting for a set of covariates, it is important to conduct sensitivity analyses that assess the estimated treatment effect’s sensitivity to unobserved confounders [30, 31, 32]. Or, with partial knowledge about the data-generating process, one can pursue a partial identification strategy and compute bounds on the treatment effect [33]. In any case, lacking strong subject-matter theory, researchers should abstain from making strong causal claims from a single observational study. Causal claims are much more credible when built on multiple independent replications with different study designs.

### References

1. Pearl J. Causality: models, reasoning, and inference, 2nd ed. New York, NY: Cambridge University Press, 2009.

2. Angrist JD, Pischke JS. Mostly harmless econometrics: an empiricist’s companion. Princeton, NJ: Princeton University Press, 2009.

3. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.

4. Shpitser I, VanderWeele TJ, Robins JM. On the validity of covariate adjustment for estimating causal effects. In: Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence. Corvallis: AUAI Press, 2010:527–536.

5. Seber GA, Lee AJ. Linear regression analysis, 2nd ed. Hoboken, NJ: Wiley, 2003.

6. Box GE. Use and abuse of regression. Technometrics 1966;8(4):625–629.

7. Clarke KA. The phantom menace: omitted variable bias in econometric research. Conflict Manage Peace Sci 2005;22:341–352.

8. Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press, 2007.

9. Steiner PM, Cook TD, Li W, Clark MH. Bias reduction in quasi-experiments with little selection theory but many covariates. J Res Educ Eff 2015;8(4):552–576.

10. Wakefield J. Bayesian and frequentist regression methods. New York: Springer, 2013.

11. Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. J R Stat Soc Ser A 2008;171:481–502.

12. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984;79(387):516–524.

13. Stuart EA, Rubin DB. Best practices in quasi-experimental designs: matching methods for causal inference. In: Osborne JW, editor. Best practices in quantitative methods. Thousand Oaks, CA: Sage, 2008:155–176.

14. Ding P, Miratrix LW. To adjust or not to adjust? Sensitivity analysis of M-bias and butterfly-bias. J Causal Inference 2015;3(1):41–57.

15. Elwert F, Winship C. Endogenous selection bias. Ann Rev Soc 2014;40:31–53.

16. Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 2003;14:300–306.

17. Pearl J. On a class of bias-amplifying variables that endanger effect estimates. In: Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 2010:425–432. Available at: http://event.cwi.nl/uai2010/papers/UAI20100120.pdf.

18. Wooldridge JM. Should instrumental variables be used as matching variables? East Lansing, MI: Michigan State University, 2009.

19. Pearl J. Understanding bias amplification [Invited commentary]. Am J Epidemiol 2011;174:1223–1227.

20. Bhattacharya J, Vogt W. Do instrumental variables belong in propensity scores? Cambridge, MA: National Bureau of Economic Research, 2007 (NBER Technical Working Paper No. 343).

21. Myers JA, Rassen JA, Gagne JJ, Huybrechts KF, Schneeweiss S, Rothman KJ, et al. Effects of adjusting for instrumental variables on bias and precision of effect estimates. Am J Epidemiol 2011;174:1213–1222.

22. Kreuter F, Olson K. Multiple auxiliary variables in nonresponse adjustment. Soc Methods Res 2011;40(2):311–332.

23. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol 2006;163(12):1149–1156.

24. Brooks JM, Ohsfeldt RL. Squeezing the balloon: propensity scores and unmeasured covariate balance. Health Serv Res 2013;48(4):1487–1507.

25. Cook TD, Steiner PM, Pohl S. How bias reduction is affected by covariate choice, unreliability, and mode of data analysis: results from two types of within-study comparison. Multivariate Behav Res 2009;44:828–847.

26. Steiner PM, Cook TD, Shadish WR. On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. J Educ Behav Stat 2011;36(2):213–236.

27. Middleton JA, Scott MA, Diakow R, Hill JL. Bias amplification and bias unmasking. Unpublished manuscript, 2016.

28. Imbens G, Rubin D. Causal inference for statistics, social, and biomedical sciences: an introduction. New York, NY: Cambridge University Press, 2015.

29. Kim Y, Steiner PM. Gain scores revisited: a graphical models approach. Unpublished manuscript, 2016.

30. Ding P, VanderWeele TJ. Sensitivity analysis without assumptions. Epidemiology 2016;27(3):368–377.

31. Rosenbaum PR. Observational studies, 2nd ed. New York, NY: Springer, 2002.

32. VanderWeele TJ, Arah OA. Unmeasured confounding for general outcomes, treatments, and confounders: bias formulas for sensitivity analysis. Epidemiology 2011;22(1):42–52.

33. Manski CF. Identification for prediction and decision. Cambridge, MA: Harvard University Press, 2008.

## Appendix A: Bias amplification when matching or stratifying on an IV

Bias amplification can also be intuitively explained within the context of matching or stratifying treatment and control cases on the IV (i.e., with a dichotomous treatment Z). Consider the case of exact full matching on the IV, that is, all treatment and control cases with IV = v are matched together (this is equivalent to exact stratification because the set of matched cases forms a unique stratum with IV = v). For simplicity, we first assume that the dichotomous treatment Z is a deterministic function of IV and U: $Z = f(IV, U) = \mathbf{1}_{\{IV + U > c\}}$, where Z = 1 if the sum IV + U exceeds a threshold c, and otherwise Z = 0 (indicating the control condition). Now assume that we match on the observed IV in the hope of removing potential confounding bias. Then, for a given stratum with IV = v, the treatment status $Z = f(U \mid IV = v) = \mathbf{1}_{\{U > c - v\}}$ is exclusively determined by U: Cases with $U > c - v$ received the treatment and cases with $U \le c - v$ received the control condition. Thus, all treatment cases with IV = v must have strictly larger values in U than the control cases, that is, the treatment and control cases' distributions of U no longer overlap. But without matching on the IV, the distributions of U would have overlapped, enabling exact matches on U. Thus, matching on the IV increases the treatment and control group's heterogeneity in U, which is reflected by the increased imbalance.

The same argument holds for a treatment function with an independent error term (i.e., unobserved factors determining Z): $Z = f(IV, U, \varepsilon) = \mathbf{1}_{\{IV + U + \varepsilon > c\}}$. Matching on the IV then restricts the pool of potential matches with regard to U, if one were to match on the unobserved U. Due to the error term we still could find exact matches on U but, nonetheless, the difference in the treatment and control cases' distributions of U is larger than before matching on the IV. Note that the imbalance in U does not necessarily have to increase within each stratum, but it will necessarily increase on average across strata.
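The stratification argument can be illustrated with a small simulation (a hypothetical three-valued IV and the noisy selection function above; all choices are ours for illustration):

```python
import numpy as np

# Sketch: Z is determined by IV + U + noise. Stratifying on the IV increases
# the treated-control imbalance in U on average across strata.
rng = np.random.default_rng(1)
n = 500_000
iv = rng.choice([-1.0, 0.0, 1.0], size=n)   # hypothetical discrete instrument
U = rng.standard_normal(n)
Z = (iv + U + rng.standard_normal(n) > 0).astype(int)

def u_imbalance(mask):
    z, u = Z[mask], U[mask]
    return u[z == 1].mean() - u[z == 0].mean()

overall = u_imbalance(np.ones(n, dtype=bool))
within = np.mean([u_imbalance(iv == v) for v in (-1.0, 0.0, 1.0)])
print(overall, within)   # average within-stratum imbalance exceeds the overall one
```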

## Appendix B: Bias amplification and cancellation of offsetting biases for a dichotomous treatment

All the bias formulas we discussed so far referred to regression estimators for a continuous treatment variable. Since treatment variables are frequently dichotomous, we briefly characterize the bias for a dichotomous treatment indicator Z* (this section follows the formalization used by [14]). Figure 6 shows the DGM with two correlated confounders, one measured with error and the other one unobserved. The corresponding SCM we used for the following derivations is given by

$$\begin{aligned} X &= \varepsilon_X \\ X^* &= X + e \\ U &= \varepsilon_U \\ Z &= \alpha_X X + \alpha_U U + \varepsilon_Z \\ Z^* &= 1 \;\text{if}\; Z \ge c \;\text{and}\; Z^* = 0 \;\text{if}\; Z < c \\ Y &= \tau Z^* + \beta_X X + \beta_U U + \varepsilon_Y \end{aligned}$$

### Figure 6:

Causal graph for two correlated confounders X and U. The vacant nodes for X, Z and U indicate that they are unobserved. Z* is dichotomous.

In order to derive corresponding OVB formulas, we assume that X and U are distributed according to a bivariate normal distribution with zero expectation, unit variances, and a correlation $\rho$. Consequently, Z is also normally distributed with zero expectation. We further assume that the treatment effect is zero, which considerably simplifies the derivation of the OVB formulas. As before, the coefficients $\alpha_X$, $\alpha_U$, $\beta_X$, and $\beta_U$ represent standardized coefficients, and the normally distributed error terms $\varepsilon_Z$ and $\varepsilon_Y$ were chosen such that Var(Z) = 1 and Var(Y) = 1. The dichotomous treatment Z* is obtained from the continuous Z and the cutoff c. The cutoff value c refers to the quantiles of a standard normal distribution because $Z \sim N(0, 1)$. The unreliable measure X* is given by $X^* = X + e$ with $e \sim N(0, \sigma_e^2)$.

Under these assumptions the standardized effect of X on Z* is given by $\alpha_X^* = \alpha_X \phi(c)/\sqrt{\Phi(c)\Phi(-c)}$ and the standardized effect of U on Z* is given by $\alpha_U^* = \alpha_U \phi(c)/\sqrt{\Phi(c)\Phi(-c)}$, where $\phi(c)$ and $\Phi(c)$ denote the standard normal probability density and cumulative distribution function, respectively (the proof is given at the end of the section). Then, the regression estimator's initial bias before any conditioning (i.e., $\hat{Y} = \hat{\gamma} + \hat{\tau}_{Z^*} Z^*$) is

(9) $OVB(\hat{\tau}_{Z^*} \mid \{\}) = (\alpha_X^* \beta_X + \alpha_U^* \beta_U + \alpha_X^* \rho \beta_U + \alpha_U^* \rho \beta_X) \times \dfrac{1}{\sqrt{\Phi(c)\Phi(-c)}}.$

After conditioning on X*, we obtain

(10) $OVB(\hat{\tau}_{Z^*} \mid X^*) = \{\alpha_U^* \beta_U (1 - \tilde{\rho}^2) + (\alpha_X^* \beta_X + \alpha_X^* \rho \beta_U + \alpha_U^* \rho \beta_X)(1 - \gamma)\} \times \dfrac{1}{1 - (\alpha_X^* + \rho \alpha_U^*)^2 \gamma} \times \dfrac{1}{\sqrt{\Phi(c)\Phi(-c)}}.$

Both OVB formulas are identical to the OVB formulas for a continuous treatment variable, except for the constant $1/\sqrt{\Phi(c)\Phi(-c)} = 1/\sqrt{Var(Z^*)}$, which ensures that OVB refers to the change in Z* from 0 to 1 (without the constant the OVB formula would refer to a change in Z* by one standard deviation, just as in the continuous case). Thus, we have the same OVB mechanics and conditions under which conditioning on X* increases OVB as for the continuous treatment case. However, since $|\alpha_X^*| < |\alpha_X|$ and $|\alpha_U^*| < |\alpha_U|$, the bias-amplifying effects will always be weaker for a dichotomous treatment than for a corresponding continuous treatment (because the dichotomized version of the continuous treatment will always be less strongly correlated with the continuous confounders). But this does not imply that bias amplification and an increasing OVB are less of an issue with a dichotomous treatment. Just assume that the dichotomous Z* is directly affected by dichotomous confounders X and U (i.e., with respect to Figure 6, X and U are dichotomous and there is no continuous Z on the causal pathway from the dichotomous confounders to Z*; instead X and U directly affect Z*: X → Z* and U → Z*). In this case, the dichotomous confounders can affect Z* at least as strongly as continuous confounders can affect a continuous Z ($\alpha_X^*$ and $\alpha_U^*$ are no longer attenuated and the correlation between the confounder and the treatment can theoretically be one, as in the continuous treatment and confounder case).
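Formula (10) can be checked by Monte Carlo (illustrative parameter values of our choosing; τ = 0, so the regression coefficient of Z* equals the OVB; the closed form is computed in the equivalent single-fraction form derived in the proof of this section):

```python
import math
import numpy as np

phi = lambda x: math.exp(-x*x/2) / math.sqrt(2*math.pi)   # standard normal pdf
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))    # standard normal cdf

a_x, a_u, rho, b_x, b_u, gamma, c = 0.4, 0.4, 0.3, 0.3, 0.4, 0.8, 0.5

# closed-form OVB(tau_hat | X*) for a dichotomous treatment, tau = 0
num = a_u*b_u*(1 - rho**2*gamma) \
      + (a_x*b_x + rho*a_x*b_u + rho*a_u*b_x)*(1 - gamma)
den = Phi(c)*Phi(-c) - phi(c)**2 * (a_x + rho*a_u)**2 * gamma
ovb = phi(c) * num / den

# Monte-Carlo check: regress Y on the dichotomized Z* and the fallible X*
rng = np.random.default_rng(2)
n = 1_000_000
L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
X, U = L @ rng.standard_normal((2, n))
Z = a_x*X + a_u*U + rng.normal(0, math.sqrt(1 - a_x**2 - a_u**2 - 2*a_x*a_u*rho), n)
Zstar = (Z >= c).astype(float)
Y = b_x*X + b_u*U + rng.normal(0, math.sqrt(1 - b_x**2 - b_u**2 - 2*b_x*b_u*rho), n)
Xstar = X + rng.normal(0, math.sqrt(1/gamma - 1), n)
D = np.column_stack([np.ones(n), Zstar, Xstar])
tau_hat = np.linalg.lstsq(D, Y, rcond=None)[0][1]
print(ovb, tau_hat)   # closed form ≈ simulated coefficient (true effect is zero)
```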

Proof.

OVB with a Dichotomous Treatment

Using the data-generating model in Figure 6 with a treatment effect of zero ($\tau = 0$), we derive the OVB formula for the treatment effect from the regression of Y on Z* and X*. We assume that X and U are bivariate normally distributed with zero means, unit variances, and a correlation $\rho$. This implies that Z is also normally distributed. The unstandardized OLS estimator for the treatment effect can be written in terms of observed correlations as $b_{Z^*} = \dfrac{r_{YZ^*} - r_{YX^*} r_{Z^*X^*}}{1 - r_{Z^*X^*}^2} \times \dfrac{1}{SD(Z^*)}$. To obtain the three correlation coefficients, we use the corresponding covariances:

$$\begin{aligned} Cov(Y, Z^*) &= \phi(c)(\alpha_X \beta_X + \alpha_U \beta_U + \rho \alpha_X \beta_U + \rho \alpha_U \beta_X), \\ Cov(Y, X^*) &= Cov(X + e, Y) = Cov(X, Y) = Cov(X, \beta_X X + \beta_U U + \varepsilon_Y) = \beta_X + \rho \beta_U, \\ Cov(Z^*, X^*) &= Cov(Z^*, X + e) = Cov(Z^*, X) = \phi(c)(\alpha_X + \rho \alpha_U), \end{aligned}$$

where $\phi(x)$ denotes the standard normal density function. While $Cov(Y, X^*)$ directly follows from the structural equations, $Cov(Y, Z^*)$ and $Cov(Z^*, X^*)$ need some further explanation, which we exemplify for $Cov(Y, Z^*)$.

Assuming a constant treatment effect of zero, the treatment effect's regression estimator from the regression of Y on Z* can be written as the expected difference in the outcome Y for Z* = 1 and Z* = 0, that is, $E(Y \mid Z^* = 1) - E(Y \mid Z^* = 0)$. Since the OLS estimator is given by $Cov(Y, Z^*)/Var(Z^*)$, we obtain $Cov(Y, Z^*) = Var(Z^*)\{E(Y \mid Z^* = 1) - E(Y \mid Z^* = 0)\}$. Then, using $Var(Z^*) = \Phi(c)\Phi(-c)$ and $E(Y \mid Z^* = 1) - E(Y \mid Z^* = 0) = E(Y \mid Z \ge c) - E(Y \mid Z < c) = r_{ZY}\,\phi(c)/\{\Phi(c)\Phi(-c)\}$ from Lemma 1 and Lemma 2 (see below), and $r_{ZY} = Cov(\alpha_X X + \alpha_U U + \varepsilon_Z,\; \beta_X X + \beta_U U + \varepsilon_Y) = \alpha_X \beta_X + \alpha_U \beta_U + \rho \alpha_X \beta_U + \rho \alpha_U \beta_X$, we get $Cov(Y, Z^*) = \phi(c)(\alpha_X \beta_X + \alpha_U \beta_U + \rho \alpha_X \beta_U + \rho \alpha_U \beta_X)$.

The covariances and Lemma 1 are then used to obtain expressions for the correlations:

$$\begin{aligned} r_{YZ^*} &= Cov(Y, Z^*)/SD(Z^*) = \phi(c)(\alpha_X \beta_X + \alpha_U \beta_U + \rho \alpha_X \beta_U + \rho \alpha_U \beta_X)/\sqrt{\Phi(c)\Phi(-c)}, \\ r_{YX^*} &= Cov(Y, X^*)/SD(X^*) = (\beta_X + \rho \beta_U)\sqrt{\gamma}, \\ r_{Z^*X^*} &= Cov(Z^*, X^*)/\{SD(Z^*)SD(X^*)\} = \phi(c)(\alpha_X + \rho \alpha_U)\sqrt{\gamma}/\sqrt{\Phi(c)\Phi(-c)}. \end{aligned}$$

Plugging the correlations into the formula for the treatment effect's regression estimator results in

$$b_{Z^*} = OVB(\hat{\tau}_{Z^*} \mid X^*) = \frac{\phi(c)\{\alpha_U \beta_U (1 - \tilde{\rho}^2) + (\alpha_X \beta_X + \rho \alpha_X \beta_U + \rho \alpha_U \beta_X)(1 - \gamma)\}}{\Phi(c)\Phi(-c) - \phi(c)^2 (\alpha_X + \rho \alpha_U)^2 \gamma},$$

which is equivalent to the OVB since the derivations are based on a treatment effect of zero. The initial bias in the treatment effect of Z* on Y can be obtained by regressing Y onto Z*, that is,

O V B ( τ ˆ Z | { } ) = C o v ( Z , Y ) / V a r ( Z ) = φ ( c ) ( α X β X + α U β U + ρ α X β U + ρ α U β X ) / Φ ( c ) Φ ( c ) .

The two OVBs can be rewritten as

O V B ( τ ˆ Z | { } ) = ( α X β X + α U β U + α X ρ β U + α U ρ β X ) × 1 Φ ( c ) Φ ( c ) a n d O V B ( τ ˆ Z | X ) = { α U β U ( 1 ρ ˜ 2 ) + ( α X β X + α X ρ β U + α U ρ β X ) ( 1 γ ) } × 1 1 ( α X + ρ α U ) 2 γ × 1 Φ ( c ) Φ ( c ) ,

where $\alpha_X^* = \alpha_X \phi(c)/\sqrt{\Phi(c)\Phi(-c)}$ is the standardized effect of X on Z* and $\alpha_U^* = \alpha_U \phi(c)/\sqrt{\Phi(c)\Phi(-c)}$ is the standardized effect of U on Z*. $\alpha_X^*$ is the product of the effect of X on Z ($\alpha_X$) and the standardized effect of Z on Z* ($\phi(c)/\sqrt{\Phi(c)\Phi(-c)}$). The latter is obtained from the regression of Z* on Z together with Lemmas 1 and 2, that is,

$$\frac{Cov(Z^*, Z)}{Var(Z)} \times \frac{SD(Z)}{SD(Z^*)} = \frac{Cov(Z^*, Z)}{Var(Z^*)} \times \frac{SD(Z^*)}{SD(Z)} = \{E(Z \mid Z^* = 1) - E(Z \mid Z^* = 0)\} \times SD(Z^*) = \left\{\frac{\phi(c)}{\Phi(-c)} + \frac{\phi(c)}{\Phi(c)}\right\} \times \sqrt{\Phi(c)\Phi(-c)} = \frac{\phi(c)}{\sqrt{\Phi(c)\Phi(-c)}}.$$

The first equality follows from inverting the regression, that is, regressing Z on Z* (using the fact that the standardized coefficients of the original and the inverted regression are equivalent); the second equality rewrites the effect of Z* on Z in terms of conditional expectations and uses SD(Z) = 1; and the third equality directly follows from Lemma 1 and Lemma 2.

Lemma 1.

Assume Z is distributed according to a standard normal distribution and a binary variable Z* is determined from Z using a cutoff c such that Z* = 1 if Z ≥ c and Z* = 0 otherwise. Then the new random variable Z* follows a Bernoulli distribution with $\Pr(Z^* = 1) = p$. Since $p = \Pr(Z^* = 1) = \Pr(Z \ge c) = 1 - \Phi(c) = \Phi(-c)$, we get $Var(Z^*) = p(1 - p) = \Phi(-c)\Phi(c)$.

Lemma 2.

[14]. Assume X and Y follow a bivariate normal distribution with zero means, unit variances, and correlation coefficient $\rho$. Under these assumptions we have $E(Y \mid X < c) = \rho E(X \mid X < c)$. Since $E(X \mid X < c) = \frac{1}{\Phi(c)} \int_{-\infty}^{c} x \phi(x)\,dx = -\frac{1}{\Phi(c)} \int_{-\infty}^{c} d\phi(x) = -\frac{\phi(c)}{\Phi(c)}$, we obtain $E(Y \mid X < c) = -\rho\,\frac{\phi(c)}{\Phi(c)}$. Similarly, we obtain $E(Y \mid X \ge c) = \rho\,\frac{\phi(c)}{1 - \Phi(c)} = \rho\,\frac{\phi(c)}{\Phi(-c)}$ since $E(Y \mid X \ge c) = -E(Y \mid X < -c)$.
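The truncated normal mean in Lemma 2, $E(X \mid X < c) = -\phi(c)/\Phi(c)$, is easy to verify by simulation (the cutoff is illustrative):

```python
import math
import numpy as np

phi = lambda x: math.exp(-x*x/2) / math.sqrt(2*math.pi)   # standard normal pdf
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))    # standard normal cdf

c = 0.7                                     # illustrative cutoff
x = np.random.default_rng(3).standard_normal(2_000_000)
print(x[x < c].mean(), -phi(c)/Phi(c))      # truncated mean matches -phi(c)/Phi(c)
```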

## Appendix C: Proofs

### Proof 1 Imbalance in confounders U and X

For the linear structural model formulated in Eq. (2) and represented by the right causal diagram in Figure 4, we prove for the general case with a correlated and unreliably measured confounder X* the imbalance formula

$$Imbalance(U \mid X^*) = E_{X^*}\{E(U \mid Z = z + 1, X^*) - E(U \mid Z = z, X^*)\} = \frac{\alpha_U (1 - \tilde{\rho}^2) + \alpha_X \rho (1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma},$$

where X, U, Z, and Y are unit-variance variables and X* is a fallible measure of X with reliability $\gamma = 1/(1 + \sigma_e^2)$ (i.e., $X^* = X + e$ with $e \sim N(0, \sigma_e^2)$). The correlation between X and U is given by $cor(X, U) = \rho < 1$, and the corresponding correlation with X* is $cor(X^*, U) = \rho\sqrt{\gamma} = \tilde{\rho}$. Due to the linearity of the structural model, the difference in expectations of the above imbalance formula is given by the partial regression coefficient for Z of the regression of U on Z and X*: $b_Z = \dfrac{r_{UZ} - r_{UX^*} r_{ZX^*}}{1 - r_{ZX^*}^2}$, where $r_{AB}$ is the correlation coefficient between A and B (note that the difference in expectations represents the change due to a one-unit increase in Z). Then, using the correlations

$$\begin{aligned} r_{UZ} &= Cov(U, Z) = Cov(U, \alpha_X X + \alpha_U U + \varepsilon_Z) = \alpha_X \rho + \alpha_U, \\ r_{UX^*} &= Cov(U, X^*)/SD(X^*) = Cov(U, X + e)\sqrt{\gamma} = \rho\sqrt{\gamma}, \;\text{and} \\ r_{ZX^*} &= Cov(Z, X^*)/SD(X^*) = Cov(\alpha_X X + \alpha_U U + \varepsilon_Z, X + e)\sqrt{\gamma} = (\alpha_X + \alpha_U \rho)\sqrt{\gamma}, \end{aligned}$$

we obtain

$$Imbalance(U \mid X^*) = \frac{r_{UZ} - r_{UX^*} r_{ZX^*}}{1 - r_{ZX^*}^2} = \{\alpha_U (1 - \tilde{\rho}^2) + \alpha_X \rho (1 - \gamma)\} \times \frac{1}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}.$$

By setting $\rho = 0$ or $\gamma = 1$, all other imbalance formulas presented in this article can be directly derived.

Analogously, the imbalance formula for X is given by the partial regression coefficient for Z from the regression of X on Z and X*. Using

$$r_{XZ} = Cov(X, Z) = Cov(X, \alpha_X X + \alpha_U U + \varepsilon_Z) = \alpha_X + \alpha_U \rho \quad\text{and}\quad r_{XX^*} = Cov(X, X^*)/SD(X^*) = Cov(X, X + e)\sqrt{\gamma} = \sqrt{\gamma},$$

we get

$$Imbalance(X \mid X^*) = \frac{r_{XZ} - r_{XX^*} r_{ZX^*}}{1 - r_{ZX^*}^2} = \frac{(\alpha_X + \alpha_U \rho)(1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}.$$
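Since the correlation-based partial coefficient and the final imbalance formula are algebraically identical, any admissible parameter values must yield the same number from both expressions; a quick cross-check (illustrative values):

```python
# Proof 1, numerical cross-check: the partial regression coefficient of Z
# (computed from the correlations) equals the closed-form imbalance in U.
a_x, a_u, rho, gamma = 0.5, 0.4, 0.3, 0.8

r_uz = a_x*rho + a_u                       # Cov(U, Z)
r_uxs = rho * gamma**0.5                   # Cor(U, X*)
r_zxs = (a_x + a_u*rho) * gamma**0.5       # Cor(Z, X*)

partial = (r_uz - r_uxs*r_zxs) / (1 - r_zxs**2)
closed = (a_u*(1 - rho**2*gamma) + a_x*rho*(1 - gamma)) \
         / (1 - (a_x + a_u*rho)**2 * gamma)
print(partial, closed)   # identical values (≈ 0.579)
```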

### Proof 2 Imbalance inequalities

We prove the following three results: (i) Conditioning on a fallible X* does not fully balance the latent X, and the imbalance can never exceed the initial imbalance (i.e., without conditioning on X or X*): $|Imbalance(X \mid X^*)| \le |Imbalance(X \mid \{\})|$. (ii) If X and U are uncorrelated, conditioning on a fallible X* increases the imbalance in U: $|Imbalance(U \mid X^*)| > |Imbalance(U \mid \{\})|$. (iii) For correlated X and U, conditioning on a fallible X* may increase or decrease the imbalance in U.

1. (i) We show that $|Imbalance(X \mid X^*)| \le |Imbalance(X \mid \{\})|$, that is, $\left|\frac{(\alpha_X + \alpha_U \rho)(1 - \gamma)}{1 - (\alpha_X + \alpha_U \rho)^2 \gamma}\right| \le |\alpha_X + \alpha_U \rho|$. For ease of notation, we use $a = \alpha_X + \rho \alpha_U$ such that the inequality simplifies to $\left|\frac{a(1 - \gamma)}{1 - a^2 \gamma}\right| \le |a|$, which is identical to writing $\frac{1 - \gamma}{1 - a^2 \gamma} \le 1$, since $0 < \gamma \le 1$ and $a^2 \le 1$ (because the path coefficients refer to variables with unit variances). Because of the constraints on $\gamma$ and $a$ we know that $1 - \gamma \le 1 - a^2 \gamma$, proving our result. Note that conditioning on X* does not reduce the imbalance in X if $a = \alpha_X + \rho \alpha_U = 0$ (another setting would be $a = \alpha_X + \rho \alpha_U = \pm 1$, but this is not possible due to the parameter constraints).

2. (ii) For uncorrelated X and U we show that $|\mathrm{Imbalance}(U \mid X^*)| > |\mathrm{Imbalance}(U \mid \{\})|$, that is, $\left|\frac{\alpha_U}{1-\alpha_X^2\gamma}\right| > |\alpha_U|$. Using $0 < \gamma \leq 1$ and $0 < \alpha_X^2 < 1$, we get $0 < 1-\alpha_X^2\gamma < 1$, which verifies the inequality (for $\alpha_U \neq 0$).
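A quick grid check of this inequality for $\rho = 0$ (our illustration; the values are arbitrary and satisfy $\alpha_X^2 + \alpha_U^2 < 1$):

```python
# For uncorrelated X and U, conditioning on a fallible X* (reliability
# gamma) strictly increases the absolute imbalance in U.
for aX in [0.2, 0.5, 0.8]:
    for aU in [0.1, -0.4]:
        for gamma in [0.5, 0.8, 1.0]:
            imb_initial = abs(aU)                      # |Imbalance(U | {})|
            imb_after = abs(aU / (1 - aX**2 * gamma))  # |Imbalance(U | X*)|
            assert imb_after > imb_initial
```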

3. (iii) For correlated X and U, conditioning on X* can increase or decrease the imbalance in U, that is, $|\mathrm{Imbalance}(U \mid X^*)| > |\mathrm{Imbalance}(U \mid \{\})|$ or $|\mathrm{Imbalance}(U \mid X^*)| \leq |\mathrm{Imbalance}(U \mid \{\})|$. Using two different restrictions on $\alpha_U$, we show that the difference in absolute imbalances, $|\mathrm{Imbalance}(U \mid \{\})| - |\mathrm{Imbalance}(U \mid X^*)| = |\alpha_U + \alpha_X\rho| - \left|\frac{\alpha_U(1-\tilde{\rho}^2) + \alpha_X\rho(1-\gamma)}{1-(\alpha_X+\alpha_U\rho)^2\gamma}\right|$, can be negative or positive. Using $\alpha_U = -\alpha_X\rho$ with $\alpha_X\rho > 0$ as the first restriction results in a negative difference: since $\mathrm{Imbalance}(U \mid \{\}) = 0$ and $|\mathrm{Imbalance}(U \mid X^*)| = \frac{\gamma(1-\rho^2)}{1-(\alpha_X+\alpha_U\rho)^2\gamma}\,\alpha_X\rho > 0$, we obtain $|\mathrm{Imbalance}(U \mid \{\})| - |\mathrm{Imbalance}(U \mid X^*)| < 0$. Using $\alpha_U = -\frac{\alpha_X\rho(1-\gamma)}{1-\tilde{\rho}^2}$ with $\alpha_X\rho > 0$ as the second restriction results in a positive difference: since $|\mathrm{Imbalance}(U \mid \{\})| = \frac{\gamma(1-\rho^2)}{1-\rho^2\gamma}\,\alpha_X\rho > 0$ and $\mathrm{Imbalance}(U \mid X^*) = 0$, we get $|\mathrm{Imbalance}(U \mid \{\})| - |\mathrm{Imbalance}(U \mid X^*)| > 0$.
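Both restrictions can be checked numerically; below is our sketch with hypothetical parameter values:

```python
import numpy as np

def imbalance_U(aX, aU, rho, gamma):
    # Closed-form Imbalance(U | X*) derived in Proof 1.
    num = aU * (1 - rho**2 * gamma) + aX * rho * (1 - gamma)
    return num / (1 - (aX + aU * rho)**2 * gamma)

aX, rho, gamma = 0.4, 0.5, 0.8  # hypothetical values with aX * rho > 0

# Restriction 1: aU = -aX*rho makes the initial imbalance vanish, so
# conditioning on X* re-introduces imbalance in U.
aU = -aX * rho
assert np.isclose(aU + aX * rho, 0.0)            # Imbalance(U | {}) = 0
assert abs(imbalance_U(aX, aU, rho, gamma)) > 0  # |Imbalance(U | X*)| > 0

# Restriction 2: aU = -aX*rho*(1-gamma)/(1-rho^2*gamma) makes the
# post-conditioning imbalance vanish while the initial one is positive.
aU = -aX * rho * (1 - gamma) / (1 - rho**2 * gamma)
assert abs(aU + aX * rho) > 0                    # |Imbalance(U | {})| > 0
assert np.isclose(imbalance_U(aX, aU, rho, gamma), 0.0)
```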

### Proof 3 Bias in the linear regression estimator $\hat{\tau}$

Using the same linear setting as in Proof 1, we show that, after conditioning on X*, the bias in the linear regression estimator τ ˆ is given by

$$\mathrm{OVB}(\hat{\tau} \mid X^*) = \left\{\alpha_U\beta_U(1 - \tilde{\rho}^2) + (\alpha_X\beta_X + \alpha_X\rho\beta_U + \alpha_U\rho\beta_X)(1-\gamma)\right\} \times \frac{1}{1 - (\alpha_X + \alpha_U\rho)^2\,\gamma}.$$

The estimator $\hat{\tau}$ for the effect of treatment Z is obtained from regressing Y onto Z and X*: $\hat{\tau} = \frac{r_{YZ} - r_{YX^*}\, r_{ZX^*}}{1 - r_{ZX^*}^2}$. Plugging the population correlations

$$r_{YZ} = \mathrm{Cov}(Y, Z) = \mathrm{Cov}(\beta_X X + \beta_U U + \tau Z + \varepsilon_Y,\; \alpha_X X + \alpha_U U + \varepsilon_Z) = \tau + \alpha_X\beta_X + \alpha_U\beta_U + \alpha_X\beta_U\rho + \alpha_U\beta_X\rho,$$

$$r_{YX^*} = \mathrm{Cov}(Y, X^*)/SD(X^*) = \mathrm{Cov}(\beta_X X + \beta_U U + \tau Z + \varepsilon_Y,\; X + e)\sqrt{\gamma} = (\beta_X + \tau\alpha_X + \beta_U\rho + \tau\alpha_U\rho)\sqrt{\gamma},$$

$$r_{ZX^*} = \mathrm{Cov}(Z, X^*)/SD(X^*) = \mathrm{Cov}(\alpha_X X + \alpha_U U + \varepsilon_Z,\; X + e)\sqrt{\gamma} = (\alpha_X + \alpha_U\rho)\sqrt{\gamma}$$

into this formula, we get $\hat{\tau} - \tau = \frac{r_{YZ} - r_{YX^*}\, r_{ZX^*}}{1 - r_{ZX^*}^2} - \tau$, which simplifies to the bias expression above. By setting $\rho = 0$ or $\gamma = 1$, all other bias formulas contained in this article follow directly from this general formula.
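The general bias formula can be verified against the partial regression coefficient; below is a numerical sketch (our addition) with hypothetical parameter values:

```python
import numpy as np

# Hypothetical parameter values (unit-variance constraints hold).
aX, aU, rho, gamma = 0.4, 0.3, 0.5, 0.8
bX, bU, tau = 0.5, 0.6, 1.0

# Population moments of Y and Z with the standardized fallible X*.
c_YZ = tau + aX * bX + aU * bU + aX * bU * rho + aU * bX * rho
c_YXs = (bX + tau * aX + bU * rho + tau * aU * rho) * np.sqrt(gamma)
c_ZXs = (aX + aU * rho) * np.sqrt(gamma)

# tau_hat: coefficient on Z from regressing Y on Z and X*.
tau_hat = (c_YZ - c_YXs * c_ZXs) / (1 - c_ZXs**2)

# Closed-form OVB(tau_hat | X*) from the text.
num = aU * bU * (1 - rho**2 * gamma) \
    + (aX * bX + aX * rho * bU + aU * rho * bX) * (1 - gamma)
den = 1 - (aX + aU * rho)**2 * gamma
assert np.isclose(tau_hat - tau, num / den)
```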

### Proof 4 Inequalities for increasing bias when conditioning on an uncorrelated and reliably measured confounder X

For uncorrelated confounders X and U (with standardized coefficients), we prove the inequalities (i) $\left|\frac{\alpha_U\beta_U}{\alpha_X\beta_X}\right| > \frac{1-\alpha_X^2}{\alpha_X^2}$ if $\mathrm{sgn}(\alpha_X\beta_X) = \mathrm{sgn}(\alpha_U\beta_U)$, (ii) $\left|\frac{\alpha_U\beta_U}{\alpha_X\beta_X}\right| > \frac{1-\alpha_X^2}{2-\alpha_X^2}$ if $\mathrm{sgn}(\alpha_X\beta_X) \neq \mathrm{sgn}(\alpha_U\beta_U)$ and $|\alpha_X\beta_X| > |\alpha_U\beta_U|$, and (iii) $\left|\frac{\alpha_U\beta_U}{\alpha_X\beta_X}\right| > 1 - \frac{1}{\alpha_X^2}$ if $\mathrm{sgn}(\alpha_X\beta_X) \neq \mathrm{sgn}(\alpha_U\beta_U)$ and $|\alpha_X\beta_X| < |\alpha_U\beta_U|$. Given the biases before and after conditioning on X, $\mathrm{OVB}(\hat{\tau} \mid \{\}) = \alpha_X\beta_X + \alpha_U\beta_U$ and $\mathrm{OVB}(\hat{\tau} \mid X) = \frac{\alpha_U\beta_U}{1-\alpha_X^2}$, adjusting for X increases the absolute bias if

(C1) $\left|\frac{\alpha_U\beta_U}{1-\alpha_X^2}\right| > |\alpha_X\beta_X + \alpha_U\beta_U|.$

First, if $\mathrm{sgn}(\alpha_X\beta_X) = \mathrm{sgn}(\alpha_U\beta_U)$, (C1) is equivalent to $\left|\frac{\alpha_U\beta_U}{1-\alpha_X^2}\right| > |\alpha_X\beta_X| + |\alpha_U\beta_U|$. In dividing both sides by $|\alpha_U\beta_U|$ we obtain $\frac{1}{1-\alpha_X^2} > \left|\frac{\alpha_X\beta_X}{\alpha_U\beta_U}\right| + 1$ and, finally, $\left|\frac{\alpha_U\beta_U}{\alpha_X\beta_X}\right| > \frac{1-\alpha_X^2}{\alpha_X^2}$.

Second, if $\mathrm{sgn}(\alpha_X\beta_X) \neq \mathrm{sgn}(\alpha_U\beta_U)$ and $|\alpha_X\beta_X| > |\alpha_U\beta_U|$, then (C1) can be written as $\left|\frac{\alpha_U\beta_U}{1-\alpha_X^2}\right| > |\alpha_X\beta_X| - |\alpha_U\beta_U|$. Dividing both sides by $|\alpha_U\beta_U|$ we obtain $\frac{1}{1-\alpha_X^2} > \left|\frac{\alpha_X\beta_X}{\alpha_U\beta_U}\right| - 1$, and thus $\left|\frac{\alpha_U\beta_U}{\alpha_X\beta_X}\right| > \frac{1-\alpha_X^2}{2-\alpha_X^2}$.

Third, if $\mathrm{sgn}(\alpha_X\beta_X) \neq \mathrm{sgn}(\alpha_U\beta_U)$ and $|\alpha_X\beta_X| < |\alpha_U\beta_U|$, then (C1) is equivalent to $\left|\frac{\alpha_U\beta_U}{1-\alpha_X^2}\right| > |\alpha_U\beta_U| - |\alpha_X\beta_X|$. Then, dividing both sides by $|\alpha_U\beta_U|$ we obtain $\frac{1}{1-\alpha_X^2} > 1 - \left|\frac{\alpha_X\beta_X}{\alpha_U\beta_U}\right|$ and finally $\left|\frac{\alpha_U\beta_U}{\alpha_X\beta_X}\right| > 1 - \frac{1}{\alpha_X^2}$. For $|\alpha_X\beta_X| < |\alpha_U\beta_U|$ this inequality is always true because the left-hand side is always greater than one while the right-hand side is always less than one. That is, for $\mathrm{sgn}(\alpha_X\beta_X) \neq \mathrm{sgn}(\alpha_U\beta_U)$ and $|\alpha_X\beta_X| < |\alpha_U\beta_U|$, conditioning on X always increases rather than reduces the bias.
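The three conditions can be cross-checked against a direct evaluation of (C1) over a small parameter grid. In the sketch below (ours, with $\rho = 0$ and $\gamma = 1$), the grid values are arbitrary choices that avoid boundary ties:

```python
import itertools

def bias_increases(aX, bX, aU, bU):
    # Direct evaluation of (C1) for uncorrelated X and U.
    ini = aX * bX + aU * bU          # OVB without adjustment
    adj = aU * bU / (1 - aX**2)      # OVB after adjusting for X
    return abs(adj) > abs(ini)

def threshold_condition(aX, bX, aU, bU):
    # Case-wise conditions (i)-(iii) derived in the text.
    r = abs(aU * bU / (aX * bX))
    if (aX * bX) * (aU * bU) > 0:                # (i) equal signs
        return r > (1 - aX**2) / aX**2
    if abs(aX * bX) > abs(aU * bU):              # (ii) opposite signs
        return r > (1 - aX**2) / (2 - aX**2)
    return True                                  # (iii) always increases

vals = itertools.product([0.35, -0.55], [0.45, -0.25], [0.3, -0.6], [0.5, -0.2])
for aX, bX, aU, bU in vals:
    assert bias_increases(aX, bX, aU, bU) == threshold_condition(aX, bX, aU, bU)
```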

### Proof 5 Inequalities among absolute biases

It is important to note that measurement error in X* always attenuates the OVB towards the initial bias [26], that is,

$$\mathrm{OVB}(\hat{\tau} \mid X) < \mathrm{OVB}(\hat{\tau} \mid X^*) < \mathrm{OVB}(\hat{\tau} \mid \{\}) \quad \text{if} \quad \mathrm{OVB}(\hat{\tau} \mid X) < \mathrm{OVB}(\hat{\tau} \mid \{\}), \quad \text{and}$$

$$\mathrm{OVB}(\hat{\tau} \mid X) > \mathrm{OVB}(\hat{\tau} \mid X^*) > \mathrm{OVB}(\hat{\tau} \mid \{\}) \quad \text{if} \quad \mathrm{OVB}(\hat{\tau} \mid X) > \mathrm{OVB}(\hat{\tau} \mid \{\}).$$

Since the initial OVB and the OVB after adjusting for X can be of opposite signs, the two inequalities do not imply that measurement error necessarily increases the absolute OVB. Thus, the corresponding inequalities with absolute OVBs,

$$|\mathrm{OVB}(\hat{\tau} \mid X)| < |\mathrm{OVB}(\hat{\tau} \mid X^*)| < |\mathrm{OVB}(\hat{\tau} \mid \{\})| \quad \text{if} \quad |\mathrm{OVB}(\hat{\tau} \mid X)| < |\mathrm{OVB}(\hat{\tau} \mid \{\})|, \quad \text{and}$$

$$|\mathrm{OVB}(\hat{\tau} \mid X)| > |\mathrm{OVB}(\hat{\tau} \mid X^*)| > |\mathrm{OVB}(\hat{\tau} \mid \{\})| \quad \text{if} \quad |\mathrm{OVB}(\hat{\tau} \mid X)| > |\mathrm{OVB}(\hat{\tau} \mid \{\})|$$

do not hold in general. They only hold if X and U induce bias in the same direction, that is, if all four terms in the initial bias formula have the same sign. To show the impact of measurement error on the bias, we prove the following four inequalities:
1. (i) $|\mathrm{OVB}(\hat{\tau} \mid X)| \leq |\mathrm{OVB}(\hat{\tau} \mid X^*)| \leq |\mathrm{OVB}(\hat{\tau} \mid \{\})|$ holds if $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| \leq 1$ and $\mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid \{\})\} = \mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid X)\}$,

2. (ii) $|\mathrm{OVB}(\hat{\tau} \mid \{\})| < |\mathrm{OVB}(\hat{\tau} \mid X^*)| < |\mathrm{OVB}(\hat{\tau} \mid X)|$ holds if $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| > 1$ and $\mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid \{\})\} = \mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid X)\}$,

3. (iii) $|\mathrm{OVB}(\hat{\tau} \mid X^*)| \leq |\mathrm{OVB}(\hat{\tau} \mid X)| \leq |\mathrm{OVB}(\hat{\tau} \mid \{\})|$ holds if $k < \left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| \leq 1$ and $\mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid \{\})\} \neq \mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid X)\}$,

4. (iv) $|\mathrm{OVB}(\hat{\tau} \mid X^*)| \leq |\mathrm{OVB}(\hat{\tau} \mid \{\})| < |\mathrm{OVB}(\hat{\tau} \mid X)|$ holds if $1 < \left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| \leq k$ and $\mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid \{\})\} \neq \mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid X)\}$,

where $k = \frac{1-\gamma}{\gamma\{1-(\alpha_X+\rho\alpha_U)^2\}}$ and $\gamma$ is the reliability of X*.

For ease of notation we use $a = \alpha_X + \rho\alpha_U$, $u = (1-\rho^2)\alpha_U\beta_U$, $\sigma^2 = (1-\gamma)/\gamma$, and $ini = \mathrm{OVB}(\hat{\tau} \mid \{\}) = \alpha_X\beta_X + \alpha_U\beta_U + \rho\alpha_X\beta_U + \rho\alpha_U\beta_X$. Then, we can write the absolute OVB differences as

$$B_0 = |\mathrm{OVB}(\hat{\tau} \mid X^*)| - |\mathrm{OVB}(\hat{\tau} \mid \{\})| = \left|\frac{u}{1-a^2+\sigma^2} + ini\,\frac{\sigma^2}{1-a^2+\sigma^2}\right| - |ini|,$$

$$B_X = |\mathrm{OVB}(\hat{\tau} \mid X^*)| - |\mathrm{OVB}(\hat{\tau} \mid X)| = \left|\frac{u}{1-a^2+\sigma^2} + ini\,\frac{\sigma^2}{1-a^2+\sigma^2}\right| - \left|\frac{u}{1-a^2}\right|.$$

We first prove that $a^2 < 1$. Due to the constraints of our parameters (unit variance of the variables) we have $\alpha_X^2 + \alpha_U^2 + 2\rho\alpha_X\alpha_U < 1$. Since $\alpha_X^2 + \alpha_U^2 + 2\rho\alpha_X\alpha_U = (\alpha_X + \rho\alpha_U)^2 + (1-\rho^2)\alpha_U^2$, this yields $1 - (\alpha_X + \rho\alpha_U)^2 > (1-\rho^2)\alpha_U^2$. Since $-1 < \rho < 1$, the right-hand side is nonnegative, and we obtain $(\alpha_X + \rho\alpha_U)^2 < 1$. Consequently, $1 - a^2 + \sigma^2 > 0$ in both $B_0$ and $B_X$.

Now consider the situation where $\mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid \{\})\} = \mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid X)\}$ holds (inequalities (i) and (ii)). The equality of signs directly implies $\mathrm{sgn}(u) = \mathrm{sgn}(ini)$, such that

$$B_0 = \frac{1}{1-a^2+\sigma^2}\left\{|u| - (1-a^2)\,|ini|\right\} \quad \text{and} \quad B_X = -\frac{\sigma^2}{1-a^2+\sigma^2}\left\{\frac{|u|}{1-a^2} - |ini|\right\}.$$

Then, inequality (i) holds if $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| \leq 1$ because then $B_0 \leq 0$ and $B_X \geq 0$. Inequality (ii) holds if $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| > 1$ because then $B_0 > 0$ and $B_X < 0$.

Now consider the situation where $\mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid \{\})\} \neq \mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid X)\}$ and $|u| > |ini|\,\sigma^2$ (inequality (iii)). The two absolute OVB differences are given by

$$B_0 = \frac{1}{1-a^2+\sigma^2}\left\{|u| - (1-a^2+2\sigma^2)\,|ini|\right\} \quad \text{and} \quad B_X = -\frac{\sigma^2}{1-a^2+\sigma^2}\left\{\frac{|u|}{1-a^2} + |ini|\right\}.$$

Then, inequality (iii) holds if $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| \leq 1$ and $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| > \frac{1-\gamma}{\gamma\{1-(\alpha_X+\rho\alpha_U)^2\}}$ because then $B_0 \leq 0$ and $B_X \leq 0$. Note that $B_0 \leq 0$ holds because $|u| - (1-a^2+2\sigma^2)\,|ini| \leq |u| - (1-a^2)\,|ini| \leq 0$.

Finally, consider the situation where $\mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid \{\})\} \neq \mathrm{sgn}\{\mathrm{OVB}(\hat{\tau} \mid X)\}$ and $|u| \leq |ini|\,\sigma^2$ (inequality (iv)). The two absolute OVB differences are given by

$$B_0 = -\frac{1}{1-a^2+\sigma^2}\left\{|u| + (1-a^2)\,|ini|\right\} \quad \text{and} \quad B_X = \frac{\sigma^2}{1-a^2+\sigma^2}\left\{|ini| - \frac{|u|}{1-a^2} - \frac{2|u|}{\sigma^2}\right\}.$$

Inequality (iv) holds if $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| > 1$ and $\left|\frac{\mathrm{OVB}(\hat{\tau} \mid X)}{\mathrm{OVB}(\hat{\tau} \mid \{\})}\right| \leq \frac{1-\gamma}{\gamma\{1-(\alpha_X+\rho\alpha_U)^2\}}$ because then $B_0 < 0$ and $B_X < 0$. Note that $B_X < 0$ holds because $|ini| - \frac{|u|}{1-a^2} - \frac{2|u|}{\sigma^2} \leq |ini| - \frac{|u|}{1-a^2} < 0$.
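All four regimes can be illustrated numerically. In the sketch below (our addition), the three OVBs are computed from the general formulas of Proof 3, and the parameter values are hypothetical choices placed in each regime:

```python
def ovbs(aX, bX, aU, bU, rho, gamma):
    # OVB without adjustment, after adjusting for the true X, and after
    # adjusting for the fallible X*, plus the threshold k.
    a = aX + rho * aU
    ini = aX * bX + aU * bU + rho * aX * bU + rho * aU * bX
    ovb_X = (1 - rho**2) * aU * bU / (1 - a**2)
    num = aU * bU * (1 - rho**2 * gamma) \
        + (aX * bX + aX * rho * bU + aU * rho * bX) * (1 - gamma)
    ovb_Xs = num / (1 - a**2 * gamma)
    k = (1 - gamma) / (gamma * (1 - a**2))
    return ini, ovb_X, ovb_Xs, k

# (i) equal signs, |OVB_X / OVB_ini| <= 1
ini, oX, oXs, k = ovbs(0.4, 0.5, 0.3, 0.6, rho=0.5, gamma=0.8)
assert abs(oX) <= abs(oXs) <= abs(ini)

# (ii) equal signs, ratio > 1
ini, oX, oXs, k = ovbs(0.7, -0.1, 0.5, 0.5, rho=0.0, gamma=0.8)
assert abs(ini) < abs(oXs) < abs(oX)

# (iii) opposite signs, k < ratio <= 1
ini, oX, oXs, k = ovbs(0.5, -0.6, 0.3, 0.3, rho=0.0, gamma=0.8)
assert k < abs(oX / ini) <= 1 and abs(oXs) <= abs(oX) <= abs(ini)

# (iv) opposite signs, 1 < ratio <= k
ini, oX, oXs, k = ovbs(0.5, -0.6, 0.5, 0.4, rho=0.0, gamma=0.25)
assert 1 < abs(oX / ini) <= k and abs(oXs) <= abs(ini) < abs(oX)
```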

Published Online: 2016-11-8
Published in Print: 2016-9-1