Published by De Gruyter June 4, 2015

Bounding a Linear Causal Effect Using Relative Correlation Restrictions

  • Brian Krauth


This paper describes and implements a simple partial solution to the most common problem in applied microeconometrics: estimating a linear causal effect with a potentially endogenous explanatory variable and no suitable instrumental variables. Empirical researchers faced with this situation can either assume away the endogeneity or accept that the effect of interest is not identified. This paper describes a middle ground in which the researcher assumes plausible but nontrivial restrictions on the correlation between the variable of interest and relevant unobserved variables relative to the correlation between the variable of interest and observed control variables. Given such relative correlation restrictions, the researcher can then estimate informative bounds on the effect and assess the sensitivity of conventional estimates to plausible deviations from exogeneity. Two empirical applications demonstrate the potential usefulness of this method for both experimental and observational data.

Corresponding author: Brian Krauth, Department of Economics, Simon Fraser University, 8888 University Dr Burnaby BC V5A 1S6, Canada, E-mail:

Appendix Proofs of Propositions

A.1 Proposition 1

Proof: To establish result 1, note that:

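$$
\begin{aligned}
\lambda(b_x; m) &= \frac{\operatorname{corr}_m\left(x,\, y - b_x x - (y - b_x x)^p\right)}{\operatorname{corr}_m\left(x,\, (y - b_x x)^p\right)} \\[4pt]
&= \frac{\operatorname{cov}_m\left(x,\, y - b_x x - (y - b_x x)^p\right) \Big/ \sqrt{\operatorname{var}_m(x)\,\operatorname{var}_m\left(y - b_x x - (y - b_x x)^p\right)}}{\operatorname{cov}_m\left(x,\, (y - b_x x)^p\right) \Big/ \sqrt{\operatorname{var}_m(x)\,\operatorname{var}_m\left((y - b_x x)^p\right)}} \\[4pt]
&= \frac{\dfrac{\operatorname{cov}_m(x, y) - b_x \operatorname{var}_m(x)}{\operatorname{cov}_m(x, y^p) - b_x \operatorname{cov}_m(x, x^p)} - 1}{\sqrt{\dfrac{\operatorname{var}_m(y - b_x x)}{\operatorname{var}_m\left((y - b_x x)^p\right)} - 1}}
\end{aligned}
$$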

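We can apply several properties of the best linear predictor, specifically that $\operatorname{cov}(x, y^p) = \operatorname{cov}(x^p, y^p)$, $\operatorname{cov}(x, x^p) = \operatorname{var}(x^p)$, and $\operatorname{var}(y - y^p) = \operatorname{var}(y) - \operatorname{var}(y^p)$, to further derive: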

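$$
\lambda(b_x; m) = \frac{\dfrac{\operatorname{cov}_m(x, y) - b_x \operatorname{var}_m(x)}{\operatorname{cov}_m(x^p, y^p) - b_x \operatorname{var}_m(x^p)} - 1}{\sqrt{\dfrac{\operatorname{var}_m(y) - 2 b_x \operatorname{cov}_m(x, y) + b_x^2 \operatorname{var}_m(x)}{\operatorname{var}_m(y^p) - 2 b_x \operatorname{cov}_m(x^p, y^p) + b_x^2 \operatorname{var}_m(x^p)} - 1}} = \frac{\dfrac{p_1}{p_2} - 1}{\sqrt{\dfrac{p_3}{p_4} - 1}} \tag{21}
$$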

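where $p_1$, $p_2$, $p_3$, and $p_4$ are all polynomials (and thus differentiable) in $b_x$. They are also differentiable in $m$. Application of the quotient and product rules implies that $\lambda(b_x; m)$ is differentiable provided that (a) $p_2 \neq 0$, (b) $p_4 \neq 0$, and (c) $p_3/p_4 > 1$. Condition (a) fails if and only if:

$$\operatorname{cov}_m(x^p, y^p) - b_x \operatorname{var}_m(x^p) = 0$$

Since $\operatorname{var}_m(x^p) > 0$ by equation (11), we can solve to get

$$b_x = \frac{\operatorname{cov}_m(x^p, y^p)}{\operatorname{var}_m(x^p)} = \beta_x(m)$$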

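Condition (b) fails if and only if:

$$\operatorname{var}_m(y^p) - 2 b_x \operatorname{cov}_m(x^p, y^p) + b_x^2 \operatorname{var}_m(x^p) = \operatorname{var}_m(y^p - b_x x^p) = 0$$

which implies that $y^p - b_x x^p$ is constant. Since the covariance of any random variable with a constant is zero, this in turn implies that $\operatorname{cov}(x^p, y^p - b_x x^p) = \operatorname{cov}(x^p, y^p) - b_x \operatorname{var}(x^p) = 0$. Again we can solve for $b_x$ to get:

$$b_x = \frac{\operatorname{cov}_m(x^p, y^p)}{\operatorname{var}_m(x^p)} = \beta_x(m)$$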

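Condition (c) fails if and only if $p_3 \le p_4$, or equivalently:

$$\operatorname{var}_m(y - b_x x) \le \operatorname{var}_m(y^p - b_x x^p)$$

Note that $y^p - b_x x^p$ is the best linear predictor of $y - b_x x$, so:

$$\operatorname{var}_m(y - b_x x) = \operatorname{var}_m(y^p - b_x x^p) + \operatorname{var}_m\left(y - b_x x - (y^p - b_x x^p)\right) \ge \operatorname{var}_m(y^p - b_x x^p)$$

This implies that $\operatorname{var}_m\left(y - b_x x - (y^p - b_x x^p)\right) = 0$, which also implies that, with probability one:

$$y - b_x x = y^p - b_x x^p$$

Rearranging, we get:

$$y = b_x x + y^p - b_x x^p$$

which implies that $y$ is an exact linear function of $(x, c)$ and equation (9) is violated. Therefore, condition (c) must hold. Since conditions (a), (b), and (c) hold for all $b_x \neq \beta_x(m)$, $\lambda(b_x; m)$ is differentiable at all $b_x \neq \beta_x(m)$.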
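
As a numerical sanity check on equation (21), the following Python sketch evaluates $\lambda(b_x; m)$ two ways: through the correlation-ratio definition and through the $p_1, \ldots, p_4$ form. The moment values in `M` are hypothetical numbers chosen only to be internally consistent, and the function names are illustrative assumptions; none of this is taken from the author's Stata code.

```python
import math

# Hypothetical population moments (illustrative values, not from the paper).
# xp and yp denote the best linear predictors of x and y given the controls c.
M = {
    "var_x": 2.0, "var_y": 5.0, "cov_xy": 1.8,
    "var_xp": 0.5, "var_yp": 1.5, "cov_xpyp": 0.6,
}

def polys(bx, m):
    """Return (p1, p2, p3, p4) as they appear in equation (21)."""
    p1 = m["cov_xy"] - bx * m["var_x"]
    p2 = m["cov_xpyp"] - bx * m["var_xp"]
    p3 = m["var_y"] - 2 * bx * m["cov_xy"] + bx**2 * m["var_x"]
    p4 = m["var_yp"] - 2 * bx * m["cov_xpyp"] + bx**2 * m["var_xp"]
    return p1, p2, p3, p4

def lam_eq21(bx, m):
    """lambda(bx; m) in the (p1/p2 - 1) / sqrt(p3/p4 - 1) form."""
    p1, p2, p3, p4 = polys(bx, m)
    return (p1 / p2 - 1.0) / math.sqrt(p3 / p4 - 1.0)

def lam_corr_ratio(bx, m):
    """corr(x, z - z^p) / corr(x, z^p) with z = y - bx*x.

    Uses the best-linear-predictor properties cov(x, z^p) = p2,
    var(z^p) = p4, cov(x, z - z^p) = p1 - p2, var(z - z^p) = p3 - p4.
    """
    p1, p2, p3, p4 = polys(bx, m)
    corr_resid = (p1 - p2) / math.sqrt(m["var_x"] * (p3 - p4))
    corr_pred = p2 / math.sqrt(m["var_x"] * p4)
    return corr_resid / corr_pred

# Compare the two forms at a few points away from beta_x(m) = 0.6 / 0.5.
for bx in (-3.0, 0.0, 0.8, 2.5):
    print(bx, lam_eq21(bx, M), lam_corr_ratio(bx, M))
```

With these assumed moments, $\lambda$ is zero at $b_x = 0.8$, which equals $\operatorname{cov}(x - x^p, y - y^p)/\operatorname{var}(x - x^p) = 1.2/1.5$, the coefficient from the regression that includes the controls.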

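To establish result 2, note that $\operatorname{var}_m(x)$ is strictly positive by (9) and $\operatorname{var}_m(x^p)$ is strictly positive by (11). Therefore:

$$\lim_{b_x \to \infty} p_1 = \lim_{b_x \to \infty} p_2 = -\infty$$

So by L’Hospital’s rule:

$$\lim_{b_x \to \infty} \frac{p_1}{p_2} = \frac{\operatorname{var}_m(x)}{\operatorname{var}_m(x^p)}$$

By the same reasoning:

$$\lim_{b_x \to \infty} p_3 = \lim_{b_x \to \infty} p_4 = \infty$$

So by two applications of L’Hospital’s rule:

$$\lim_{b_x \to \infty} \frac{p_3}{p_4} = \frac{\operatorname{var}_m(x)}{\operatorname{var}_m(x^p)}$$

Result 2 can then be derived by substitution, and the argument repeated for $\lim_{b_x \to -\infty}$.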
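
The limiting behavior in result 2 can be checked numerically. In equation (21) the leading terms of $p_1/p_2$ and $p_3/p_4$ both have ratio $\operatorname{var}_m(x)/\operatorname{var}_m(x^p)$, so substitution suggests the common limit $\sqrt{\operatorname{var}_m(x)/\operatorname{var}_m(x^p) - 1}$. That closed form is inferred here from equation (21) rather than quoted from the proposition, and the moments below are hypothetical values.

```python
import math

# Hypothetical moments (illustrative only); xp, yp are best linear predictors given c.
var_x, var_y, cov_xy = 2.0, 5.0, 1.8
var_xp, var_yp, cov_xpyp = 0.5, 1.5, 0.6

def lam(bx):
    """lambda(bx; m) in the p1..p4 form of equation (21)."""
    p1 = cov_xy - bx * var_x
    p2 = cov_xpyp - bx * var_xp
    p3 = var_y - 2 * bx * cov_xy + bx**2 * var_x
    p4 = var_yp - 2 * bx * cov_xpyp + bx**2 * var_xp
    return (p1 / p2 - 1.0) / math.sqrt(p3 / p4 - 1.0)

# Limit obtained by substituting p1/p2 -> var_x/var_xp and p3/p4 -> var_x/var_xp.
lam_limit = math.sqrt(var_x / var_xp - 1.0)

# The gap should shrink as |bx| grows, in both directions.
for bx in (1e2, 1e4, 1e6, -1e2, -1e4, -1e6):
    print(f"bx={bx:>10.0e}  lambda={lam(bx):.8f}  gap={lam(bx) - lam_limit:+.2e}")
```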

To prove result 3, I first show how the behavior of $\lambda(b_x; m)$ near $\beta_x(m)$ depends on some special cases:

  1. Suppose that m implies an exact linear relationship between yp and xp, i.e.

    $$E_m\left((y^p - a_m - b_m x^p)^2\right) = 0 \tag{22}$$

    for some $a_m$ and $b_m$. Then equation (14) is satisfied for all $\lambda$ when $b_x = \beta_x(m) = b_m$.

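    Proof: To show that $\beta_x(m) = b_m$:

    $$\beta_x(m) = \frac{\operatorname{cov}_m(x^p, y^p)}{\operatorname{var}_m(x^p)} = \frac{\operatorname{cov}_m(x^p, a_m + b_m x^p)}{\operatorname{var}_m(x^p)} = b_m$$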

    To show that equation (14) is satisfied at $\beta_x(m)$ for all $\lambda$, note that $c\beta_c(b_x; m) = y^p - b_x x^p$. This implies that:


    and by the same argument $\operatorname{cov}_m\left(x, c\beta_c(\beta_x(m); m)\right) = 0$. Equation (14) thus reduces to $0 = \lambda \cdot 0$, a condition that is satisfied by any $\lambda$.

  2. Suppose that m implies:

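    $$\frac{\operatorname{cov}_m(y, x)}{\operatorname{var}_m(x)} = \frac{\operatorname{cov}_m(y^p, x^p)}{\operatorname{var}_m(x^p)} \tag{23}$$

    Then equation (14) is satisfied for all $\lambda$ when $b_x = \beta_x(m)$.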

    Proof: First, note that in this case:




    Equation (14) thus reduces to $0 = \lambda \cdot 0$, which is satisfied for all $\lambda$.

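  3. Suppose that neither (22) nor (23) holds. Then for any $\lambda \in (-\infty, \lambda(m)) \cup (\lambda(m), \infty)$ there is a $b_x$ such that $\lambda(b_x; m) = \lambda$, i.e., that solves equation (14).

    Proof: First, note that since $\operatorname{cov}_m(x^p, y^p) - \beta_x(m) \operatorname{var}_m(x^p) = 0$, the existence of a solution to equation (14) when $b_x = \beta_x(m)$ requires that either $\operatorname{var}_m(y^p - \beta_x(m) x^p) = 0$, implying (22) holds, or $\operatorname{cov}_m(x, y) - \beta_x(m) \operatorname{var}_m(x) = 0$, implying (23) holds. Since neither holds, there is no $\lambda$ that satisfies equation (14) for $b_x = \beta_x(m)$.

    Next I characterize the behavior of $\lambda(b_x; m)$ near $\beta_x(m)$. Since $\operatorname{var}_m(x^p) > 0$, $p_2$ is positive for $b_x < \beta_x(m)$, negative for $b_x > \beta_x(m)$, and zero when $b_x = \beta_x(m)$. Also note that $\operatorname{cov}_m(x, y) - \beta_x(m) \operatorname{var}_m(x) = \operatorname{cov}_m(x, y) - \frac{\operatorname{cov}_m(x^p, y^p)}{\operatorname{var}_m(x^p)} \operatorname{var}_m(x)$, so $p_1$ is strictly positive for all $b_x \approx \beta_x(m)$ if $\frac{\operatorname{cov}_m(x, y)}{\operatorname{var}_m(x)} > \frac{\operatorname{cov}_m(x^p, y^p)}{\operatorname{var}_m(x^p)}$, and strictly negative for all $b_x \approx \beta_x(m)$ if $\frac{\operatorname{cov}_m(x, y)}{\operatorname{var}_m(x)} < \frac{\operatorname{cov}_m(x^p, y^p)}{\operatorname{var}_m(x^p)}$. This implies that: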

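    $$\lim_{b_x \uparrow \beta_x(m)} \lambda(b_x; m) = \begin{cases} \infty & \text{if } \dfrac{\operatorname{cov}_m(y, x)}{\operatorname{var}_m(x)} > \dfrac{\operatorname{cov}_m(y^p, x^p)}{\operatorname{var}_m(x^p)} \\[8pt] -\infty & \text{if } \dfrac{\operatorname{cov}_m(y, x)}{\operatorname{var}_m(x)} < \dfrac{\operatorname{cov}_m(y^p, x^p)}{\operatorname{var}_m(x^p)} \end{cases}$$

    $$\lim_{b_x \downarrow \beta_x(m)} \lambda(b_x; m) = \begin{cases} -\infty & \text{if } \dfrac{\operatorname{cov}_m(y, x)}{\operatorname{var}_m(x)} > \dfrac{\operatorname{cov}_m(y^p, x^p)}{\operatorname{var}_m(x^p)} \\[8pt] \infty & \text{if } \dfrac{\operatorname{cov}_m(y, x)}{\operatorname{var}_m(x)} < \dfrac{\operatorname{cov}_m(y^p, x^p)}{\operatorname{var}_m(x^p)} \end{cases}$$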

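    I have thus established that $\lim_{b_x \to -\infty} \lambda(b_x; m) = \lambda(m)$, that $\lim_{b_x \uparrow \beta_x(m)} \lambda(b_x; m)$ is either $-\infty$ or $\infty$, and that $\lambda(b_x; m)$ is continuous on $(-\infty, \beta_x(m))$. Suppose for the moment that $\lim_{b_x \uparrow \beta_x(m)} \lambda(b_x; m) = -\infty$. By the intermediate value theorem, for any $\lambda \in (-\infty, \lambda(m))$ there exists some $b_x \in (-\infty, \beta_x(m))$ such that $\lambda(b_x; m) = \lambda$. This is a sufficient condition for $b_x$ to solve equation (14). Since $\lim_{b_x \uparrow \beta_x(m)} \lambda(b_x; m) = -\infty$, it follows that $\lim_{b_x \downarrow \beta_x(m)} \lambda(b_x; m) = \infty$. Again, since $\lambda(b_x; m)$ is continuous on $(\beta_x(m), \infty)$ and tends to $\lambda(m)$ as $b_x \to \infty$ by result 2, the intermediate value theorem implies that for any $\lambda \in (\lambda(m), \infty)$ there exists some $b_x \in (\beta_x(m), \infty)$ such that $\lambda(b_x; m) = \lambda$. Therefore, for any $\lambda \in (-\infty, \lambda(m)) \cup (\lambda(m), \infty)$ there is a $b_x$ such that $\lambda(b_x; m) = \lambda$, i.e., that solves equation (14). The same argument can be duplicated for the case $\lim_{b_x \uparrow \beta_x(m)} \lambda(b_x; m) = \infty$. Note that there may or may not be a $b_x$ such that $\lambda(b_x; m) = \lambda(m)$.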
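
The intermediate value argument in case 3 is constructive enough to sketch in code: on each side of $\beta_x(m)$ the function $\lambda(b_x; m)$ is continuous, so a bracketing search finds a $b_x$ for any target $\lambda$ on that branch. The Python sketch below uses plain bisection with hypothetical moments for which $\operatorname{cov}_m(x, y)/\operatorname{var}_m(x) < \operatorname{cov}_m(x^p, y^p)/\operatorname{var}_m(x^p)$, so $\lambda$ diverges to $-\infty$ from below and to $\infty$ from above. The bracket endpoints, the target values, and the name `solve_branch` are assumptions made for illustration, not part of the paper.

```python
import math

# Hypothetical moments (illustrative only); xp, yp are best linear predictors given c.
var_x, var_y, cov_xy = 2.0, 5.0, 1.8
var_xp, var_yp, cov_xpyp = 0.5, 1.5, 0.6
beta_x = cov_xpyp / var_xp  # point where p2 = 0 and lambda is undefined

def lam(bx):
    """lambda(bx; m) in the p1..p4 form of equation (21)."""
    p1 = cov_xy - bx * var_x
    p2 = cov_xpyp - bx * var_xp
    p3 = var_y - 2 * bx * cov_xy + bx**2 * var_x
    p4 = var_yp - 2 * bx * cov_xpyp + bx**2 * var_xp
    return (p1 / p2 - 1.0) / math.sqrt(p3 / p4 - 1.0)

def solve_branch(target, lo, hi, iters=200):
    """Bisection for lam(bx) = target on [lo, hi]; needs a sign change."""
    f_lo = lam(lo) - target
    f_hi = lam(hi) - target
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = lam(mid) - target
        if f_lo * f_mid <= 0:
            hi = mid              # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)

# A target below the large-|bx| limit is reached on the branch below beta_x;
# a target above that limit is reached on the branch above beta_x.
left = solve_branch(1.0, -1e3, beta_x - 1e-9)
right = solve_branch(3.0, beta_x + 1e-9, 1e3)
print("left branch: ", left, lam(left))
print("right branch:", right, lam(right))
```

Bisection is used here only because it mirrors the existence argument; any bracketing root finder would serve the same purpose.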

To prove result 4, pick any $b_x$ and consider two cases. First, suppose that $b_x = \beta_x(m)$. Then $b_x \notin \tilde{B}(\Lambda; m)$ since $\lambda(b_x; m)$ does not exist. Next, suppose that $b_x \neq \beta_x(m)$. Then $\lambda(b_x; m)$ exists (by result 1 of this proposition) and provides the unique $\lambda$ that solves equation (14) for that $b_x$. Therefore,

$$b_x \in \tilde{B}(\Lambda; m) \quad \text{if and only if} \quad b_x \in B_x(\Lambda; m) \text{ and } b_x \neq \beta_x(m)$$

which is another way of stating the result.        □

A.2 Proposition 2

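Proof: Since $\Lambda$ is nonempty, $\lambda(m_0) \notin \Lambda$ implies that $\Lambda$ must contain some $\lambda \neq \lambda(m_0)$. Result 3 of Proposition 1 says that there exists some $b_x$ such that $(\lambda, b_x)$ satisfy equation (14). Therefore, the identified set is nonempty.

Since $\Lambda$ is closed, $\lambda(m_0) \notin \Lambda$ implies that there is some $\varepsilon > 0$ such that $(\lambda(m_0) - \varepsilon, \lambda(m_0) + \varepsilon)$ is disjoint from $\Lambda$. Result 2 of Proposition 1 says that $\lim_{b_x \to \infty} \lambda(b_x; m_0) = \lim_{b_x \to -\infty} \lambda(b_x; m_0) = \lambda(m_0)$. This means that given such an $\varepsilon$, there is some finite $B_\varepsilon$ such that $B_\varepsilon > \beta_x(m_0)$ and:

$$
\begin{aligned}
|b_x| > B_\varepsilon &\Rightarrow \lambda(b_x; m_0) \in (\lambda(m_0) - \varepsilon, \lambda(m_0) + \varepsilon) && \text{(by result 2 of Proposition 1)} \\
&\Rightarrow \lambda(b_x; m_0) \notin \Lambda && \text{(since } (\lambda(m_0) - \varepsilon, \lambda(m_0) + \varepsilon) \text{ is disjoint from } \Lambda\text{)} \\
&\Rightarrow b_x \notin \tilde{B}(\Lambda, m_0) && \text{(by definition of } \tilde{B}\text{)} \\
&\Rightarrow b_x \notin \tilde{B}(\Lambda, m_0) \cup \{\beta_x(m_0)\} && \text{(since } B_\varepsilon > \beta_x(m_0)\text{)} \\
&\Rightarrow b_x \notin B_x(\Lambda, m_0) && \text{(by result 4 of Proposition 1)}
\end{aligned}
$$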

Therefore, the identified set is bounded.        □

A.3 Proposition 3

Proof: Both $\beta_x(m)$ and $\lambda(m)$ are continuous in $m$ by the quotient rule, given that $\operatorname{var}_m(x^p) > 0$. Result 1 of Proposition 1 says that $\lambda(b_x; m)$ is continuous in $m$ for all $b_x \neq \beta_x(m)$. So the first set of results follows from the straightforward application of Slutsky’s theorem.

For the second result, note that the implicit function theorem implies that $\beta_x^L(\Lambda; m)$ is continuously differentiable in $m$ if $\left.\frac{d\lambda(b_x; m)}{d b_x}\right|_{b_x = \beta_x^L(\Lambda; m)} \neq 0$. In that case, consistency of $\hat{b}_x^L(\Lambda)$ follows from Slutsky’s theorem. The same argument applies to $\hat{b}_x^H(\Lambda)$.

For the third result, note that if $B_x(\Lambda; m_0) = \mathbb{R}$, then result 2 of Proposition 1 implies $\lambda(m_0)$ is in the interior of $\Lambda$. Therefore, there exists an $\varepsilon > 0$ and $B_1 < B$ such that $[\lambda(B_1; m_0) - \varepsilon, \lambda(B_1; m_0) + \varepsilon] \subset \Lambda$. Since $\hat{\lambda}(B_1) \xrightarrow{p} \lambda(B_1; m_0)$:


The same argument applies to $\hat{b}_x^H(\Lambda)$, with a change of sign.        □

A.4 Proposition 4

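Proof: Both $\beta_x^L(\Lambda; m)$ and $\beta_x^H(\Lambda; m)$ are differentiable in $m$ under these conditions, so the result follows from direct application of the delta method, where:

$$A = \begin{bmatrix} \nabla_m \beta_x^L(\Lambda; m)\big|_{m = m_0} \\[4pt] \nabla_m \beta_x^H(\Lambda; m)\big|_{m = m_0} \end{bmatrix} \tag{24}$$

The expression for $A$ given in the proposition comes from applying the implicit function theorem:

$$
\begin{aligned}
\nabla_m \beta_x^L(\Lambda; m) &= -\left.\frac{\nabla_m \lambda(b_x; m)}{\partial \lambda(b_x; m)/\partial b_x}\right|_{b_x = \beta_x^L(\Lambda; m)} \\[6pt]
\nabla_m \beta_x^H(\Lambda; m) &= -\left.\frac{\nabla_m \lambda(b_x; m)}{\partial \lambda(b_x; m)/\partial b_x}\right|_{b_x = \beta_x^H(\Lambda; m)}
\end{aligned} \tag{25}
$$

and substituting. While mathematically unnecessary, this substitution is important computationally. Derivatives of $\lambda(b_x; m)$, a closed-form function with closed-form derivatives, can be calculated much more accurately than derivatives of $\beta_x^L(\Lambda; m)$, an implicit function that must be approximated by iterative methods.        □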
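
The implicit-function step in equation (25) can be illustrated numerically. The Python sketch below treats a bound as the root of $\lambda(b_x; m) = \lambda^*$ for a hypothetical boundary value $\lambda^* = 1$, then compares the gradient implied by equation (25) with a brute-force gradient obtained by re-solving for the root after perturbing each moment. Everything here is an assumption made for illustration: the moments, the target, the bisection solver, and the use of central finite differences in place of the closed-form derivatives of $\lambda$ that the proof refers to.

```python
import math

NAMES = ("var_x", "var_y", "cov_xy", "var_xp", "var_yp", "cov_xpyp")
M0 = (2.0, 5.0, 1.8, 0.5, 1.5, 0.6)  # hypothetical moments, ordered as in NAMES
TARGET = 1.0                          # hypothetical boundary value of Lambda

def lam(bx, m):
    """lambda(bx; m) in the p1..p4 form of equation (21)."""
    var_x, var_y, cov_xy, var_xp, var_yp, cov_xpyp = m
    p1 = cov_xy - bx * var_x
    p2 = cov_xpyp - bx * var_xp
    p3 = var_y - 2 * bx * cov_xy + bx**2 * var_x
    p4 = var_yp - 2 * bx * cov_xpyp + bx**2 * var_xp
    return (p1 / p2 - 1.0) / math.sqrt(p3 / p4 - 1.0)

def root(m, iters=200):
    """Solve lam(bx; m) = TARGET by bisection on the branch below beta_x(m)."""
    lo, hi = -1e3, m[5] / m[3] - 1e-9
    f_lo = lam(lo, m) - TARGET
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = lam(mid, m) - TARGET
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

def bump(m, k, h):
    """Copy of m with its k-th element shifted by h."""
    return tuple(v + h if i == k else v for i, v in enumerate(m))

h = 1e-6
b0 = root(M0)
dlam_db = (lam(b0 + h, M0) - lam(b0 - h, M0)) / (2 * h)

ift_grad, direct_grad = [], []
for k, name in enumerate(NAMES):
    dlam_dm = (lam(b0, bump(M0, k, h)) - lam(b0, bump(M0, k, -h))) / (2 * h)
    ift = -dlam_dm / dlam_db  # implicit function theorem, as in equation (25)
    direct = (root(bump(M0, k, h)) - root(bump(M0, k, -h))) / (2 * h)
    ift_grad.append(ift)
    direct_grad.append(direct)
    print(f"{name:>9}: IFT={ift:+.6f}  re-solved={direct:+.6f}")
```

The two columns should agree closely. The first needs only derivatives of $\lambda$ at a single point, while the second re-runs the iterative solver for every perturbed moment.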

A.5 Proposition 5

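Proof: If $\operatorname{var}(x^p) = 0$, then $\operatorname{cov}(x, y^p - b_x x^p) = 0$ for all $b_x$. This implies that (14) holds if and only if $\operatorname{cov}(x, y - b_x x) = 0$, i.e., if $b_x = \operatorname{cov}(x, y)/\operatorname{var}(x)$.        □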

A.6 Proposition 6

Proof: First, rewrite:


The numerator of $\lambda(b_x; \hat{m}_n)$ is:


while the denominator is


In a given finite sample, $q_2(b_x; \hat{m}_n)$ will be nonzero with probability one if $x$ or any of $c$ is continuously distributed, and with probability approaching one as $n \to \infty$ (WPA1) otherwise. So $\lambda(b_x; \hat{m}_n)$ will exist even though $\lambda(b_x; m_0)$ does not. Let $\beta_x^{OLS}(m)$ be the value of $b_x$ that implies $q_1(b_x; m) = 0$, or equivalently:


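Note that $\beta_x^{OLS}(\hat{m}_n)$ is just the coefficient on $x$ from the OLS regression of $y$ on $x$ and $c$, and that:

$$\beta_x^{OLS}(\hat{m}_n) \xrightarrow{p} \beta_x^{OLS}(m_0) = \frac{\operatorname{cov}(x - x^p, y - y^p)}{\operatorname{var}(x - x^p)} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)} = \beta_x \tag{26}$$

Since $q_1(\beta_x^{OLS}(\hat{m}_n)) = 0$ by construction and $q_2(\beta_x^{OLS}(\hat{m}_n)) \neq 0$ WPA1: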



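$$\hat{\beta}^L(\Lambda) \le \beta_x^{OLS}(\hat{m}_n) \le \hat{\beta}^H(\Lambda) \quad \text{WPA1} \tag{27}$$

Pick any $\varepsilon > 0$. The event $\left(|\beta_x^{OLS}(\hat{m}_n) - \beta_x| < \varepsilon\right)$ clearly implies $\left(\beta_x^{OLS}(\hat{m}_n) > \beta_x - \varepsilon\right)$, which itself implies $\left(\hat{\beta}^H(\Lambda) > \beta_x - \varepsilon\right)$ by equation (27). Therefore: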


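By (26), $\Pr\left(|\beta_x^{OLS}(\hat{m}_n) - \beta_x| < \varepsilon\right) \to 1$, so by the sandwich theorem:

$$\Pr\left(\hat{\beta}^H(\Lambda) > \beta_x - \varepsilon\right) \to 1 \tag{28}$$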

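Let $\lambda_{\max}$ satisfy $|\lambda| \le \lambda_{\max}$ for all $\lambda \in \Lambda$. Then $\lambda \in \Lambda$ implies $|\lambda| \le \lambda_{\max}$. Therefore:

$$
\begin{aligned}
0 \le \Pr\left(\hat{\beta}^H(\Lambda) \ge \beta_x + \varepsilon\right) &= \Pr\left(\lambda(b_x; \hat{m}_n) \in \Lambda \text{ for some } b_x > \beta_x + \varepsilon\right) \\
&\le \Pr\left(|\lambda(b_x; \hat{m}_n)| \le \lambda_{\max} \text{ for some } b_x \ge \beta_x + \varepsilon\right)
\end{aligned} \tag{29}
$$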

Now, for any δ≠0



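$$\Pr\left(|\lambda(b_x; \hat{m}_n)| \le \lambda_{\max} \text{ for some } b_x \ge \beta_x + \varepsilon\right) \to 0 \tag{30}$$

By the sandwich theorem, (29) and (30) imply $\Pr\left(\hat{\beta}^H(\Lambda) \ge \beta_x + \varepsilon\right) \to 0$, or equivalently that:

$$\Pr\left(\hat{\beta}^H(\Lambda) < \beta_x + \varepsilon\right) \to 1 \tag{31}$$

Taking (28) and (31) together produces:

$$\Pr\left(|\hat{\beta}^H(\Lambda) - \beta_x| < \varepsilon\right) \to 1 \tag{32}$$

which is the result stated in the proposition. The same argument applies to $\beta_x^L$.        □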



Article note:

This research has benefited from comments by seminar audiences at Duke, Guelph, McMaster, Toronto, Virginia, Waterloo, York, and the Federal Trade Commission, as well as the Editor and two anonymous referees. All errors are mine. Author contact information: e-mail: The author acknowledges support from the Social Sciences and Humanities Research Council of Canada. Stata code is available at

Supplemental Material:

The online version of this article (DOI: 10.1515/jem-2013-0013) offers supplementary material, available to authorized users.

Published Online: 2015-6-4
Published in Print: 2016-1-1

©2016 by De Gruyter
