
Journal of Causal Inference

Ed. by Imai, Kosuke / Pearl, Judea / Petersen, Maya Liv / Sekhon, Jasjeet / van der Laan, Mark J.

2 Issues per year


Testing for the Unconfoundedness Assumption Using an Instrumental Assumption

Xavier de Luna
  • Corresponding author
  • Department of Statistics, Umeå School of Business and Economics, Umeå University, SE-90187 Umeå, Sweden
/ Per Johansson
  • Department of Economics, Uppsala University, Uppsala, Sweden
  • Institute for Evaluation of Labour Market and Education Policy, Uppsala, Sweden
Published Online: 2014-04-15 | DOI: https://doi.org/10.1515/jci-2013-0011


The identification of average causal effects of a treatment in observational studies is typically based either on the unconfoundedness assumption (exogeneity of the treatment) or on the availability of an instrument. When available, instruments may also be used to test for the unconfoundedness assumption. In this paper, we present a set of assumptions on an instrumental variable which allows us to test for the unconfoundedness assumption, although they do not necessarily yield nonparametric identification of an average causal effect. We propose a test for the unconfoundedness assumption based on the instrumental assumptions introduced and give conditions under which the test has power. We perform a simulation study and apply the results to a case study where the interest lies in evaluating the effect of job practice on employment.

Keywords: average treatment effect; job practice; nonparametric identification

1 Introduction

Identification of the causal effect of a treatment T on an outcome Y in observational studies is typically based either on the unconfoundedness assumption (also called selection on observables, exogeneity, ignorability, see, e.g. de Luna and Johansson [1], Imbens and Wooldridge [2], Pearl [3]) or on the availability of an instrument. The unconfoundedness assumption says loosely that all the variables affecting both the treatment T and the outcome Y are observed (we call them covariates) and can be controlled for. An instrument is usually defined as a variable affecting the treatment T, and such that it is related to the outcome Y only through T (and possibly the observed covariates). When available, instruments can be used to identify causal effects in parametric situations. Nonparametric identification is also possible with the help of instruments, and Angrist et al. [4] develop a theory for the nonparametric identification and estimation of local average causal effects. Abadie [5] and Frölich [6] extended these results to the situation where the observed covariates are related to the instrument. Note also that nonparametric identification can be obtained with the related concept of (fuzzy) regression discontinuity designs; see Hahn et al. [7], Battistin and Retore [8], Dias et al. [9] and Lee [10, Sec. 5.5.3]. When a causal effect is identified, a test of the unconfoundedness assumption may be devised by comparing the estimates of the causal effects obtained both under the unconfoundedness assumption and using the instrument (classical Durbin–Wu–Hausman (DWH) test in a parametric setting). This was recently used by Donald et al. [11] to propose a test of the unconfoundedness assumption in a nonparametric framework.

In this paper, we introduce general instrumental conditions under which it is possible to test for the unconfoundedness or exogeneity assumption. The instrumental assumptions are general and, for instance, they do not necessarily yield identification of a causal effect when the unconfoundedness assumption does not hold. Indeed, to obtain the nonparametric identification of local average causal effects, stronger (and untestable) assumptions must be made on the instrument, see, e.g. Imbens and Angrist [12], Angrist et al. [4], Angrist and Fernandez-Val [13], Donald et al. [11] and Guo et al. [14]. In particular, these papers use a monotonicity assumption saying that the instrument must affect the treatment in a monotone fashion, and they do not allow unobserved heterogeneity to affect both the instrument and the treatment. Based on our general instrumental conditions we can propose a statistic to test the unconfoundedness assumption. The proposed test is related to the use of two control groups to test the unconfoundedness assumption, an idea previously used, e.g. in Rosenbaum [15], de Luna and Johansson [1] and Dias et al. [9]. Rosenbaum [15] was probably the first to formalize the idea that two control groups provide information on the unconfoundedness assumption, and described actual observational studies where different control groups were available. One of our contributions in this context is the introduction of general assumptions under which an observed variable can be used to split an available control group in order to test the unconfoundedness assumption nonparametrically. However, the test statistic we eventually propose does not actually require the split to be done.

In Section 4, we present a motivating example where Swedish register data are used to study the causal effect of job practice (JP) on employment. We have access to a rich set of background characteristics on unemployed individuals, although the question remains whether the effect of JP on employment is confounded by unobserved heterogeneity. In this study, the unemployed have access to JP through their participation in a labor market program. During 1998 there were two such labor market programs available in Sweden offering JP with different probabilities. Because we know that the two programs differ mainly with respect to their propensity to offer JP, participation in the two programs may be assumed to affect employment differently only through JP. We thus argue that program participation fulfills our instrumental conditions. In contrast with usual instrumental assumptions, this allows potential unobserved heterogeneity in the program and JP assignment to be correlated. We apply the introduced test to check whether the estimated effect of JP on employment is biased due to unobservables affecting both JP and employment.

Before treating this motivating example in more detail in Section 4, Section 2 presents the model, introduces instrumental assumptions and develops the theoretical results which then allow us to introduce a test of the unconfoundedness assumption. Section 3 presents a simulation study of the finite sample properties of the proposed test. In particular, one of the designs used illustrates the situation where the monotonicity assumption mentioned above does not hold. The paper is concluded in Section 5.

2 Theory and method

2.1 Model

We use the Neyman–Rubin model [16, 17] for causal inference when the interest lies in the causal effect of a binary treatment $T$, taking values in $\mathcal{T}=\{0,1\}$, on an outcome. Let us thus define $Y(t)$, $t\in\mathcal{T}$, called potential outcomes. The latter are interpreted as the outcomes resulting from the assignment $T=t$, $t\in\mathcal{T}$, respectively. We then observe $Y=TY(1)+(1-T)Y(0)$. Let us also assume that we observe a set of variables which are not affected by the treatment assignment. We will need to distinguish in particular $\mathbf{X}$ and $Z$, two vectors of such variables, the latter of dimension one.

For $t\in\mathcal{T}$, we consider $(\mathbf{X},Z,T,Y(t))$ as a random vector variable with a given joint distribution, from which a random sample is drawn. Population parameters that are often of interest in this context are the average causal effect $\theta=E(Y(1)-Y(0))$ and the average causal effect on the treated $\theta_t=E(Y(1)-Y(0)\mid T=1)$ or on the non-treated $\theta_{nt}=E(Y(1)-Y(0)\mid T=0)$.
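The three estimands can be made concrete with a small simulation. The sketch below is illustrative only: the data generating process (a single covariate `x`, a unit-level effect equal to `1 + x`, and a treatment that is more likely for large `x`) is an assumption made for this example and does not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical DGP (an assumption for illustration, not the paper's design).
x = rng.normal(size=n)
t = (x + rng.normal(size=n) > 0).astype(int)   # treatment more likely for large x
y0 = 1 + x + rng.normal(size=n)                # potential outcome Y(0)
y1 = y0 + 1 + x                                # potential outcome Y(1), effect 1 + x

# Observed outcome: Y = T*Y(1) + (1 - T)*Y(0)
y = t * y1 + (1 - t) * y0

effect = y1 - y0
theta = effect.mean()               # average causal effect
theta_t = effect[t == 1].mean()     # average causal effect on the treated
theta_nt = effect[t == 0].mean()    # average causal effect on the non-treated

print(f"theta={theta:.3f}  theta_t={theta_t:.3f}  theta_nt={theta_nt:.3f}")
```

Because units with larger effects are more likely to be treated in this toy setup, the three parameters differ, which is why the text distinguishes them.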

In observational studies, where the treatment assignment T is not randomized, an identifying assumption (e.g. Rosenbaum and Rubin [18]; Imbens [19]) for the average causal effect is the following.

(A.1) For $t\in\mathcal{T}$, $T \perp\!\!\!\perp Y(t)\mid\mathbf{X}$ (unconfoundedness), $\Pr(T=t\mid\mathbf{X})>0$ (common support).

The common support assumption can be investigated by looking at the data. The unconfoundedness assumption may be considered as realistic in situations where the set of characteristics X is rich enough, and when there is subject-matter theory to support the assumption.

2.2 Instrumental assumptions, test and power

Let us now consider situations where the variable $Z$ takes values in $\mathcal{T}$ (if not, it may be made dichotomous using a threshold) and fulfills the following assumption.

(A.2) For $t\in\mathcal{T}$, $Z \perp\!\!\!\perp Y(t)\mid\mathbf{X}$, $0<\Pr(Z=1\mid\mathbf{X})<1$.

Assumption (A.2) prohibits (a) a direct effect from Z to Y(t), i.e. an effect not going through T, and (b) unobserved variables affecting both Z and Y(t). On the other hand, (A.2) allows unobserved variables to affect both Z and T, which is typically prohibited by usual instrumental assumptions [4–6]. Note that when assuming (A.2) in the sequel, Z and Y(t) may also be independent conditional on a subset of X, and, e.g. Z may be randomized as discussed after Proposition 1. We also need the following regularity condition.

(A.3) If (A.1) and (A.2) hold, then $T \perp\!\!\!\perp Y(t)\mid Z,\mathbf{X}$, for $t\in\mathcal{T}$ respectively.

Assumption (A.3) is a regularity condition and is violated only in specific situations, of which Example 1 is typical.

Figure 1. Graph illustrating model (1) in Example 1

Example 1 Let us assume that the vector $(Z^*,T^*,Y(0),U,V)$ has joint normal distribution, where $U$ and $V$ are two unobserved covariates and the set of observed covariates $\mathbf{X}$ is empty. Assume now that the following model generates the data:

$$Z^*=\psi_0+\psi_1U+\psi_2V+\varepsilon_Z,\qquad T^*=\nu_0+\nu_1V+\varepsilon_T,\qquad Y(0)=\xi_1Z^*+\xi_2U+\varepsilon_Y, \tag{1}$$

where $U$, $V$, $\varepsilon_Z$, $\varepsilon_T$ and $\varepsilon_Y$ are jointly normal and independently distributed. Let $Z=I(Z^*>0)$ and $T=I(T^*>0)$, where $I(\cdot)$ is the indicator function. Figure 1 gives a graphical representation of the model, where $\varepsilon_Z$, $\varepsilon_T$ and $\varepsilon_Y$ are omitted. We can write the conditional expectations $E(Y(0)\mid Z^*,U)=\xi_1Z^*+\xi_2U$ and $E(U\mid Z^*)=\gamma Z^*$, where $\gamma$ is a function of the parameters in (1).

In Example 1, (A.1) and (A.2) will typically be violated, unless we assume that $\xi_1=-\xi_2\gamma$, in which case $E(Y(0)\mid Z^*)=(\xi_1+\xi_2\gamma)Z^*=0$, so that $Z^* \perp\!\!\!\perp Y(0)$ by joint normality, and thereby $Z \perp\!\!\!\perp Y(0)$ and $T \perp\!\!\!\perp Y(0)$. The constrained parametrization $\xi_1=-\xi_2\gamma$ thus yields an example where (A.3) is violated, since (A.1) and (A.2) hold while one can check that $T \perp\!\!\!\perp Y(0)\mid Z$ does not necessarily hold.

This type of example is called unstable [3, Sec. 2.4] in the sense that (A.1) and (A.2) will cease to hold as soon as the parameter values do not fulfill the constraint $\xi_1=-\xi_2\gamma$. Using directed acyclic graphs (see Footnote 1), it can be shown that assumption (A.3) holds as soon as the distribution is stable, where, e.g. a distribution $P(\psi)$ parametrized with a parameter vector $\psi$ is said to be stable if no independence can be destroyed by varying the parameter $\psi$; see Pearl [3, Sec. 2.4] for a formal general definition. Note here that (A.3) does not imply any parametrized functional form.

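Proposition 1 Assume (A.1)–(A.3), then

$$Z \perp\!\!\!\perp Y(t)\mid T,\mathbf{X},\quad t\in\mathcal{T}. \tag{2}$$

Proof. By assumption (A.1) and (A.2) hold. Then, for $t\in\mathcal{T}$,

$$\text{(A.1) and (A.2)}\;\Rightarrow\; T \perp\!\!\!\perp Y(t)\mid Z,\mathbf{X}\ \text{and}\ Z \perp\!\!\!\perp Y(t)\mid\mathbf{X}\;\Rightarrow\;(T,Z) \perp\!\!\!\perp Y(t)\mid\mathbf{X}\;\Rightarrow\; Z \perp\!\!\!\perp Y(t)\mid T,\mathbf{X}.$$

The first implication holds by assumption (A.3), the other two by the properties of conditional independence relations (contraction and weak union, respectively), see Dawid [21], Lauritzen [22, Sec. 3.1] and Pearl [3, Sec. 1.1.5]. ■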

The conditional independence statement obtained in Proposition 1 is testable from the data when conditioning on T=t (see next section). Finding evidence in the data against (2) yields evidence against the assumptions of the proposition. Thus, evidence against (2) can be interpreted as evidence against the unconfoundedness assumption (A.1) if (A.2) is known to hold from subject-matter considerations, (A.3) being a regularity condition. One application is a randomized experiment (where Z is a random assignment to a treatment) with restricted compliance T [4, 12]. Another example of application is treated in detail in Section 4. Note that while identification of the causal effect of T on Y may follow from (A.2) with linear models, see, e.g. Pearl [3, p. 248], this is not true in general, and stronger assumptions are needed to obtain nonparametric identification of a causal effect such as, e.g. a local average treatment effect [4–6]. In particular, our result does not rely on two assumptions typically made to obtain such identification: that the instrument must affect the treatment in a monotone fashion, and that no unobserved heterogeneity is allowed to affect both the instrument and the treatment.

For a test based on (2) to have power against (A.1) we further need to have that Z and T are dependent conditional on X. This is typically assumed for instrumental variables to be useful for identification. Examples of situations (expressed with directed acyclic graphs; see Footnote 1) for a test that would be based on (2) to have power against (A.1) are given in Figure 2, panels (a)–(c), while panel (d) shows a case where such a test would not have power. A caveat here is that (2) can be tested only when conditioning on $T=t$. This has no practical consequence if the test rejects this null hypothesis. On the other hand, in cases where (2) is not rejected for $T=t$, we have no information on whether it is violated for $T=1-t$. In independent and related work, Guo et al. [14, eqs (3) and (4)] give an example where (2) holds for $T=t$ although not for $T=1-t$, and yet a specific causal effect is identified without the help of Z when the earlier mentioned monotonicity assumption holds.

Figure 2. Four directed acyclic graphs together with a respective stable joint distribution for the variables included: Only cases (a)–(c) are such that a test based on (2) may have power, i.e. if (A.1) does not hold, e.g. through the introduction of a variable V with arrows pointing toward T and $Y(t)$, then $Y(t) \perp\!\!\!\perp Z\mid T,\mathbf{X}$ would not hold either

2.3 Method

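Different strategies may be adopted to test the two null hypotheses given by Proposition 1, i.e.

$$H_{0a}: Z \perp\!\!\!\perp Y(0)\mid T=0,\mathbf{X},\qquad H_{0b}: Z \perp\!\!\!\perp Y(1)\mid T=1,\mathbf{X}.$$

Note that for $\theta_t$, (A.1)–(A.3) need to hold only for $t=0$ and, thus, only $H_{0a}$ is to be tested. Similarly, $H_{0b}$ is relevant when $\theta_{nt}$ is of interest, while both null hypotheses are relevant for $\theta$. In this paper we propose a testing strategy (see Footnote 2) based on the fact that under $H_{0a}$ and $H_{0b}$ we have $\delta_0(\mathbf{X})=0$ and $\delta_1(\mathbf{X})=0$, for all $\mathbf{X}$, respectively, where

$$\delta_0(\mathbf{X})=E(Y\mid T=0,\mathbf{X},Z=1)-E(Y\mid T=0,\mathbf{X},Z=0),$$

$$\delta_1(\mathbf{X})=E(Y\mid T=1,\mathbf{X},Z=1)-E(Y\mid T=1,\mathbf{X},Z=0).$$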

Given a random sample of $n$ individuals indexed by $i$, $i=1,\ldots,n$, we consider a nonparametric estimator for $\delta_j=E(\delta_j(\mathbf{X}_i))$, $j=0,1$,

$$\hat{\delta}_j=\frac{1}{N_{j1}}\sum_{i:T_i=j,Z_i=1}\left(Y_i-\hat{Y}_{ji}\right)+\frac{1}{N_{j0}}\sum_{i:T_i=j,Z_i=0}\left(\tilde{Y}_{ji}-Y_i\right),$$

where $N_{jk}=\mathrm{card}(\{i:T_i=j,Z_i=k\})$, $k=0,1$, with $\mathrm{card}(A)$ denoting the cardinality of the set $A$, and $\hat{Y}_{ji}$ and $\tilde{Y}_{ji}$ are nonparametric estimators of $E(Y_i\mid T_i=j,\mathbf{X}_i,Z_i=0)$ and $E(Y_i\mid T_i=j,\mathbf{X}_i,Z_i=1)$, respectively. The two latter estimates may be obtained by nearest neighbor matching, or any other smoothing technique. Since $\delta_0=0$ and $\delta_1=0$, respectively, under $H_{0a}$ and $H_{0b}$, the test statistics

$$C_0=\frac{\hat{\delta}_0}{s_0}\quad\text{and}\quad C_1=\frac{\hat{\delta}_1}{s_1} \tag{3}$$

will then, under the necessary regularity conditions, be asymptotically normally distributed with mean zero and variance one, where $s_j$ is the standard error of $\hat{\delta}_j$, for $j=0,1$. For instance, if nearest neighbor matching estimators are used, then the asymptotic theory and in particular $s_j$ can be found in Abadie and Imbens [23]. A subsampling estimator of $s_j$ is also available in this case in de Luna et al. [24]. As noted above, when $\theta$ is of interest, then both hypotheses $H_{0a}$ and $H_{0b}$ are relevant and higher power may be obtained by considering the joint statistic

$$C=C_0^2+C_1^2,$$

which is asymptotically $\chi^2_2$ distributed, since $C_0$ and $C_1$ are independent.

We should note here that the statistics above are testing conditional mean independence, which is relevant when average causal effects are targeted. Alternatively, one may wish to use tests of conditional independence statements based on all the moments of the underlying distribution [25], thereby making the methods relevant when quantile or distributional causal effects are of interest.
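The statistics in (3) can be sketched in a few lines of Python. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single scalar covariate, K-nearest neighbor matching on absolute distance, and a simplified standard error that treats the estimator as a weighted sum of outcomes and plugs in local residual variances obtained from the nearest neighbor within the same Z group. This is in the spirit of, but not identical to, the Abadie and Imbens [23] estimator. The function names and the toy data generating process are hypothetical.

```python
import math
import numpy as np

def knn_indices(x_from, x_to, k):
    """For each point of x_from, indices of its k nearest points in x_to (scalar covariate)."""
    dist = np.abs(x_from[:, None] - x_to[None, :])
    return np.argsort(dist, axis=1)[:, :k]

def local_var(yg, xg, j=1):
    """Local residual variance from the j nearest same-group neighbors (self excluded)."""
    nb = knn_indices(xg, xg, j + 1)[:, 1:]
    return (j / (j + 1.0)) * (yg - yg[nb].mean(axis=1)) ** 2

def c_statistic(y, x, z, k=5):
    """delta_hat / s within one treatment arm (all inputs already restricted to T = j)."""
    y1, x1 = y[z == 1], x[z == 1]
    y0, x0 = y[z == 0], x[z == 0]
    n1, n0 = len(y1), len(y0)
    m10 = knn_indices(x1, x0, k)      # Z=0 matches for the Z=1 units
    m01 = knn_indices(x0, x1, k)      # Z=1 matches for the Z=0 units
    y_hat = y0[m10].mean(axis=1)      # estimate of E(Y | X_i, Z=0) for Z=1 units
    y_til = y1[m01].mean(axis=1)      # estimate of E(Y | X_i, Z=1) for Z=0 units
    delta = (y1 - y_hat).mean() + (y_til - y0).mean()
    # delta is linear in the outcomes: collect the weight attached to each unit.
    w1 = 1.0 / n1 + np.bincount(m01.ravel(), minlength=n1) / (k * n0)
    w0 = -(1.0 / n0 + np.bincount(m10.ravel(), minlength=n0) / (k * n1))
    var = np.sum(w1 ** 2 * local_var(y1, x1)) + np.sum(w0 ** 2 * local_var(y0, x0))
    return delta / math.sqrt(var)

def two_sided_p(c):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(c) / math.sqrt(2.0))

def joint_p(c0, c1):
    """p-value of C = C0^2 + C1^2 under chi-square with 2 df (survival function exp(-C/2))."""
    return math.exp(-(c0 ** 2 + c1 ** 2) / 2.0)

def simulate(n, delta3, rng):
    """Toy DGP (an assumption): randomized Z, confounder u enters Y(0) when delta3 != 0."""
    x, u = rng.normal(size=n), rng.normal(size=n)
    z = (rng.normal(size=n) > 0).astype(int)
    t = (x + z + u + rng.normal(size=n) > 0).astype(int)
    y0 = 1 + x + delta3 * u + rng.normal(size=n)
    return y0, x, z, t

rng = np.random.default_rng(0)
for delta3 in (0.0, 3.0):
    y, x, z, t = simulate(4000, delta3, rng)
    keep = t == 0                     # H0a conditions on T = 0, where Y = Y(0)
    c0 = c_statistic(y[keep], x[keep], z[keep])
    print(f"delta3={delta3}: C0={c0:.2f}, p={two_sided_p(c0):.4f}")
```

With a vector of covariates one would replace the absolute distance by, e.g., a Mahalanobis distance, and for inference one would use the variance estimator of Abadie and Imbens [23] or the subsampling estimator of de Luna et al. [24] rather than the shortcut above.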

3 Monte Carlo study

We use a Monte Carlo study to investigate the finite sample properties (empirical size and power) of the test $C_0$ in (3), where K-nearest neighbor matching is used as nonparametric estimator of $\hat{Y}_i(0)$ and $\tilde{Y}_i(0)$, together with the Abadie and Imbens [23] variance estimator. As noted above, in situations where $\theta$ is of interest and (A.1)–(A.3) are assumed to hold for $t=0,1$ instead of only for $t=0$, then $C$ could be used instead of $C_0$, thereby increasing the power of the test. As a benchmark we also implement a parametric DWH test, where we first regress T on X and Z and then add the residuals from this fit as a covariate into the outcome equation for Y. The test for the unconfoundedness assumption is then a Wald test on the parameter for the included residual covariate (see, e.g. Wooldridge [26, Chap. 6], and Rivers and Vuong [27]). We use a robust covariance matrix [28].
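The two-step DWH benchmark described above can be sketched as follows. This is an illustration under assumptions that the text does not pin down: the first stage is fitted by ordinary least squares (a linear probability model), the outcome equation is linear in X and T, and the robust covariance is the HC0 sandwich form. The function names and the toy data are hypothetical.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, y - design @ beta

def dwh_statistic(y, t, x, z):
    """Robust t-ratio on the first-stage residual added to the outcome equation."""
    ones = np.ones(len(y))
    _, v = ols(np.column_stack([ones, x, z]), t)      # step 1: regress T on X and Z
    second = np.column_stack([ones, x, t, v])         # step 2: add residual v as covariate
    beta, e = ols(second, y)
    bread = np.linalg.inv(second.T @ second)
    meat = (second * e[:, None] ** 2).T @ second      # X' diag(e^2) X
    cov = bread @ meat @ bread                        # HC0 sandwich covariance
    return beta[-1] / np.sqrt(cov[-1, -1])

def simulate(n, delta3, rng):
    """Toy continuous-outcome DGP (an assumption) with homogeneous effect 1."""
    x, u2 = rng.normal(size=n), rng.normal(size=n)
    z = (rng.normal(size=n) > 0).astype(float)
    t = (x + z + u2 + rng.normal(size=n) > 0).astype(float)
    y = 1 + x + t + delta3 * u2 + rng.normal(size=n)
    return y, t, x, z

rng = np.random.default_rng(0)
for delta3 in (0.0, 3.0):
    stat = dwh_statistic(*simulate(5000, delta3, rng))
    print(f"delta3={delta3}: DWH statistic={stat:.2f}")
```

The squared statistic is the Wald statistic for a single restriction; it is compared with standard normal critical values here.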

3.1 Design

We consider a data generating process (DGP) which mimics a situation with a randomized assignment to a treatment ($Z$) with non-perfect compliance ($\delta_0=0$ below), where $T$ denotes the actual treatment assignment, as well as more general situations where the effect of $Z$ on $T$ is allowed to be confounded by unobservables. For unit $i$, let

$$Z_i=I\left(\delta_0U_{0i}+\varepsilon_{Zi}>0\right),\qquad T_i=I\left(X_i+\delta_1U_{0i}+(0.5+\delta_2U_{1i})Z_i+U_{2i}+\varepsilon_{Ti}>0\right),$$

and

$$Y_i=1+X_i+\theta_iT_i+\delta_3U_{2i}+\varepsilon_{Yi}$$

or

$$Y_i=I\left(1+X_i+\theta_iT_i+\delta_3U_{2i}+\varepsilon_{Yi}>0\right).$$

We let $\varepsilon_{Yi}$, $\varepsilon_{Zi}$, $\varepsilon_{Ti}(0)$, $\varepsilon_{Ti}(1)$, $U_{0i}$, $U_{1i}$ and $U_{2i}$ be independently distributed as $N(0,0.25)$. Moreover, we also let $X_i\sim N(0,2)$, and consider two cases for $\theta_i$: $\theta_i=1$ (homogeneous treatment effect) and $\theta_i=1+X_i$ (heterogeneous treatment effect). Parameters are varied in order to study the empirical size and power of the test $C_0$. Five designs, denoted D.1–D.5, are considered and described in Table 1. For the situation where we set $\delta_2=8$ (Design D.2), the instrumental variable $Z$ is non-monotone, i.e. there exist individuals $j$ for which $T_j(Z_j=0)=1$ and $T_j(Z_j=1)=0$ (called defiers), where $T_j(Z_j=k)$, $k\in\mathcal{T}$, are the potential treatment values for individual $j$ when $Z_j$ is set to $k$ (everything else equal). The proportion of defiers when $\delta_2=8$ is 8.4%. Thus, for design D.2 the monotonicity assumption necessary for the nonparametric identification of the local average causal effect is violated [4–6]. Another assumption for identification made in the latter references is that $\delta_0\times\delta_1=0$, and, hence, the instrument does not recover identification in designs D.3 and D.5.
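The notion of defiers can be illustrated by simulating the two potential treatment values from the treatment equation. The sketch below is a hedged illustration: it assumes $\delta_1=0$, and it reads the second argument of $N(\cdot,\cdot)$ as a standard deviation, which the text does not state (it may instead be a variance). The defaults `sd_u`, `sd_x`, `delta1` and the function name are therefore assumptions, and the printed share need not match the figure reported above.

```python
import numpy as np

def defier_share(delta2, n=200_000, delta1=0.0, sd_u=0.25, sd_x=2.0, seed=0):
    """Share of units with T(Z=0)=1 and T(Z=1)=0 under the treatment equation."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sd_x, n)
    u0, u1, u2, eps_t = (rng.normal(0.0, sd_u, n) for _ in range(4))
    base = x + delta1 * u0 + u2 + eps_t          # latent index without the Z term
    t_z0 = base > 0                              # potential treatment T(Z=0)
    t_z1 = base + (0.5 + delta2 * u1) > 0        # potential treatment T(Z=1)
    return float(np.mean(t_z0 & ~t_z1))

print("delta2=0:", defier_share(0.0))   # Z can only push units into treatment
print("delta2=8:", defier_share(8.0))   # random coefficient on Z creates defiers
```

With $\delta_2=0$ the coefficient on $Z$ is the constant 0.5, so switching $Z$ from 0 to 1 can never move a unit out of treatment and the instrument is monotone; a large $\delta_2$ makes the coefficient negative for some units.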

Table 1

Different designs considered with resulting instrumental property for Z and whether nonparametric identification of the (local) average causal effect holds

The two tests mentioned above, $C_0$ and DWH, are evaluated in testing the null hypothesis $\delta_3=0$, and empirical size and power of the tests are obtained by letting $\delta_3\in\{0,0.1,0.2,0.3,0.6,0.9,1.5,2\}$. K-nearest neighbor matching estimators with $K=1,3,5$ and 7 are used to compute $C_0$, and we restrict X to have common support when conditioning on $Z=1$ and $Z=0$. We consider sample sizes N = 500, 1,500 and 3,000. In the continuous response cases, DWH should have correct size when $\theta_i=1$ irrespective of whether the instrument is monotone or not, or whether the relation with T is confounded or not. DWH is also expected to have correct size [27] in the binary response case with homogeneous causal effect ($\theta_i=1$). In contrast, DWH is expected to break down in all heterogeneous cases ($\theta_i=1+X_i$), since the response model is then misspecified. To our knowledge, no nonparametric test has previously been proposed in the literature for situations in Table 1 where an average causal effect is not nonparametrically identified. On the other hand, $C_0$ is expected to have correct size and to have power in all situations simulated.

3.2 Results

The results from the Monte Carlo simulations are displayed in Figures 3 and 4. The empirical sizes are also displayed in Table 2. The nonparametric test $C_0$ with $K=5$ behaves well with all the DGPs considered, with empirical size close to 5% and power increasing with sample size. Results with other values for K can be obtained from the authors. Empirical sizes were comparable for all K values considered, while power increased with K: significantly so from $K=1$ to $K=3$ and only marginally from $K=5$ to 7. Power is further increased when using $C$ instead of $C_0$ (see Table 3 for design D.1; a similar increase was obtained for the other designs), as expected since the former is based on stronger assumptions. On the other hand, the DWH test has too large an empirical size in the heterogeneous cases ($\theta_i=1+X_i$). In the homogeneous treatment setup ($\theta_i=1$) DWH behaves well with respect to its empirical size. This was expected as noted in the previous section, thereby yielding an interesting benchmark. In such homogeneous cases, the nonparametric test $C_0$ has similar or better power than DWH, except for Design D.1, where DWH is based on correctly specified models. For Design D.2 (non-monotone instrument), $C_0$ has markedly higher power than DWH.

Table 2

Empirical sizes (nominal size is 5%) obtained with the nonparametric test C0 (K=5) and the DWH test for simulated DGPs with a continuous response

Table 3

Empirical size (nominal size is 5%) and power obtained with test statistics C0 and C (both with K=5) for simulated DGP D.1 with θi=1+Xi; sample size 3,000

In summary, the results obtained show that the nonparametric test (3) performs well in situations where DWH is consistent. By making fewer assumptions, (3) is also shown to work with non-monotone instruments and instruments whose effect on the treatment is confounded by unobservables, i.e. in situations where a local average causal effect is not identified.

Figure 3. Empirical size ($\delta_3=0$) and power for the nonparametric test $C_0$ and the DW(H) test (based on robust covariance matrix) for Design D.1 (first row) and Design D.2 (second row), homogeneous causal effect (first column) and heterogeneous causal effect (second column). Designs are described in Table 1

Figure 4. Empirical size ($\delta_3=0$) and power for the nonparametric test $C_0$ and the DW(H) test [27] for Design D.3–D.5 (rows 1–3), homogeneous causal effect (first column) and heterogeneous causal effect (second column). Designs are described in Table 1

4 Effect of JP

We consider a case study where the interest lies in estimating the effect of JP for the unemployed on employment status. JP was offered within two separate labor market training (LMT) programs in Sweden during 1998. One program was run by the regular program provider in Sweden, the Swedish National Labor Market Board (AMV). The other program was offered by the Federation of Swedish Industries (Swit). To be eligible for the programs the unemployed individuals had to be at least 20 years of age and enrolled at the public employment service. There was no difference in benefits for the two groups of trainees. The fundamental idea with the Swit program was to increase the contacts between the unemployed individuals and employers by providing JP. In a survey conducted in June 2000 on 1,000 program participants from both programs, 69.5% of the Swit participants and 52% of the AMV participants stated that they obtained access to JP (see Footnote 3). Except for the idea to provide more contacts with employers the two programs were similar. Both programs tested the individual's motivation and ability before recruitment by similar selection procedures (see Johansson [30] for a thorough description of the selection). The types of courses given within the Swit and the AMV programs are displayed in Table 4. The similarities of the two programs are apparent. Thus, despite differences in procurement between the two organizations (Swit and AMV), there do not seem to be any large differences between the types of LMT courses offered nor in the selection of participants. The fact that the programs distinguish themselves only with respect to JP availability leads us to expect that the effect of LMT program choice on labor market outcome should differ only through the effect of JP. This suggests that LMT program choice has the property (A.2) of an instrument for JP.

Table 4

The frequency distribution of the courses within the two programs

Based on the survey one can see in Table 5 that there is a statistically significant 18.1 percentage point difference in employment six months after leaving the program (the two programs have the same average length) when comparing individuals having JP with those without. In the table we have some individual background variables: (i) education, (ii) work handicap (see disabled), (iii) gender (1 if man and 0 if woman) and (iv) immigration status (1 if immigrant, 0 otherwise). Finally, since the propensity of receiving JP is higher in larger labor markets, which also have better labor market opportunities, we need to control for region of residence in the estimation of an effect of JP. Sweden was divided into four regions: Stockholm, Skåne, Västra Götaland and the rest of the country. Stockholm, Skåne and Västra Götaland are the three regions with the largest population. Note that we have good reasons to assume that the two LMT programs only differ in their JP prospects; thus, if the labor market opportunities affect the access to the LMT programs, this does not invalidate their use as an instrument for JP.

Table 5

Descriptive statistics for outcome employment and background characteristics and how they differ between JP and non-JP individuals

We can see some average differences between the two samples. Those with JP are (i) less often disabled and (ii) less likely to live in Stockholm. The level of education also differs: they have on average more compulsory and upper secondary education but also less college education than those with no JP. Based on these average differences, it is difficult to argue that those with JP have better labor market prospects than those without JP. The single factor suggesting that the JP population has better labor market opportunities without JP is that they are less likely to be disabled. In order to further study the selection into JP we used the covariates from the table and estimated a logistic regression model (a propensity score) including only main effects. The results from this estimation (not displayed) are that individuals who are from Stockholm or Västra Götaland, and disabled individuals, are less likely to receive JP. There are, for instance, no statistically significant (5% level) differences in education between the two groups. Figure 5, left panel, displays the estimated propensity scores. The latter gives evidence for the common support assumption in (A.1). In order to investigate the related assumption $0<\Pr(Z=1\mid\mathbf{X})<1$ included in (A.2), we also fit the probability of getting into Swit versus AMV with a logistic regression including main effects, and Figure 5, right panel, also provides evidence for the latter assumption.

Because there are 969 JP (treated) individuals for only 528 non-treated individuals, an estimate of the average causal effect of JP on the treated (ACT), $\theta_t$, will typically suffer from severe bias due to difficulties in finding matches to the treated. Thus, we estimate instead the average causal effect of JP on the non-treated (ACNT), $\theta_{nt}$. A reasonable assumption is that individuals with higher than average return from JP are the ones who select themselves into JP. This means that ACNT yields a lower bound for ACT, $\theta_t\geq\theta_{nt}$.

Assumption (A.1) needs to be fulfilled only for $t=1$ in order for us to estimate ACNT, i.e. $Y(1) \perp\!\!\!\perp T\mid\mathbf{X}$, where the covariates are displayed in Table 5. A $K=5$ nearest neighbor matching estimator using the minimum Mahalanobis distance between the covariates of Table 5 is used to estimate the parameter $\theta_{nt}$, yielding $\hat{\theta}_{nt}=12$ percentage points, with standard error [23, Theorems 6 and 7] estimated at 5 percentage points. Hence, there is a significant effect from JP.
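A K-nearest neighbor matching estimator of $\theta_{nt}$ with Mahalanobis distance can be sketched as below. This is a minimal illustration and not the code used for the case study: the function name and the synthetic data (two continuous covariates, constant effect equal to 1) are assumptions, and no standard error is computed.

```python
import numpy as np

def acnt_matching(y, t, x, k=5):
    """Estimate theta_nt by imputing Y(1) for each non-treated unit
    from its k nearest treated units in Mahalanobis distance."""
    vi = np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False)))   # inverse covariance
    x0, x1 = x[t == 0], x[t == 1]
    diff = x0[:, None, :] - x1[None, :, :]                       # pairwise differences
    d2 = np.einsum("ijk,kl,ijl->ij", diff, vi, diff)             # squared distances
    nearest = np.argsort(d2, axis=1)[:, :k]
    y1_hat = y[t == 1][nearest].mean(axis=1)                     # imputed Y(1)
    return float(np.mean(y1_hat - y[t == 0]))

# Synthetic data (an assumption): unconfoundedness holds given x, true effect is 1.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
t = (0.5 * x[:, 0] + rng.normal(size=n) > 0).astype(int)
y = x.sum(axis=1) + 1.0 * t + 0.5 * rng.normal(size=n)

print(f"estimated theta_nt: {acnt_matching(y, t, x):.3f}")
```

With discrete covariates such as those of Table 5, ties in the distance are common, and a production implementation would need an explicit tie-breaking rule as well as the variance estimator of Abadie and Imbens [23].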

Figure 5. The distribution (percent) of the estimated probabilities (as function of the covariates) of (not) having JP (T, left panel) and of getting into the two alternative LMT programs (Z, right panel)

4.1 Testing the unconfoundedness assumption

We test the null hypothesis $H_{0b}$ using $C_1$ in (3). Nonparametric estimation is performed with $K=5$ nearest neighbor matching on the covariates displayed in Table 5 using the minimum Mahalanobis distance, also for computing the standard deviation $s_1$; see Abadie and Imbens [23]. The resulting value of the test statistic is 1.31. Hence, we cannot reject the unconfoundedness assumption (p-value of 0.18). We also perform a DWH test by estimating a linear probability model with the discrete covariates displayed in Table 5, yielding a p-value of 0.09. Thus, given the maintained assumption (A.2), neither test can reject the null hypothesis, at the 5% level, that the effect of JP on employment is not confounded, although the DWH test, by making stronger assumptions, has a p-value under 10%.

5 Conclusions

Identification of the causal effect of a treatment on an outcome in observational studies is typically based either on the unconfoundedness assumption or on the availability of an instrument (e.g. Angrist et al. [4]). In this paper, by introducing general instrumental assumptions we are able to propose an easy to use nonparametric test for the unconfoundedness assumption in situations where the same assumptions do not allow for the nonparametric identification of a causal effect. We illustrate the framework introduced with a study of the effect of JP for unemployed on employment, where we argue that an instrument fulfilling our conditions is available through the existence of two LMT programs with different degree of accessibility to JP.

In many applications, nonparametric identification of causal effects using instruments is non-trivial, e.g. when a non-testable monotonicity property for the instrument must hold [4–6] and/or when a large set of control variables is needed for the instrument to be valid. Using our weaker instrumental conditions, one may test for the unconfoundedness assumption. If the latter is not rejected, this gives the analyst some ground to proceed using an identification strategy based on the unconfoundedness assumption. We have operationalized the theoretical results with a test statistic based on K-nearest neighbor matching estimators. Other nonparametric regression estimators may be used instead, such as, e.g. local polynomial regression and splines. Finally, it is worth noting here that for duration outcomes with censored data, the test proposed herein may be implemented by making use of the matching estimators for censored duration responses presented in Fredriksson and Johansson [31] and de Luna and Johansson [32].


This paper has benefited from useful comments from Martin Huber, Ingeborg Waernbaum, an editor, an anonymous referee and seminar participants at Johns Hopkins, Maryland University and the third Joint IZA/IFAU Conference on Labor Market Policy Evaluation. De Luna acknowledges the financial support of the Swedish Research Council through the Swedish Initiative for Research on Microdata in the Social and Medical Sciences (SIMSAM), the Ageing and Living Condition Program and grant 70246501. Johansson acknowledges the financial support of the Swedish Council for Working Life and Social Research (grant 2004–2005).


  1. de Luna X, Johansson P. Exogeneity in structural equation models. J Econometrics 2006;132:527–43.
  2. Imbens GW, Wooldridge JM. Recent developments in the econometrics of program evaluation. J Econ Lit 2009;47:5–86.
  3. Pearl J. Causality, 2nd ed. Cambridge: Cambridge University Press, 2009.
  4. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc 1996;91:444–55.
  5. Abadie A. Semiparametric instrumental variable estimation of treatment response models. J Econometrics 2003;113:231–63.
  6. Frölich M. Nonparametric IV estimation of local average treatment effects with covariates. J Econometrics 2007;139:35–75.
  7. Hahn J, Todd P, Van der Klaauw W. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 2001;69:201–9.
  8. Battistin E, Rettore E. Ineligibles and eligible non-participants as a double comparison group in regression discontinuity designs. J Econometrics 2008;142:715–30.
  9. Dias M, Ichimura H, van den Berg G. The matching method for treatment evaluation with selective participation and ineligibles. IFAU Working Papers, 2008:6, Institute for Labour Market Policy Evaluation, Uppsala, 2008.
  10. Lee M-J. Micro-econometrics for policy, program and treatment effects. Oxford: Oxford University Press, 2005.
  11. Donald SG, Hsu Y-C, Lieli RP. Testing the unconfoundedness assumption via inverse probability weighted estimators of (L)ATT. Working Paper, 2011.
  12. Imbens GW, Angrist JD. Identification and estimation of local average treatment effects. Econometrica 1994;62:467–75.
  13. Angrist J, Fernandez-Val I. ExtrapoLATE-ing: external validity and overidentification in the LATE framework. NBER Working Paper, 16566, National Bureau of Economic Research, Cambridge, MA, 2010.
  14. Guo Z, Cheng J, Lorch S, Small D. Using an instrumental variable to test for unmeasured confounding. Working Papers, 2013.
  15. Rosenbaum PR. The role of a second control group in an observational study (with discussion). Stat Sci 1987;2:292–316.
  16. Neyman J. Sur les applications de la théorie des probabilités aux expériences agricoles: essai des principes. Roczniki Nauk Rolniczych X 1923:1–51. In Polish, English translation by D. Dabrowska and T. Speed in Stat Sci 1990;5:465–72.
  17. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974;66:688–701.
  18. Imbens GW. Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econ Stat 2004;86:4–29.
  19. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
  20. de Luna X, Waernbaum I, Richardson T. Covariate selection for the non-parametric estimation of an average treatment effect. Biometrika 2011;98:861–75.
  21. Dawid AP. Conditional independence in statistical theory. J R Stat Soc Ser B 1979;41:1–31.
  22. Lauritzen S. Graphical models. Oxford: Oxford University Press, 1996.
  23. Abadie A, Imbens GW. Large sample properties of matching estimators for average treatment effects. Econometrica 2006;74:235–67.
  24. de Luna X, Johansson P, Sjöstedt-de Luna S. Bootstrap inference for k-nearest neighbour matching estimators. IZA Discussion Papers 5361, Institute for the Study of Labor, Bonn, 2010.
  25. Su L, White H. A consistent characteristic function-based test for conditional independence. J Econometrics 2007;141:807–34.
  26. Wooldridge J. Econometric analysis of cross section and panel data. Cambridge: MIT Press, 2002.
  27. Rivers D, Vuong QH. Limited information estimators and exogeneity tests for simultaneous probit models. J Econometrics 1988;39:347–66.
  28. White H. Maximum likelihood estimation of misspecified models. Econometrica 1982;50:1–25.
  29. Johansson P, Martinson S. Det nationella IT-programmet – en slutrapport om Swit. Forskningsrapporter, 2000:8, Institute for Labour Market Policy Evaluation, Uppsala, 2000.
  30. Johansson P. The importance of employer contacts: evidence based on selection on observables and internal replication. Labour Econ 2008;15:350–69.
  31. Fredriksson P, Johansson P. Dynamic treatment assignment – the consequences for evaluations using observational data. J Bus Econ Stat 2008;26:435–45.
  32. de Luna X, Johansson P. Non-parametric inference for the effect of a treatment on survival times with application in the health and social sciences. J Stat Plann Inference 2010;140:2122–37.


  • 1

    Directed acyclic graphs, e.g. Figure 1, together with a stable (also called faithful) distribution for the variables, are used to describe conditional independence relations between variables; see Lauritzen [22] for a general account of graphical models and de Luna et al. [20] for their use together with potential outcomes.

  • 2

    One related strategy could be to use the concept of two independent control groups [15]. Under H0a we can use Z to obtain two independent control groups (one defined by Z=1 and one by Z=0) for estimating θ, yielding $\hat{\theta}_{z=0}$ and $\hat{\theta}_{z=1}$, respectively. Under H0a the difference $\hat{\theta}_{z=0} - \hat{\theta}_{z=1}$ has expectation zero, and this forms the basis for a test statistic. However, since we need to compute two nonparametric estimators of θ, the resulting statistic has poor finite sample properties, for instance, when the covariates have different support in the two control groups created. This has been confirmed in simulation experiments not presented here.

  • 3

    A detailed description of the survey can be found in Johansson and Martinson [29]. The survey contained a total of 19 questions. These concerned (i) the individual’s background, (ii) the individual’s labor market training and (iii) the individual’s present labor market situation.


    About the article

    Published Online: 2014-04-15

    Published in Print: 2014-09-01

    Citation Information: Journal of Causal Inference, Volume 2, Issue 2, Pages 187–199, ISSN (Online) 2193-3685, ISSN (Print) 2193-3677, DOI: https://doi.org/10.1515/jci-2013-0011.

    ©2014 by De Gruyter.
