Consider the causal Markov chain X → Y → Z, which represents the structural equations:

Y = f(X, U1)   (1)
Z = g(Y, U2)   (2)

with U1 and U2 being omitted factors such that X, U1, and U2 are mutually independent.
It is well known that, regardless of the functions f and g, this model implies the conditional independence of X and Z given Y, written as

X ⊥ Z | Y.   (3)

This can be readily derived from the independence of X, U1, and U2, and it also follows from the d-separation criterion, since Y blocks all paths between X and Z.
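As a sanity check, the conditional independence of eq. (3) can be verified numerically. The sketch below (in Python, with arbitrary invented tables for f and g and invented distributions for X, U1, and U2 — none of these numbers come from the paper) builds the exact joint distribution of the chain and confirms that P(z | x, y) = P(z | y):

```python
import itertools

# A hypothetical discrete chain model X -> Y -> Z with omitted factors U1, U2.
# X, U1, U2 are mutually independent; f and g are arbitrary fixed tables.
vals = [0, 1, 2]
f = {(x, u1): (x + u1) % 3 for x in vals for u1 in vals}      # Y = f(X, U1)
g = {(y, u2): (2 * y + u2) % 3 for y in vals for u2 in vals}  # Z = g(Y, U2)
px  = [0.5, 0.3, 0.2]   # illustrative marginals (assumptions, not from the paper)
pu1 = [0.2, 0.2, 0.6]
pu2 = [0.1, 0.8, 0.1]

# Exact joint P(x, y, z), obtained by summing over the independent factors.
joint = {}
for x, u1, u2 in itertools.product(vals, repeat=3):
    y = f[(x, u1)]
    z = g[(y, u2)]
    joint[(x, y, z)] = joint.get((x, y, z), 0.0) + px[x] * pu1[u1] * pu2[u2]

def p(pred):
    return sum(v for k, v in joint.items() if pred(*k))

# Check eq. (3): P(z | x, y) = P(z | y) wherever P(x, y) > 0.
for x, y, z in itertools.product(vals, repeat=3):
    pxy = p(lambda a, b, c: a == x and b == y)
    if pxy > 0:
        lhs = p(lambda a, b, c: (a, b, c) == (x, y, z)) / pxy
        rhs = p(lambda a, b, c: b == y and c == z) / p(lambda a, b, c: b == y)
        assert abs(lhs - rhs) < 1e-12
print("X and Z are conditionally independent given Y")
```

The check passes for any choice of tables and marginals, as the text asserts; the particular numbers above merely make the example concrete.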
However, the causal chain can also be encoded in the language of counterfactuals by writing

Y_x(u) = f(x, u1)   (4)
Z_y(u) = g(y, u2)   (5)

where u stands for all omitted factors (in our case u = (u1, u2)) and Y_x(u) stands for the value that Y would take in unit u had X been x. Accordingly, the functional and independence assumptions embedded in the chain model translate into the following counterfactual statements (for all x, x', and y):

Z_xy(u) = Z_y(u)   (6)
X ⊥ Y_x'   (7)
Z_xy ⊥ {X, Y_x'}   (8)

Equation (6) represents the missing arrow from X to Z, while eqs (7) and (8) convey the mutual independence of X, U1, and U2.1
Assume now that we are given the three counterfactual statements (6)–(8) as a specification of some uncharted model; the question arises: Are these statements testable? In other words, is there a statistical test conducted on the observed variables X, Y, and Z that could prove the model wrong? On the one hand, none of the three defining conditions (6)–(8) is testable in isolation, because each invokes a counterfactual entity. On the other hand, the fact that the chain model of eqs (1) and (2) yields the conditional independence of eq. (3) implies that the combination of all three counterfactual statements should yield a testable implication.
This paper concerns the derivation of testable conditions like eq. (3) from counterfactual sentences like eqs (6)–(8). Whereas graphical models have the benefits of inferential tools such as d-separation [1, 2, p. 335] for deriving their testable implications, counterfactual specifications must resort to the graphoid axioms,2 which, on their own, cannot reduce subscripted expressions like eqs (6)–(8) into a subscript-free expression like eq. (3). To unveil the testable implications of counterfactual specifications, the graphoid axioms must be supplemented with additional inferential machinery.
We will first prove that eq. (3) indeed follows from eqs (6) to (8) and then tackle the general question of deriving testable sentences from any given collection of counterfactual statements of the conditional independence variety. To that end, we will augment the graphoid axioms with three auxiliary inference rules, which will enable us to remove subscripts from variables and, if feasible, derive sentences in which all variables are unsubscripted, that is, testable. These auxiliary rules will rely on the composition axiom [1, p. 229]

Y_x(u) = y  ⟹  Z_xy(u) = Z_x(u)   (9)

which was shown to be sound and complete relative to recursive models [6, 7].3 In the special case of x = ∅ (no intervention on X), the axiom is known as the consistency rule:

Y(u) = y  ⟹  Z_y(u) = Z(u)   (10)

and is discussed by Robins [9] and Pearl [1].
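The composition axiom of eq. (9) can be illustrated by exhaustive checking in a small structural model; the function tables below are arbitrary choices for illustration, not taken from the paper:

```python
import itertools

# Brute-force check of composition, eq. (9), in a small structural model
# Y = f(X, U1), Z = g(X, Y, U2) (Z is allowed to depend on X directly here,
# to make the check non-trivial).  The tables are arbitrary assumptions.
vals = [0, 1, 2]
f = {(x, u1): (x * u1 + 1) % 3 for x in vals for u1 in vals}
g = {(x, y, u2): (x + 2 * y + u2) % 3 for x in vals for y in vals for u2 in vals}

for x, y, u1, u2 in itertools.product(vals, repeat=4):
    Y_x  = f[(x, u1)]            # Y_x(u)
    Z_x  = g[(x, Y_x, u2)]       # Z_x(u): intervene on X only
    Z_xy = g[(x, y, u2)]         # Z_xy(u): intervene on both X and Y
    if Y_x == y:                 # composition: Y_x(u) = y  ==>  Z_xy(u) = Z_x(u)
        assert Z_xy == Z_x
print("composition holds for every unit u = (u1, u2)")
```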
2 Deriving testables from non-testables
We first note that substituting eq. (6) into eq. (8) yields

Z_y ⊥ {X, Y_x'}   (11)

which is a universally quantified formula, stating that for all z, y, y′, x, x′ in the respective domains of Z, Y, and X, the following independence condition holds:

P(Z_y = z, X = x, Y_x' = y′) = P(Z_y = z) P(X = x, Y_x' = y′)   (12)

We next note that, for the special case of x′ = x, eq. (12) yields

P(Z_y = z, X = x, Y_x = y′) = P(Z_y = z) P(X = x, Y_x = y′)

or, using eq. (10),

P(Z_y = z, X = x, Y = y′) = P(Z_y = z) P(X = x, Y = y′)   (13)

This can be written succinctly as

Z_y ⊥ {X, Y}.   (14)

Our next task is to remove the subscript from Z_y. This is done in two steps. First we apply the graphoid rule of “weak union” (i.e., X ⊥ {Y, W} | Z ⟹ X ⊥ Y | {Z, W}, [1, p. 11]) to obtain

Z_y ⊥ X | Y.   (15)

Second, we explicate the components of eq. (15) and write

P(Z_y = z | X = x, Y = y′) = P(Z_y = z | Y = y′)   (16)

for all y, z, x, and y′. Again, for the special case of y′ = y, eq. (10) permits us to remove the subscript from Z_y and write

P(Z = z | X = x, Y = y) = P(Z = z | Y = y)   (17)

Finally, since the last independency holds for all x, y, and z, we can write it in succinct notation as

Z ⊥ X | Y

which is subscript free and coincides with the testable implication of eq. (3).
To summarize, we have shown that the subscripts in eq. (11) can be removed in two steps. First,

Z_y ⊥ {X, Y_x}  ⟹  Z_y ⊥ {X, Y}   (18)

and second,

Z_y ⊥ X | Y  ⟹  Z ⊥ X | Y.   (19)

Moreover, we see that eq. (3) follows from eq. (8) alone and does not require the exogeneity assumption expressed in eq. (7).
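The claim that the testable implication follows from eq. (8) alone can also be checked by brute force. The hypothetical model below makes the Z-potentials jointly independent of X and the Y-potentials (eq. (8), after applying eq. (6)), while deliberately making X dependent on the Y-potentials, so that the exogeneity assumption of eq. (7) fails; all numeric tables are invented for illustration:

```python
import itertools

# Binary counterfactual model: the Z-potentials (Z_0, Z_1) are jointly
# independent of (X, Y_0, Y_1) -- this is eq. (8) -- but X is deliberately
# dependent on (Y_0, Y_1), violating the exogeneity assumption of eq. (7).
vals = [0, 1]

# Arbitrary (dependent) joint over (X, Y_0, Y_1); probabilities are invented.
p_xy = {(0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.05, (0, 1, 1): 0.10,
        (1, 0, 0): 0.02, (1, 0, 1): 0.18, (1, 1, 0): 0.20, (1, 1, 1): 0.10}
# Arbitrary joint over (Z_0, Z_1), independent of the block above.
p_z = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Observables via consistency: Y = Y_X and Z = Z_Y.
joint = {}
for (x, y0, y1), pa in p_xy.items():
    for (z0, z1), pb in p_z.items():
        y = (y0, y1)[x]
        z = (z0, z1)[y]
        joint[(x, y, z)] = joint.get((x, y, z), 0.0) + pa * pb

def p(pred):
    return sum(v for k, v in joint.items() if pred(*k))

# Check the derived testable implication, eq. (3): Z independent of X given Y.
for x, y, z in itertools.product(vals, repeat=3):
    pxy = p(lambda a, b, c: a == x and b == y)
    if pxy > 0:
        lhs = p(lambda a, b, c: (a, b, c) == (x, y, z)) / pxy
        rhs = p(lambda a, b, c: b == y and c == z) / p(lambda a, b, c: b == y)
        assert abs(lhs - rhs) < 1e-12
print("eq. (3) holds even though exogeneity, eq. (7), fails")
```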
3 Augmented graphoid axioms
In this section, we will identify three general rules that, when added to the graphoid axioms, will enable us to derive testable implications without referring back to the consistency axiom of eq. (10). The three rules are as follows:
Rule 1

W ⊥ {X, Y_x} | S  ⟹  W ⊥ {X, Y} | S   (20)

Rule 2

W ⊥ T | {X, Y_x, S}  ⟹  W ⊥ T | {X, Y, S}   (21)

Rule 3

W ⊥ Y_x | {X, S}  ⟹  W ⊥ Y | {X, S}   (22)

Rules 1 and 2 state that a subscript x can be removed from Y_x whenever Y_x stands in conjunction with X, be it before or after the conditioning bar. In our example (eq. (18)) we had W = Z_y and S = ∅. Rule 3 states that a subscript x can be removed from Y_x whenever X appears in the conditioning set. The symbols W, T, and S in eqs (20)–(22) stand for any sets of variables, observable as well as counterfactual.
For mnemonic purposes we can summarize these rules using the following shorthand:
Rule 1–2

{X, Y_x} = {X, Y}   (23)
Rule 3

Y_x | X = Y | X   (24)
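To make the mechanics of Rules 1–3 concrete, here is a toy subscript-stripping procedure. The encoding (a counterfactual as a name plus a set of subscript variables, a CI statement as a triple of sets) is my own illustrative choice, not notation from the paper, and the weak-union step is applied by hand, since it is a graphoid axiom rather than one of Rules 1–3:

```python
# Toy rewrite engine for Rules 1-3 (an illustrative sketch, not the paper's own
# machinery).  A counterfactual variable is a pair (name, frozenset of subscript
# variable names), e.g. ("Y", frozenset({"X"})) stands for Y_x.  A conditional
# independence statement (A independent of B given C) is a triple of sets of pairs.

def var(name, *subs):
    return (name, frozenset(subs))

def apply_rules(stmt):
    """Strip subscripts with Rules 1-3 until no rule applies."""
    a, b, c = (set(s) for s in stmt)
    changed = True
    while changed:
        changed = False
        # Rules 1 and 2: the subscript x comes off Y_x when the bare variable X
        # stands in conjunction with it, i.e. appears in the same component.
        for side in (a, b, c):
            bare = {n for (n, s) in side if not s}
            for (n, s) in list(side):
                if s & bare:
                    side.remove((n, s))
                    side.add((n, s - bare))
                    changed = True
        # Rule 3: the subscript x comes off Y_x (before the bar) when the bare
        # variable X appears in the conditioning set.
        bare_c = {n for (n, s) in c if not s}
        for side in (a, b):
            for (n, s) in list(side):
                if s & bare_c:
                    side.remove((n, s))
                    side.add((n, s - bare_c))
                    changed = True
    return tuple(frozenset(s) for s in (a, b, c))

# Eq. (11): Z_y independent of {X, Y_x}.  Rules 1-2 strip Y_x next to X,
# giving eq. (14): Z_y independent of {X, Y}.
s11 = ({var("Z", "Y")}, {var("X"), var("Y", "X")}, set())
s14 = apply_rules(s11)
assert s14[1] == frozenset({var("X"), var("Y")})

# Weak union (a graphoid axiom, applied manually) gives eq. (15):
# Z_y independent of X given Y; Rule 3 then yields eq. (3).
s15 = (s14[0], {var("X")}, {var("Y")})
s3 = apply_rules(s15)
assert s3 == (frozenset({var("Z")}), frozenset({var("X")}),
              frozenset({var("Y")}))
```

Note that the engine deliberately requires X and Y_x to share a component: X ⊥ Y_x alone does not license replacing Y_x by Y, which is exactly the difference between exogeneity and ordinary independence.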
4 Deriving ignorability relations
Unveiling testable implications is only one application of the augmented graphoid axioms of Section 3. No less important is the ability of these axioms to justify ignorability relations which a researcher may need for deriving causal effect estimands [1, 8, 11, 12].4
Consider the sentence Y_z ⊥ {Z, X}, which may be implied by a certain process, and assume we wish to estimate the causal effect of Z on Y, P(Y_z = y), from non-experimental data. For this estimation to be unbiased, the conditional ignorability Y_z ⊥ Z | W needs to be assumed, where W is some set of observed covariates. Using Axiom (22) we can show that W = X satisfies the ignorability assumption and, therefore, adjustment for X will yield a bias-free estimate of the causal effect P(Y_z = y). This can be shown as follows:

Y_z ⊥ {Z, X}  ⟹  Y_z ⊥ Z | X    (using the graphoid rule of “weak union”)

and by Rule (22) we obtain

P(Y_z = y | Z = z, X = x) = P(Y = y | Z = z, X = x).

We therefore can write

P(Y_z = y) = Σ_x P(Y_z = y | X = x) P(X = x)
           = Σ_x P(Y_z = y | Z = z, X = x) P(X = x)
           = Σ_x P(Y = y | Z = z, X = x) P(X = x)   (25)

Equation (25) is none other than the standard adjustment formula for the causal effect of Z on Y, controlling for X.
The process can also be reversed; we start with a needed, yet unsubstantiated ignorability condition, and we ask whether it can be derived from more fundamental conditions which are either explicit in the model or defensible on scientific grounds. Consider, for example, an unconfounded mediation model in which treatment X is randomized, and assume we seek to estimate the effect of the mediator Z on the outcome Y, P(Y_z = y). (The model is depicted in Figure 1.) Operationally, we know that the ignorability condition Y_z ⊥ Z | X would allow us to obtain the desired effect by adjusting for X, as shown in the derivation of eq. (25). However, lacking graphs for guidance, it is not clear whether this condition follows from the assumptions embedded in the model; a formal proof is therefore needed. The assumptions explicit in the model take the form

(a) X ⊥ Z_x'   (randomized treatment)
(b) X ⊥ Y_x'z   (randomized treatment)
(c) Z_x' ⊥ Y_x''z | X   (independence of the omitted factors affecting Z and Y)
To show that the desired ignorability condition follows from (a), (b) and (c), we can use Rule 3 (eq. (22)) as follows. First, the standard graphoid axioms dictate

Y_xz ⊥ {X, Z_x}    (from (b) and (c), by contraction)

and hence, by weak union,

Y_xz ⊥ Z_x | X.

Next, applying Rule 3 twice, first to Z_x and then to Y_xz, gives

Y_xz ⊥ Z | X  and then  Y_z ⊥ Z | X

which yields the desired ignorability condition.
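The mediation example lends itself to the same kind of numerical check. In the sketch below, X is randomized, Z = f(X, U1) is the mediator, and Y = g(X, Z, U2) the outcome, with X, U1, U2 mutually independent; all tables and probabilities are invented for illustration. Adjusting for X then recovers the interventional distribution P(Y_z = y), as eq. (25) prescribes:

```python
import itertools

# Hypothetical mediation model: randomized X, mediator Z = f(X, U1),
# outcome Y = g(X, Z, U2), with X, U1, U2 mutually independent.
# All tables and probabilities below are illustrative assumptions.
vals = [0, 1]
f = {(x, u1): (x + u1) % 2 for x in vals for u1 in vals}
g = {(x, z, u2): (x + z * u2) % 2 for x in vals for z in vals for u2 in vals}
px, pu1, pu2 = [0.5, 0.5], [0.3, 0.7], [0.6, 0.4]

joint = {}        # observed P(x, z, y)
truth = {}        # interventional P(Y_z = y), obtained by setting Z = z directly
for x, u1, u2 in itertools.product(vals, repeat=3):
    w = px[x] * pu1[u1] * pu2[u2]
    z = f[(x, u1)]
    key = (x, z, g[(x, z, u2)])
    joint[key] = joint.get(key, 0.0) + w
    for zset in vals:                      # intervention do(Z = zset)
        k = (zset, g[(x, zset, u2)])
        truth[k] = truth.get(k, 0.0) + w

def p(pred):
    return sum(v for k, v in joint.items() if pred(*k))

# Adjustment for X, eq. (25): P(Y_z = y) = sum_x P(Y = y | Z = z, X = x) P(X = x)
for z, y in itertools.product(vals, repeat=2):
    adj = 0.0
    for x in vals:
        pzx = p(lambda a, b, c: a == x and b == z)
        if pzx > 0:
            adj += p(lambda a, b, c: (a, b, c) == (x, z, y)) / pzx * px[x]
    assert abs(adj - truth[(z, y)]) < 1e-12
print("adjustment for X recovers P(Y_z = y)")
```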
These derivations can be skipped, of course, when we have a graphical model for guidance. The adjustment formula (25) could then be written by inspection, since X satisfies the back-door condition relative to the pair (Z, Y). However, researchers who mistrust graphs and insist on doing the entire analysis by algebraic methods would need to use Rules 1–3 to justify the ignorability condition from assumptions (a), (b) and (c).
Rules 1–3, when added to the graphoid axioms, allow us to process conditional-independence sentences involving counterfactuals and derive both their testable implications and implications that are deemed necessary for identifying causal effects. We conjecture that Rules 1–3 are complete in the sense that all implications derivable from the graphoid axioms together with the consistency rule (10) are also derivable using the graphoid axioms together with Rules 1–3.
Augmented graphoids are by no means a substitute for causal diagrams, since the complexity of finding a derivation using graphoid axioms may be exponentially hard. Diagrams, on the other hand, offer simple graphical criteria (e.g., d-separation or back-door) for deriving testable implications and effect estimands. In reasonably sized problems, these criteria can be verified by inspection, while, in large problems, they can be computed in polynomial time [14, 15]. The secret of diagrams is that they embed all the graphoid axioms in their structure and, in effect, pre-compute all their ramifications and display them in graphical patterns.
Sander Greenland and Jin Tian provided helpful comments on an early version of this note. This research was supported in part by grants from NIH #1R01 LM009961-01, NSF #IIS-0914211 and #IIS-1018922, and ONR #N000-14-09-1-0665 and #N00014-10-1-0933.
Pearl J. Causality: models, reasoning, and inference, 2nd ed. New York: Cambridge University Press, 2009.
Pearl J. Probabilistic reasoning in intelligent systems. San Mateo, CA: Morgan Kaufmann, 1988.
Dawid A. Conditional independence in statistical theory. J R Stat Soc Ser B 1979;41:1–31.
Spohn W. Stochastic independence, causal independence, and shieldability. J Philos Logic 1980;9:73–99.
Pearl J, Paz A. GRAPHOIDS: a graph-based logic for reasoning about relevance relations. In: Boulay BD, Hogg D, Steels L, editors. Advances in artificial intelligence-II. North-Holland, 1987:357–63.
Galles D, Pearl J. An axiomatic characterization of causal counterfactuals. Found Sci 1998;3:151–82.
Halpern J. Axiomatizing causal reasoning. In: Cooper G, Moral S, editors. Uncertainty in artificial intelligence. San Francisco, CA: Morgan Kaufmann, 1998:202–10. Also, J Artif Intell Res 2000;12:17–37.
Holland P. Statistics and causal inference. J Am Stat Assoc 1986;81:945–60.
Robins J. A new approach to causal inference in mortality studies with a sustained exposure period – applications to control of the healthy workers survivor effect. Math Model 1986;7:1393–512.
Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.
Rubin D. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974;66:688–701.
Geiger D. Graphoids: a qualitative framework for probabilistic inference. PhD thesis, Department of Computer Science, University of California – Los Angeles, Los Angeles, CA, 1990.
Shpitser I, Pearl J. Complete identification methods for the causal hierarchy. J Mach Learn Res 2008;9:1941–79.
Tian J, Paz A, Pearl J. Finding minimal separating sets. Technical report R-254, University of California – Los Angeles, Los Angeles, CA, 1998.
Rules for translating graphical models to counterfactual notation are given in Pearl [1, pp. 232–4], based on the structural semantics of counterfactuals. The rules represent the omitted factors affecting any variable, say Y, by the set of counterfactuals {Y_pa}, where pa ranges over the values of PA_Y, the parents of Y in the diagram.
The graphoid axioms are axioms of conditional independence, first formulated by Dawid [3] and Spohn [4]. Their connections to graph connectivity and to other notions of “information relevance” were established by Pearl and Paz [5] and are described in detail in Pearl [1, pp. 78–133, 2, p. 11].
Reliance on the assumptions of conditional ignorability [8, 11, 12], which are cognitively formidable, is one of the major weaknesses of the potential outcome framework [1, pp. 350–1]. Axioms (20)–(22) permit us to derive needed ignorability conditions from other counterfactual statements which are perhaps more transparent.
About the article
Published Online: 2014-09-12
Published in Print: 2014-09-01