A Combinatorial Solution to Causal Compatibility



Introduction
A theory of causation specifies the effects of actions with absolute necessity. On the other hand, a probabilistic theory encodes degrees of belief and makes predictions based on limited information. A common fallacy is to interpret correlation as causation; opening an umbrella has never caused it to rain, although the two are strongly correlated. Numerous paradoxical and catastrophic consequences are unavoidable when probabilistic theories and theories of causation are confused. Nonetheless, Reichenbach's principle asserts that correlations must admit causal explanation; after all, the fear of getting wet causes one to open an umbrella.
In recent decades, a concerted effort has been put into developing a formal theory for probabilistic causation [43,54]. Integral to this formalism is the concept of a causal structure. A causal structure is a directed acyclic graph, or DAG, which encodes hypotheses about the causal relationships among a set of random variables. A causal model is a causal structure equipped with an explicit description of the parameters which govern the causal relationships. Given a multivariate probability distribution for a set of variables and a proposed causal structure, the causal compatibility problem aims to determine the existence or non-existence of a causal model for the given causal structure which can explain the correlations exhibited by the variables. More generally, the objective of causal discovery is to enumerate all causal structures compatible with an observed distribution. Perhaps unsurprisingly, causal inference has applications in a variety of academic disciplines including economics, risk analysis, epidemiology, bioinformatics, and machine learning [29,42,43,48,62].
For physicists, a consideration of causal influence is commonplace; the theory of special/general relativity strictly prohibits causal influences between space-like separated regions of space-time [57]. Famously, in response to Einstein, Podolsky, and Rosen's [19] critique of the completeness of quantum theory, Bell [7] derived an observational constraint, known as Bell's inequality, which must be satisfied by all hidden variable models which respect the causal hypothesis of relativity. Moreover, Bell demonstrated the existence of quantum-realizable correlations which violate Bell's inequality [7]. Recently, it has been appreciated that Bell's theorem can be understood as an instance of causal inference [61]. Contemporary quantum foundations maintains two closely related causal inference research programs. The first is to develop a theory of quantum causal models in order to facilitate a causal description of quantum theory and to better understand the limitations of quantum resources [3,6,13,17,25,30,36,38,44,47,60]. The second is the continued study of classical causal inference with the purpose of distinguishing genuinely quantum behaviors from those which admit classical explanations [1,2,11,23,24,25,50,58,60]. In particular, the results of [30] suggest that causal structures which support quantum non-classicality are uncommon and typically large in size; therefore, systematically finding new examples of such causal structures will require the development of new algorithmic strategies. As a consequence, quantum foundations research has relied upon, and contributed to, the techniques and tools used within the field of causal inference [13,30,50,60]. The results of this paper are concerned exclusively with the latter research program of classical causal inference, but do not rule out the possibility of a generalization to quantum causal inference.
When all variables in a probabilistic system are observed, checking the compatibility status between a joint distribution and a causal structure is relatively easy; compatibility holds if and only if all conditional independence constraints implied by graphical d-separation relations hold [39,43]. Unfortunately, in more realistic situations there are ethical, economic, or fundamental barriers preventing access to certain statistically relevant variables, and it becomes necessary to hypothesize the existence of latent/hidden variables in order to adequately explain the correlations expressed by the visible/observed variables [22,43,60]. In the presence of latent variables, and in the absence of interventional data, the causal compatibility problem, and by extension the subject of causal inference as a whole, becomes considerably more difficult.
In order to overcome these difficulties, numerous simplifications have been invoked by various authors in order to make partial progress. A particularly popular simplification strategy has been to consider alternative classes of graphical causal models which can act as surrogates for DAG causal models; e.g. MC-graphs [34], summary graphs [59], or maximal ancestral graphs (MAGs) [46,63]. While these approaches are certainly attractive from a practical perspective (efficient algorithms such as FCI [54] or RFCI [16] exist for assessing causal compatibility with MAGs, for instance), they nevertheless fail to fully capture all constraints implied by DAG causal models with latent variables [21].¹ The forthcoming formalism is concerned with assessing the causal compatibility of DAG causal structures directly, therefore avoiding these shortcomings.
Nevertheless, when considering DAG causal structures directly (henceforth just causal structures), making assumptions about the nature of the latent variables and the parameters which govern them can simplify the problem [28,53,56]. For instance, when the latent variables are assumed to have a known and finite cardinality,² it becomes possible to articulate the causal compatibility problem as a finite system of polynomial equalities and inequalities with a finite list of unknowns, for which non-linear quantifier elimination methods, such as Cylindrical Algebraic Decomposition [31], can provide a complete solution. Unfortunately, these techniques are only computationally tractable in the simplest of situations. Other techniques from algebraic geometry have been used in simple scenarios to approach the causal compatibility problem as well [27,28,35]. When no assumptions about the nature of the latent variables are made, there is a plethora of methods for deriving novel equality [22,45] and inequality [2,4,8,11,20,23,26,30,55,58,60] constraints that must be satisfied by any compatible distribution. The majority of these methods are unsatisfactory on the basis that the derived constraints are necessary, but not sufficient. A notable exception is the Inflation Technique [60], which produces a hierarchy of linear programs (solvable using efficient algorithms [9,18,32,33,51]) which are necessary and sufficient [37] for determining compatibility.
In contrast with the aforementioned algebraic techniques, the purpose of this paper is to present the possible worlds framework, which offers a combinatorial solution to the causal compatibility problem in the presence of latent variables. Importantly, this framework can only be applied when the cardinalities of the visible variables are known to be finite.³ This framework is inspired by the twin networks of Pearl [43], parallel worlds of Shpitser [52], and by some original drafts of the Inflation Technique paper [60]. The possible worlds framework accomplishes three things. First, we demonstrate its conceptual advantages by revealing that a number of disparate instances of causal incompatibility become unified under the same premise. Second, we provide a closed-form algorithm for completely solving the possibilistic causal compatibility problem. To demonstrate the utility of this method, we provide a solution to an unsolved problem originally reported in [21]. Third, we show that the possible worlds framework provides a hierarchy of tests, much like the Inflation Technique, which solves completely the probabilistic causal compatibility problem.
Unfortunately, the computational complexity of the proposed probabilistic solution is prohibitively large in many practical situations. Therefore, the contributions of this work are primarily conceptual. Nevertheless, it is possible that these complexity issues are intrinsic to the problem being considered. Notably, the hierarchy of tests presented here has an asymptotic rate of convergence commensurate to the only other complete solution to the probabilistic compatibility problem, namely the hierarchy of tests provided in [37]. Moreover, unlike the Inflation Technique, if a distribution is compatible with a causal structure, then the hierarchy of tests provided here has the advantage of returning a causal model which generates that distribution.
This paper is organized as follows: Section 2 begins with a review of the mathematical formalism behind causal modeling, including a formal definition of the causal compatibility problem, and also introduces the notations to be used throughout the paper. Afterwards, Section 3 introduces the possible worlds framework and defines its central object of study: a possible worlds diagram. Section 4 applies the possible worlds framework to prove possibilistic incompatibility between several distributions and corresponding causal structures, culminating in an algorithm for exactly solving the possibilistic causal compatibility problem. Finally, Section 5 establishes a hierarchy of tests which completely solve the probabilistic causal compatibility problem. Moreover, Section 5.1 articulates how to utilize internal symmetries in order to alleviate the aforementioned computational complexity issues. Section 6 concludes.
Appendix A summarizes relevant results from [21] needed in Section 2. Appendix B generalizes the results of [50], placing new upper bounds on the maximum cardinality of the latent variables, required for Sections 2 and 5.

A Review of Causal Modeling
This review section is segmented into three portions. First, Section 2.1 defines directed graphs and their properties. Second, Section 2.2 introduces the notation and terminology regarding probability distributions to be used throughout the remainder of this article. Finally, Section 2.3 defines the notion of a causal model and formally introduces the causal compatibility problem.

Directed Graphs
A directed graph $G = (Q, E)$ consists of a set of vertices $Q$ connected by directed edges $E$. For a given vertex $q$, $\mathrm{pa}_G(q)$ denotes its parents and $\mathrm{ch}_G(q)$ its children. If there is a directed path from $q$ to $u$ then $q$ is an ancestor of $u$ and $u$ is a descendant of $q$; the set of all ancestors of $q$ is denoted $\mathrm{an}_G(q)$ and the set of all descendants is denoted $\mathrm{des}_G(q)$. The definitions of parents, children, ancestors and descendants of a single vertex $q$ are applied disjunctively to sets of vertices $Q' \subseteq Q$; for example, $\mathrm{pa}_G(Q') = \bigcup_{q \in Q'} \mathrm{pa}_G(q)$, and analogously for $\mathrm{ch}_G$, $\mathrm{an}_G$ and $\mathrm{des}_G$. A directed graph is acyclic if there is no directed path of length $k > 1$ from $q$ back to $q$ for any $q \in Q$, and cyclic otherwise. For example, Figure 1 depicts the difference between cyclic and acyclic directed graphs.
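The subgraph of $G$ induced by a subset of vertices $W \subseteq Q$, denoted $\mathrm{sub}_G(W)$, is the graph obtained by taking all edges from $E$ which connect members of $W$.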
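The graph-theoretic notions above can be made concrete with a minimal Python sketch. It assumes a graph is stored as a set of `(tail, head)` edge pairs; the function names and the example vertices are illustrative and not part of the paper. Two conventions are assumptions of the sketch: a vertex is not counted among its own ancestors or descendants, and any directed cycle (including a self-loop) makes the graph cyclic.

```python
def parents(edges, q):
    """pa_G(q): tails of all edges pointing into q."""
    return {u for (u, w) in edges if w == q}

def children(edges, q):
    """ch_G(q): heads of all edges leaving q."""
    return {w for (u, w) in edges if u == q}

def _reach(edges, q, step):
    """All vertices reachable from q by repeatedly applying `step`."""
    found, frontier = set(), {q}
    while frontier:
        frontier = {r for f in frontier for r in step(edges, f)} - found
        found |= frontier
    return found

def ancestors(edges, q):
    """an_G(q), excluding q itself (assumed convention)."""
    return _reach(edges, q, parents)

def descendants(edges, q):
    """des_G(q), excluding q itself (assumed convention)."""
    return _reach(edges, q, children)

def is_acyclic(edges):
    """True when no vertex can reach itself along directed edges."""
    vertices = {v for e in edges for v in e}
    return all(q not in descendants(edges, q) for q in vertices)

def subgraph(edges, W):
    """Edges of the subgraph induced by the vertex subset W."""
    return {(u, w) for (u, w) in edges if u in W and w in W}

# Hypothetical example: mu -> a -> b <- nu
E = {("mu", "a"), ("a", "b"), ("nu", "b")}
```

For the example edge set, `ancestors(E, "b")` collects `a`, `mu` and `nu`, and adding the edge `("b", "mu")` creates a directed cycle that `is_acyclic` detects.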

Probability Theory
Definition 3 (Probability Theory). A probability space is a triple $(\Omega, \Xi, P)$ where the state space $\Omega$ is the set of all possible outcomes, $\Xi \subseteq 2^{\Omega}$ is the set of events forming a σ-algebra over $\Omega$, and $P$ is a σ-additive function from events to probabilities such that $P(\Omega) = 1$.
Definition 4 (Probability Notation). For a collection of random variables $X_I = \{X_1, X_2, \ldots, X_k\}$ indexed by $i \in I = \{1, 2, \ldots, k\}$ where each $X_i$ takes values from $\Omega_i$, a joint distribution $P_I = P_{12\ldots k}$ assigns probabilities to outcomes from $\Omega_I = \prod_{i \in I} \Omega_i$. The event that each $X_i$ takes value $x_i$, referred to as a valuation of $X_I$,⁴ is denoted as, A point distribution $P_I(y_I) = 1$ for a particular event $y_I \in \Omega_I$ is expressed using square brackets, The set of all probability distributions over $\Omega_I$ is denoted as

Causal Models and Causal Compatibility
A causal model represents a complete description of the causal mechanisms underlying a probabilistic process. Formally, a causal model is a pair of objects $(G, P)$, which will be defined in turn. First, $G$ is a directed acyclic graph $(Q, E)$, whose vertices $q \in Q$ represent random variables $X_q$. The purpose of a causal structure is to graphically encode the causal relationships between the variables. Explicitly, if $q \to u \in E$ is an edge of the causal structure, $X_q$ is said to have causal influence on $X_u$.⁵ Consequently, the causal structure predicts that given complete knowledge of a valuation of the parental variables $X_{\mathrm{pa}_G(u)} = \{X_q \mid q \in \mathrm{pa}_G(u)\}$, the random variable $X_u$ should become independent of its non-descendants⁶ [43]. With this observation as motivation, the causal parameters $P$ of a causal model are a family of conditional probability distributions $P_{q \mid \mathrm{pa}_G(q)}$ for each $q \in Q$. In the case that $q$ has no parents in $G$, the distribution is simply unconditioned. The purpose of the causal parameters is to predict a joint distribution $P_Q$ on the configurations $\Omega_Q$ of a causal structure,
$$P_Q(x_Q) = \prod_{q \in Q} P_{q \mid \mathrm{pa}_G(q)}\big(x_q \mid x_{\mathrm{pa}_G(q)}\big). \tag{6}$$
If the hypotheses encoded within a causal structure $G$ are correct, then the observed distribution over $\Omega_Q$ should factorize according to Equation 6. Unfortunately, as discussed in Section 1, there are often ethical, economic, or fundamental obstacles preventing access to all variables of a system. In such cases, it is customary to partition the vertices of a causal structure into two disjoint sets: the visible (observed) vertices $V$, and the latent (unobserved) vertices $L$ (for example, see Figure 2). Additionally, we denote the visible parents of any vertex $q \in V \cup L$ as $\mathrm{vpa}_G(q) = V \cap \mathrm{pa}_G(q)$, and analogously the latent parents as $\mathrm{lpa}_G(q) = L \cap \mathrm{pa}_G(q)$. In the presence of latent variables, Equation 6 still makes a prediction about the joint distribution $P_{V \cup L}(x_V, \lambda_L)$⁷ over the visible and latent variables, albeit an experimenter attempting to verify or discredit a causal hypothesis only has access to the marginal distribution $P_V$. If $\Omega_L$ is discrete,
$$P_V(x_V) = \sum_{\lambda_L \in \Omega_L} P_{V \cup L}(x_V, \lambda_L).$$
A natural question arises: in the absence of information about the latent variables $L$, how can one determine whether or not their causal hypotheses are correct? The principal purpose of this paper is to provide the reader with methods for answering this question.
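As an illustration of the factorization and the marginalization over latent variables described above, the following is a minimal Python sketch. The three-variable structure (`lam -> a`, `lam -> b`, `a -> b`), the conditional probability tables and all function names are hypothetical choices made for the example, not taken from the paper.

```python
from itertools import product

def joint_distribution(order, pa, cpt, card):
    """Joint distribution as a product of conditionals, one per vertex.

    order: vertices in topological order
    pa:    pa[q] lists the parents of q
    cpt:   cpt[q][parent_values][x_q] is the conditional probability
    card:  card[q] is the number of outcomes of q
    """
    P = {}
    for x in product(*(range(card[q]) for q in order)):
        value = dict(zip(order, x))
        p = 1.0
        for q in order:
            parent_values = tuple(value[u] for u in pa[q])
            p *= cpt[q][parent_values][value[q]]
        P[x] = p
    return P

def marginal(P, order, keep):
    """Sum out every variable not listed in `keep`."""
    index = [order.index(q) for q in keep]
    P_keep = {}
    for x, p in P.items():
        key = tuple(x[i] for i in index)
        P_keep[key] = P_keep.get(key, 0.0) + p
    return P_keep

# Hypothetical structure: latent lam with visible children a and b, plus a -> b.
order = ["lam", "a", "b"]
pa = {"lam": [], "a": ["lam"], "b": ["a", "lam"]}
card = {"lam": 2, "a": 2, "b": 2}
cpt = {
    "lam": {(): [0.5, 0.5]},
    "a": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]},
    "b": {(a, l): ([0.7, 0.3] if a == l else [0.4, 0.6])
          for a in (0, 1) for l in (0, 1)},
}

P_Q = joint_distribution(order, pa, cpt, card)   # distribution over (lam, a, b)
P_V = marginal(P_Q, order, ["a", "b"])           # what an experimenter can observe
```

Only `P_V` is accessible when `lam` is latent; the compatibility question asks whether some choice of tables like `cpt` could have produced a given `P_V`.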
In general, other than being a directed acyclic graph, there are no restrictions placed on a causal structure with latent variables. Nonetheless, [21] demonstrates that every causal structure $G$ can be converted into a standard form that is observationally equivalent to $G$, where the latent variables are exogenous (have no parents) and whose children sets are isomorphic to the facets of a simplicial complex over $V$.⁸ Appendix A summarizes the relevant results from [21] necessary for making this claim. Additionally, Appendix B demonstrates that any finite distribution $P_V$ which satisfies the causal hypotheses (i.e. Equation 7) can be generated using deterministic causal parameters for the visible variables and, moreover, that the cardinalities of the latent variables can be assumed finite.⁹ Altogether, Appendices A and B suggest that, without loss of generality, we can simplify the causal compatibility problem as follows.

Definition 5 (Functional Causal Model). A functional causal model for a causal structure $G = (V \cup L, E)$ is a triple $(G, F_V, P_L)$ where
$$F_V = \{ f_v : \Omega_{\mathrm{pa}_G(v)} \to \Omega_v \mid v \in V \} \tag{9}$$
are deterministic functions for the visible variables $V$ in $G$, and
$$P_L = \{ P_\ell : \Omega_\ell \to [0, 1] \mid \ell \in L \} \tag{10}$$
are finite probability distributions for the latent variables $L$ in $G$. A functional causal model defines a probability distribution $P_V : \Omega_V \to [0, 1]$,
$$P_V(x_V) = \sum_{\lambda_L \in \Omega_L} \prod_{\ell \in L} P_\ell(\lambda_\ell) \prod_{v \in V} \delta\big(x_v, f_v(x_{\mathrm{vpa}_G(v)}, \lambda_{\mathrm{lpa}_G(v)})\big). \tag{11}$$

Definition 6 (The Causal Compatibility Problem). Given a causal structure $G = (V \cup L, E)$ and a distribution $P_V$ over the visible variables $V$, the causal compatibility problem is to determine if there exists a functional causal model $(G, F_V, P_L)$ (defined in Definition 5) such that Equation 11 reproduces $P_V$. If such a functional causal model exists, then $P_V$ is said to be compatible with $G$; otherwise $P_V$ is incompatible with $G$. The set of all compatible distributions on $V$ for a causal structure $G$ is denoted $M_V(G)$.

The Possible Worlds Framework
Consider the causal structure in Figure 3a, denoted $G_{3a}$. For the sake of concreteness, suppose one is promised the latent variables are sampled from a binary sample space, i.e. $k_\mu = k_\nu = 2$. Let $z_\mu = P_\mu(0_\mu)$ and $z_\nu = P_\nu(0_\nu)$. The causal hypothesis $G_{3a}$ predicts (via Equation 11) that observable events $(x_a, x_b, x_c) \in \Omega_a \times \Omega_b \times \Omega_c$ will be distributed according to, where For each distinct realization $(\lambda_\mu, \lambda_\nu) \in \Omega_\mu \times \Omega_\nu$ of the latent variables, one can consider a possible world wherein the values $\lambda_\mu, \lambda_\nu$ are not sampled according to the respective distributions $P_\mu, P_\nu$, but instead take on definite values. From the perspective of counterfactual reasoning, each world is modelling a distinct counterfactual assignment of the latent variables, but not the visible variables.¹⁰ In this particular example, there are $k_\mu \times k_\nu = 2 \times 2 = 4$ distinct possible worlds. Figure 3b represents, and uniquely colors, these possible worlds. Note that the definite valuations of the latent variables in Figure 3b are depicted using squares.¹¹ Critically, regardless of the deterministic functional relationships $f_a, f_b, f_c$, there are identifiable consistency constraints that must hold between these worlds. For example, $a$ is determined by a function $f_a : \Omega_\mu \to \Omega_a$ and thus the observed value for $a$ in the yellow $(0_\mu 0_\nu)$-world must be exactly the same as the observed value for $a$ in the green $(0_\mu 1_\nu)$-world. This cross-world consistency constraint is illustrated in Figure 3c by embedding each possible world into a larger diagram with overlapping $\lambda_\mu \to a$ subgraphs. It is important to remark that not all cross-world consistency constraints are captured by this diagram; the value of $b$ in the yellow $(0_\mu 0_\nu)$-world must match the value of $b$ in the orange $(1_\mu 0_\nu)$-world if the value of $a$ in both possible worlds is the same. For comparison, in the original causal structure $G_{3a}$, the vertices represented random variables sampled from distributions associated with causal parameters; whereas in the possible worlds diagram of Figure 3c, every valuation, including the latent valuations, is predetermined by the functional dependences $f_a, f_b, f_c$. For example, Figure 3d populates Figure 3c with the observable events generated by the following functional dependences, The utility of Figure 3d is in its simultaneous account of Equation 14, the causal structure $G_{3a}$ and the cross-world consistency constraints that $G_{3a}$ induces. Nonetheless, Figure 3d fails to specify the probabilities $z_\mu, z_\nu$ associated with the latent events. In Section 4, we utilize diagrams analogous to Figure 3d to tackle the causal compatibility problem. Before doing so, this paper needs to formally define the possible worlds framework.
Definition 7 (The Possible Worlds Framework). Let $G = (V \cup L, E)$ be a causal structure with visible variables $V$ and latent variables $L$. Let $F_V$ be a set of functional parameters for $V$ defined exactly as in Equation 9. The possible worlds diagram for the pair $(G, F_V)$ is a directed acyclic graph $D$ satisfying the following properties:
1. (Valuation Vertices) Each vertex in $D$ consists of three pieces (consult Figure 4 for clarity): (a) a subscript $q \in V \cup L$ corresponding to a vertex in $G$ (indicated inside a small circle in the bottom-right corner), (b) an integer $\omega$ corresponding to a possible valuation/outcome $\omega_q$ of $q$ where $\omega_q \in \{0_q, 1_q, \ldots\} = \Omega_q$ (indicated inside the square of each vertex), and (c) a decoration in the form of colored outlines¹² indicating which worlds (defined below) the vertex is a member of.¹³
2. (Ancestral Isomorphism)¹⁴ For every valuation vertex $\omega_q$ in $D$, the ancestral subgraph of $\omega_q$ in $D$ is isomorphic to the ancestral subgraph of $q$ in $G$ under the map $\omega_q \mapsto q$,
$$\mathrm{sub}_D(\mathrm{an}_D(\omega_q)) \cong \mathrm{sub}_G(\mathrm{an}_G(q)). \tag{15}$$
3. (Consistency) Each valuation vertex $x_v$ of a visible variable $v \in V$ is consistent with the output of the functional parameter $f_v \in F_V$ when applied to the valuation vertices $\mathrm{pa}_D(x_v)$,
$$x_v = f_v(\mathrm{pa}_D(x_v)).$$
4. (Uniqueness) For each latent variable $\ell \in L$, and for every valuation $\lambda_\ell \in \Omega_\ell$, there exists a unique valuation vertex in $D$ corresponding to $\lambda_\ell$. Unlike latent valuation vertices, the valuations of visible variables $x_v \in \Omega_v$ may be repeated (or absent) from $D$ depending on the form of $F_V$. In such cases, duplicated $x_v$'s are always uniquely distinguished by world membership (colored outline).
5. (Worlds) A world is a subgraph of $D$ that is isomorphic to $G$ under the map $\omega_q \mapsto q$. Let $\mathrm{wor}(\lambda_L) \subseteq D$ denote the world containing the valuation $\lambda_L \in \Omega_L$.¹⁵ Furthermore, for any subset $V' \subseteq V$ of visible variables, let $\mathrm{obs}_{V'}(\lambda_L) \in \Omega_{V'}$ denote the observed event supported by $\mathrm{wor}(\lambda_L)$.
6. (Completeness) For every valuation of the latent variables $\lambda_L \in \Omega_L$, there exists a subgraph corresponding to $\mathrm{wor}(\lambda_L)$.¹⁶

It is important to remark that although a possible worlds diagram $D$ can be constructed from the pair $(G, F_V)$, the two mathematical objects are not equivalent; the functional parameters $F_V$ can contain superfluous information that never appears in $D$. We return to this subtle but crucial observation in Section 5.1.
The essential purpose of the possible worlds construction is as a diagrammatic tool for calculating the observational predictions of a functional causal model. Lemma 1 captures this essence.
Lemma 1. Given a functional causal model $(G, F_V, P_L)$, let $D$ be the possible worlds diagram for $(G, F_V)$. The causal compatibility criterion (Equation 11) for $G$ is equivalent to a probabilistic sum over worlds in $D$:
$$P_V(x_V) = \sum_{\lambda_L \in \Omega_L} \prod_{\ell \in L} P_\ell(\lambda_\ell)\, \delta\big(x_V, \mathrm{obs}_V(\lambda_L)\big).$$
The remainder of this paper explores the consequences of adopting the possible worlds framework as a method for tackling the causal compatibility problem.
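The sum over worlds in Lemma 1 can be sketched in a few lines of Python. The sketch assumes a hypothetical two-variable structure (`mu -> a`, `a -> b`, `nu -> b`) loosely modelled on the $\lambda_\mu \to a$ fragment discussed for Figure 3; the functions `f_a`, `f_b` and the latent probabilities are made up for illustration. Each latent valuation is treated as one world, the world's observed event is computed deterministically, and the probabilities of the worlds supporting each event are added up.

```python
from fractions import Fraction
from itertools import product

def worlds(order, vpa, lpa, F, P_L):
    """Map each latent valuation (a world) to the visible event it supports."""
    latents = list(P_L)
    table = {}
    for lam in product(*(range(len(P_L[l])) for l in latents)):
        lam_of = dict(zip(latents, lam))
        x = {}
        for v in order:  # visible variables in topological order
            x[v] = F[v](*[x[p] for p in vpa[v]], *[lam_of[l] for l in lpa[v]])
        table[lam] = tuple(x[v] for v in order)
    return table

def observed_distribution(table, P_L):
    """Add up the probability of every world supporting each event."""
    latents = list(P_L)
    P_V = {}
    for lam, event in table.items():
        weight = Fraction(1)
        for l, value in zip(latents, lam):
            weight *= P_L[l][value]
        P_V[event] = P_V.get(event, 0) + weight
    return P_V

# Hypothetical model: a = f_a(mu), b = f_b(a, nu).
order = ["a", "b"]
vpa = {"a": [], "b": ["a"]}
lpa = {"a": ["mu"], "b": ["nu"]}
F = {"a": lambda mu: mu, "b": lambda a, nu: a ^ nu}
P_L = {"mu": [Fraction(1, 4), Fraction(3, 4)],
       "nu": [Fraction(1, 2), Fraction(1, 2)]}

table = worlds(order, vpa, lpa, F, P_L)
P_V = observed_distribution(table, P_L)
```

Because `a` depends only on `mu`, the worlds `(0, 0)` and `(0, 1)` necessarily agree on the value of `a`, which is the cross-world consistency constraint in miniature.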

A Complete Possibilistic Solution
Section 3 introduced the possible worlds framework as a technique for calculating the observable predictions of a functional causal model by means of Lemma 1. In this section, we use the possible worlds framework to develop a combinatorial algorithm for completely solving the possibilistic causal compatibility problem.
Definition 8. Given a probability distribution $P_V : \Omega_V \to [0, 1]$, its support $\sigma(P_V)$ is defined as the subset of events which are possible,
$$\sigma(P_V) = \{x_V \in \Omega_V \mid P_V(x_V) > 0\}.$$
An observed distribution $P_V$ is said to be possibilistically compatible with $G$ if there exists a functional causal model $(G, F_V, P_L)$ for which Equation 11 produces a distribution with the same support as $P_V$. The possibilistic variant of the causal compatibility problem is naturally related to the probabilistic causal compatibility problem defined in Definition 6; if a distribution is possibilistically incompatible with $G$, then it is also probabilistically incompatible. We now proceed to apply the possible worlds framework to prove possibilistic incompatibility between a number of distribution/causal structure pairs.
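The notion of support in Definition 8 is simple to express in code. The following Python sketch assumes a distribution is stored as a dictionary from events (tuples) to probabilities; the helper names are illustrative only.

```python
def support(P):
    """Events assigned strictly positive probability."""
    return {x for x, p in P.items() if p > 0}

def same_support(P, Q):
    """True when P and Q agree on which events are possible."""
    return support(P) == support(Q)

# Two hypothetical distributions over (a, b) that differ in their
# probabilities but not in which events are possible.
P = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0}
Q = {(0, 0): 0.9, (1, 1): 0.1}
```

Here `P` and `Q` pose the same possibilistic compatibility question, even though they pose different probabilistic ones.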

The Instrumental Structure
The causal structure $G_7$ depicted in Figure 7 is known as the Instrumental Scenario [8,40,41]. For $G_7$, Equation 11 takes the form, The following family of distributions, are possibilistically incompatible with $G_7$. The Instrumental scenario $G_7$ is different from $G_5$ in that there are no observable conditional independence constraints which can prove the possibilistic incompatibility of $P^{(23)}_{abc}$. Instead, the possibilistic incompatibility of $P^{(23)}_{abc}$ is traditionally witnessed by an Instrumental inequality originally derived in [41], Independently of Equation 24, we now prove possibilistic incompatibility of $P^{(23)}_{abc}$ with $G_7$ using the possible worlds framework.
Proof. Proof by contradiction; assume that a functional model $F_V = \{f_a, f_b, f_c\}$ for $G_7$ exists such that Equation 22 produces $P^{(23)}_{abc}$ (Equation 23). Analogously to the proof in Section 4.1, there are only two distinct valuations of the joint variables $abc$, namely $0_a 0_b 0_c$ and $1_a 0_b 1_c$. Therefore, define two worlds: one where $\mathrm{obs}_{abc}(0_\mu 0_\nu) = 0_a 0_b 0_c$ and another where $\mathrm{obs}_{abc}(1_\mu 1_\nu) = 1_a 0_b 1_c$. Using these two worlds, a possible worlds diagram can be initialized as in Figure 8a, where $\mathrm{wor}(0_\mu 0_\nu)$ is colored yellow and $\mathrm{wor}(1_\mu 1_\nu)$ is colored orange. In order to complete the possible worlds diagram of Figure 8a, one first needs to specify how $b$ behaves in two possible worlds: $\mathrm{wor}(0_\mu 1_\nu)$ colored green and $\mathrm{wor}(1_\mu 0_\nu)$ colored violet.

The Bell Structure
Consider the causal structure $G_9$ depicted in Figure 9, known as the Bell structure [7]. From the perspective of causal inference, Bell's theorem [7] states that any distribution compatible with $G_9$ must satisfy an inequality constraint known as a Bell inequality. For example, the inequality due to Clauser, Horne, Shimony and Holt, referred to as the CHSH inequality, constrains correlations held between $a$ and $b$ as $x, y$ vary [15],¹⁹ Correlations measured by quantum theory are capable of violating this inequality up to $S = 2\sqrt{2}$ [14]. This violation is not maximal; it is possible to achieve a violation of $S = 4$ using Popescu-Rohrlich box correlations [49]. The following distribution is an example of a Popescu-Rohrlich box correlation, Unlike $G_7$, there are conditional independence constraints placed on correlations compatible with $G_9$, namely the no-signaling constraints $P_{a|xy} = P_{a|x}$ and $P_{b|xy} = P_{b|y}$. Because $P^{(29)}_{xaby}$ satisfies the no-signaling constraints, the incompatibility of $P^{(29)}_{xaby}$ with $G_9$ is traditionally proven using Equation 28. We now proceed to prove its incompatibility using the possible worlds framework.
Proof. Proof by contradiction; assume that a functional causal model $F_V = \{f_a, f_b, f_x, f_y\}$ for $G_9$ exists which supports $P^{(29)}_{xaby}$ and use the possible worlds framework. Unlike the previous proofs, we only need to consider a subset of the events in $P^{(29)}_{xaby}$ to initialize a possible worlds diagram. Consider the following pair of events and associated latent valuations which support them,²⁰ Using Equation 30, initialize the possible worlds diagram in Figure 10 with worlds $\mathrm{wor}(0_\mu 0_\rho 0_\nu)$ colored green and $\mathrm{wor}(1_\mu 1_\rho 1_\nu)$ colored violet. An unavoidable contradiction arises when attempting to populate the values for $f_a(0_x 1_\rho)$ in the yellow world $\mathrm{wor}(0_\mu 1_\rho 1_\nu)$ and $f_b(0_y 1_\rho)$ in the magenta world $\mathrm{wor}(1_\mu 1_\rho 0_\nu)$. First, the observed event $\mathrm{obs}_{xaby}(0_\mu 1_\rho 1_\nu) = 0_x\, ?_a\, 1_b\, 1_y$ in the yellow world $\mathrm{wor}(0_\mu 1_\rho 1_\nu)$ must belong to the list of possible events prescribed by $P^{(29)}_{xaby}$; a quick inspection leads one to recognize that the only possibility is $\mathrm{obs}_a(0_\mu 1_\rho 1_\nu) = f_a(0_x 1_\rho) = 1_a$. An analogous argument in the magenta world $\mathrm{wor}(1_\mu 1_\rho 0_\nu)$ proves that $\mathrm{obs}_b(1_\mu 1_\rho 0_\nu) = f_b(0_y 1_\rho) = 0_b$. Therefore, the observed event in the orange world $\mathrm{wor}(0_\mu 1_\rho 0_\nu)$ must be $\mathrm{obs}_{xaby}(0_\mu 1_\rho 0_\nu) = 0_x 1_a 0_b 0_y$, and therefore $P_{xaby}(0_x 1_a 0_b 0_y) > 0$, which contradicts $P^{(29)}_{xaby}$. Therefore, $P^{(29)}_{xaby}$ is possibilistically²¹ incompatible with $G_9$.

¹⁹ The two-variable correlation is defined as $\langle ab \mid x_x x_y \rangle = \sum_{i,j=1}^{2} (-1)^{i+j} P_{ab|xy}(i_a j_b \mid x_x x_y)$.
²⁰ Clearly, the values of $\lambda_\mu$ and $\lambda_\nu$ that support these worlds must be unique. Less obvious is the possibility for these worlds to share a $\lambda_\rho$ value. However, if they do, the event $0_x 0_a 1_b 1_y$ becomes possible, contradicting $P^{(29)}_{xaby}$.
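The CHSH values quoted in this section can be checked numerically with a short Python sketch. Several choices here are assumptions made for illustration rather than taken from the paper: outcomes and settings are labelled 0 and 1, the Popescu-Rohrlich box is written in its standard conditional form $P(ab|xy) = 1/2$ when $a \oplus b = xy$, and the CHSH expression places its single minus sign on the $(x, y) = (1, 1)$ term. The function names are hypothetical.

```python
from math import isclose, sqrt

def pr_box(a, b, x, y):
    """Conditional probability P(a, b | x, y) of a Popescu-Rohrlich box."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

def constant_zero(a, b, x, y):
    """A trivial local strategy: both parties always output 0."""
    return 1.0 if (a, b) == (0, 0) else 0.0

def correlator(P, x, y):
    """Two-variable correlation for fixed settings x and y."""
    return sum((-1) ** (a + b) * P(a, b, x, y) for a in (0, 1) for b in (0, 1))

def chsh(P):
    """CHSH value S with the minus sign on the (1, 1) settings (assumed form)."""
    return (correlator(P, 0, 0) + correlator(P, 0, 1)
            + correlator(P, 1, 0) - correlator(P, 1, 1))

def is_no_signaling(P):
    """Check P(a|xy) = P(a|x) and P(b|xy) = P(b|y) for binary variables."""
    for s in (0, 1):        # the local setting held fixed
        for o in (0, 1):    # the local outcome held fixed
            pa = [sum(P(o, b, s, y) for b in (0, 1)) for y in (0, 1)]
            pb = [sum(P(a, o, x, s) for a in (0, 1)) for x in (0, 1)]
            if not (isclose(pa[0], pa[1]) and isclose(pb[0], pb[1])):
                return False
    return True

S_pr = chsh(pr_box)
S_local = chsh(constant_zero)
quantum_bound = 2 * sqrt(2)
```

Under these conventions the box satisfies the no-signaling constraints while its CHSH value exceeds both the local value and the quantum bound, which is why conditional independence alone cannot witness its incompatibility.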

The Triangle Structure
Consider the causal structure $G_{11}$ depicted in Figure 11, known as the Triangle structure. The Triangle has been studied extensively in recent decades [10,12,23,24,30,37,55,58,60]. The following family of distributions is possibilistically incompatible with $G_{11}$,²²
Proof. Proof by contradiction: assume that a functional causal model $F_V = \{f_a, f_b, f_c\}$ for $G_{11}$ exists which supports $P^{(32)}_{abc}$ and use the possible worlds framework. For each distinct event in $P^{(32)}_{abc}$, consider a world in which it happens definitely. Explicitly define, corresponding to the exterior worlds in Figure 12. Consider the magenta world $\mathrm{wor}(0_\mu 1_\rho 1_\nu)$ with partially specified observation $\mathrm{obs}_{abc}(0_\mu 1_\rho 1_\nu) = {?_a}\, {?_b}\, 1_c$. Recalling $P^{(32)}_{abc}$, whenever $c$ takes value $1_c$, both $a$ and $b$ take the value 0; i.e. $0_a 0_b$. Therefore, it must be that the observed event in the magenta world $\mathrm{wor}(0_\mu 1_\rho 1_\nu)$ is $\mathrm{obs}_{abc}(0_\mu 1_\rho 1_\nu) = 0_a 0_b 1_c$. An analogous argument holds for other worlds, However, the conclusions drawn by Equation 36 predict that the observed event in the central, green world $\mathrm{wor}(0_\mu 2_\rho 1_\nu)$ must be $\mathrm{obs}_{abc}(0_\mu 2_\rho 1_\nu) = 0_a 0_b 0_c$, and therefore $P_{abc}(0_a 0_b 0_c) > 0$, which contradicts $P^{(32)}_{abc}$. Therefore, $P^{(32)}_{abc}$ is possibilistically incompatible with $G_{11}$.

²¹ The proof holds if the probabilities of the events in $P^{(29)}_{xaby}$ are any positive value.
²² The Inflation Technique first proved the incompatibility between $P^{(32)}_{abc}$ and $G_{11}$.

An Evans Causal Structure
Consider the causal structure in Figure 13, denoted $G_{13}$. This causal structure was first mentioned by Evans [21], along with two others, as one for which no existing techniques were able to prove whether or not it was saturated; that is, whether or not all distributions were compatible with it. Here it is shown that there are indeed distributions which are possibilistically incompatible with $G_{13}$ using the framework of possible worlds diagrams. As such, this framework currently stands as the most powerful method for deciding possibilistic compatibility.
Consider the family of distributions with three possible events: Regardless of the values for $p_1, p_2, p_3$, $P^{(38)}_{abcd}$ is incompatible with $G_{13}$.

[Figure 14. The worlds are colored: $\mathrm{wor}(0_\mu 0_\nu 0_\rho)$ magenta, $\mathrm{wor}(1_\mu 1_\nu 1_\rho)$ orange, $\mathrm{wor}(2_\mu 2_\nu 2_\rho)$ yellow, $\mathrm{wor}(1_\mu 0_\nu 2_\rho)$ violet, and $\mathrm{wor}(1_\mu 2_\nu 2_\rho)$ green.]

Proof. Proof by contradiction. First assume that a deterministic model $F_V = \{f_a, f_b, f_c, f_d\}$ for $P^{(38)}_{abcd}$ exists and adopt the possible worlds framework. Let the diagonal worlds $\mathrm{wor}(0_\mu 0_\nu 0_\rho)$, $\mathrm{wor}(1_\mu 1_\nu 1_\rho)$ and $\mathrm{wor}(2_\mu 2_\nu 2_\rho)$ be the possible worlds which support the three events observed in $P^{(38)}_{abcd}$, Only two additional possible worlds are necessary for achieving a contradiction. Consulting Figure 14 for details, these possible worlds are $\mathrm{wor}(1_\mu 0_\nu 2_\rho)$ colored violet and $\mathrm{wor}(1_\mu 2_\nu 2_\rho)$ colored green. Notice that the determined value for $a$ must be the same in both worlds as it is independent of $\lambda_\nu$: $\mathrm{obs}_a(1_\mu 0_\nu 2_\rho) = \mathrm{obs}_a(1_\mu 2_\nu 2_\rho) = x_a$. There are only two possible values for $x_a$ in any world, namely $x_a = 0_a$ or $x_a = 1_a$ as given by $P^{(38)}_{abcd}$. First suppose that $x_a = 0_a$. Then in the violet world $\mathrm{wor}(1_\mu 0_\nu 2_\rho)$, the value of $b$ is completely constrained to be $\mathrm{obs}_b(1_\mu 0_\nu 2_\rho) = f_b(0_a 0_\nu) = 0_b$ by consistency with the magenta world $\mathrm{wor}(0_\mu 0_\nu 0_\rho)$. Therefore, $\mathrm{obs}_{ab}(1_\mu 0_\nu 2_\rho) = 0_a 0_b$. By analogous logic, in the violet world the value of $c$ is constrained to be $\mathrm{obs}_c(1_\mu 0_\nu 2_\rho) = f_c(0_b 1_\mu) = 0_c$ by the orange world $\mathrm{wor}(1_\mu 1_\nu 1_\rho)$. Therefore, $\mathrm{obs}_{abc}(1_\mu 0_\nu 2_\rho) = 0_a 0_b 0_c$, which is a contradiction because $0_a 0_b 0_c$ is an impossible event in $P^{(38)}_{abcd}$. Therefore, it must be that $x_a = 1_a$. An unavoidable contradiction follows from attempting to populate the green world $\mathrm{wor}(1_\mu 2_\nu 2_\rho)$ in Figure 14 which is an impossible event in $P^{(38)}_{abcd}$. This contradiction implies that no functional model $F_V = \{f_a, f_b, f_c, f_d\}$ exists and therefore $P^{(38)}_{abcd}$ is possibilistically incompatible with $G_{13}$.
To reiterate, there are currently no other methods known [21] which are capable of proving the incompatibility of any distribution with $G_{13}$.²³ Therefore, the possible worlds framework can be seen as the state-of-the-art technique for determining possibilistic causation.

Necessity and Sufficiency
Throughout this section, we explored a number of proofs of possibilistic incompatibility using the possible worlds framework. Moreover, the above examples communicate a systematic algorithm for deciding possibilistic compatibility. Given a distribution $P_V$ with support $\sigma(P_V) \subseteq \Omega_V$, and a causal structure $G = (V \cup L, E)$, the following algorithm sketch determines if $P_V$ is possibilistically compatible with $G$.
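1. Let $W = |\sigma(P_V)|$ denote the number of possible events provided by $P_V$.
2. For each $1 \le i \le W$, create a possible world $\mathrm{wor}(\lambda_L^{(i)})$ in which the $i$-th possible event happens definitely.
3. Attempt to complete the possible worlds diagram by populating the remaining worlds.
4. If an impossible event $x_V \notin \sigma(P_V)$ is produced by any "off-diagonal" world $\mathrm{wor}(\ldots i \ldots j \ldots)$ where $i \neq j$, or if a cross-world consistency constraint is broken, back-track.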
Upon completing the search, there are two possibilities. The first possibility is that the algorithm returns a completed, consistent, possible worlds diagram $D$. Then by Lemma 1, $P_V$ is possibilistically compatible with $G$. The second possibility is that an unavoidable contradiction arises, and $P_V$ is not possibilistically compatible with $G$.²⁴
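The back-tracking search described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the causal structure is taken to be in standard form (exogenous latent variables, at least one of them), every latent variable is given cardinality $W$, the $i$-th "diagonal" world is forced to realize the $i$-th possible event, and the function and argument names are hypothetical. The recursion is only suitable for very small instances. The example uses a Triangle-shaped structure in which each latent variable is shared by two visible variables; the labelling of which latent feeds which variable is an assumption.

```python
from itertools import product

def possibilistically_compatible(order, vpa, lpa, card, latents, support):
    """Search for deterministic functions f_v whose worlds produce exactly
    the events in `support` (tuples ordered like `order`)."""
    support = list(dict.fromkeys(tuple(e) for e in support))
    W = len(support)
    allowed = set(support)
    # Diagonal worlds (i, i, ..., i) come first; world i must realize event i.
    worlds = [(i,) * len(latents) for i in range(W)]
    worlds += [w for w in product(range(W), repeat=len(latents))
               if len(set(w)) > 1]
    position = {l: n for n, l in enumerate(latents)}
    tables = {v: {} for v in order}  # partial lookup tables, one per f_v

    def fill(w, j, event):
        if w == len(worlds):
            return True  # every world populated without contradiction
        if j == len(order):
            outcome = tuple(event[v] for v in order)
            ok = outcome == support[w] if w < W else outcome in allowed
            return ok and fill(w + 1, 0, {})
        v = order[j]
        key = (tuple(event[p] for p in vpa[v]),
               tuple(worlds[w][position[l]] for l in lpa[v]))
        if key in tables[v]:
            # Cross-world consistency: identical inputs force the same output.
            event[v] = tables[v][key]
            return fill(w, j + 1, event)
        choices = [support[w][j]] if w < W else range(card[v])
        for value in choices:
            tables[v][key] = value
            event[v] = value
            if fill(w, j + 1, event):
                return True
            del tables[v][key]  # back-track
        return False

    return fill(0, 0, {})

# Triangle-shaped example (labelling assumed): a <- (mu, nu), b <- (mu, rho),
# c <- (rho, nu), with binary visible variables.
order = ["a", "b", "c"]
vpa = {"a": [], "b": [], "c": []}
lpa = {"a": ["mu", "nu"], "b": ["mu", "rho"], "c": ["rho", "nu"]}
card = {"a": 2, "b": 2, "c": 2}
latents = ["mu", "nu", "rho"]

three_event_support = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
two_event_support = [(0, 0, 0), (1, 0, 0)]
```

Calling `possibilistically_compatible(order, vpa, lpa, card, latents, three_event_support)` exhausts the search without finding a consistent diagram, in line with the style of argument given for the Triangle structure, whereas the two-event support is completed on the first attempt.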

A Complete Probabilistic Solution
In Section 4, we demonstrated that the possible worlds framework is capable of providing a complete possibilistic solution to the causal compatibility problem. If, however, a given distribution $P_{\mathcal V}$ happens to satisfy a causal hypothesis on a possibilistic level, can the possible worlds framework be used to determine if $P_{\mathcal V}$ satisfies the causal hypothesis on a probabilistic level as well? In this section, we answer this question affirmatively. In particular, we provide a hierarchy of feasibility tests for probabilistic compatibility which converges exactly. In addition, we illustrate that a possible worlds diagram is the natural data structure for algorithmically implementing this converging hierarchy.

Symmetry and Superfluity
The aforementioned hierarchy of tests, to be explained in Section 5.3, relies on the enumeration of all probability distributions $P_{\mathcal V}$ which admit uniform functional causal models $(G, F_{\mathcal V}, P_{\mathcal L})$ for fixed cardinalities $k_{\mathcal V \cup \mathcal L} = \{k_q = |\Omega_q| \mid q \in \mathcal V \cup \mathcal L\}$. A functional causal model is uniform if the probability distributions $P_\ell \in P_{\mathcal L}$ over the latent variables are uniform distributions, $P_\ell : \Omega_\ell \to k_\ell^{-1}$. Section 5.2 discusses why uniform functional causal models are worth considering, whereas in this section, we discuss how to efficiently enumerate all probability distributions $P_{\mathcal V}$ that are uniformly generated from fixed cardinalities $k_{\mathcal V \cup \mathcal L}$.
One method for generating all such distributions is to perform a brute force enumeration of all deterministic strategies $F_{\mathcal V}$ for fixed cardinalities $k_{\mathcal V \cup \mathcal L}$. Depending on the details of the causal structure, the number of deterministic functions of this form is poly-exponential in the cardinalities $k_{\mathcal V \cup \mathcal L}$. This method is inefficient because it fails to consider that many distinct deterministic strategies produce the exact same distribution $P_{\mathcal V}$. There are two optimizations that can be made to avoid regenerations of the same distribution $P_{\mathcal V}$ while enumerating all deterministic strategies $F_{\mathcal V}$. These optimizations are best motivated by an example using the possible worlds framework.
Consider the causal structure $G_{15a}$ in Figure 15a with visible variables $\mathcal V = \{a, b, c\}$ and latent variables $\mathcal L = \{\mu, \nu\}$. Furthermore, for concreteness, suppose that the functional dependences $F_{\mathcal V}$ are those specified in Equation 42. The possible worlds diagram $D$ for $G_{15a}$ generated by Equation 42 is depicted in Figure 15b. If the latent valuations are distributed uniformly, the probability distribution associated with Figure 15b (as given by Equation 17) is the one given in Equation 43.
The first optimization comes from noticing that Equation 42 specifies how $c$ would respond if provided with the valuation $1_a 0_b 1_\nu$ of its parents, namely $f_c(1_a 0_b 1_\nu) = 3_c$. Nonetheless, this hypothetical scenario is excluded from Figure 15b (crossed out in the figure) because the functional model in Equation 42 never produces an opportunity for $a$ to be different from $b$. Consequently, the functional dependences in Equation 42 contain superfluous information irrelevant to the observed probability distribution in Equation 43. Therefore, a brute force enumeration of deterministic strategies would regenerate Equation 43 several times, once for each assignment of $c$'s behavior in these superfluous scenarios. It is possible to avoid these regenerations by using an unpopulated possible worlds diagram $D$ as a data structure and performing a brute force enumeration of all consistent valuations of $D$.
The second optimization comes from noticing that Equation 43 contains many symmetries. Notably, independently permuting the latent valuations, $\pi_\mu : 0_\mu \leftrightarrow 1_\mu$ or $\pi_\nu : 0_\nu \leftrightarrow 1_\nu$, leaves the observed distribution in Equation 43 invariant, but maps the functional dependences $F_{\mathcal V}$ of Equation 42 to different functional dependences $F^{\pi_\mu}_{\mathcal V}$ and $F^{\pi_\nu}_{\mathcal V}$. These symmetries are reflected as permutations of the worlds as depicted in Figures 15c and 15d.
Analogously, it is possible to avoid these regenerations by first pre-computing the induced action on $D$, and thus an induced action on $F_{\mathcal V}$, under the permutation group $S_{\mathcal L} = \prod_{\ell \in \mathcal L} \mathrm{perm}(\Omega_\ell)$. Then, using the permutation group $S_{\mathcal L}$, one only needs to generate a representative from each of the equivalence classes of possible worlds diagrams $D$ under $S_{\mathcal L}$.
Importantly, the optimizations illuminated above, namely ignoring superfluous specifications and exploiting symmetries, are universal; they can be applied for any causal structure. Additionally, the possible worlds framework intuitively excludes superfluous cases and directly embodies the observational symmetries, making a possible worlds diagram the ideal data structure for performing a search over observed distributions.
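The scale of the redundancy can be seen in a small Python sketch. It does not use $G_{15a}$ or Equation 42; instead it assumes a hypothetical structure with one binary latent variable $\mu$ and binary visible variables $a$ and $b$, with edges $\mu \to a$, $a \to b$ and $\mu \to b$. The sketch enumerates every deterministic strategy by brute force and collects the distinct uniformly induced distributions. Both superfluous entries of $f_b$ (inputs that never occur) and relabelings of $\mu$ cause repeated distributions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

K_MU = 2  # assumed cardinality of the single latent variable mu


def distribution(f_a, f_b):
    """Distribution over (a, b) when mu is uniform on range(K_MU)."""
    counts = Counter()
    for mu in range(K_MU):
        a = f_a[mu]          # a depends on mu only
        b = f_b[(a, mu)]     # b depends on a and mu
        counts[(a, b)] += 1
    return frozenset((event, Fraction(n, K_MU)) for event, n in counts.items())


def enumerate_models():
    """Every pair of deterministic functions (f_a, f_b), including superfluous entries."""
    b_inputs = list(product(range(2), range(K_MU)))
    for f_a in product(range(2), repeat=K_MU):
        for outputs in product(range(2), repeat=len(b_inputs)):
            yield f_a, dict(zip(b_inputs, outputs))


models = list(enumerate_models())
distinct = {distribution(f_a, f_b) for f_a, f_b in models}
print(len(models), len(distinct))  # 64 10
```

In this toy case 64 deterministic strategies collapse to 10 distributions, which is the kind of regeneration that enumerating consistent valuations of a possible worlds diagram up to latent permutations is meant to avoid.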

The Uniformity of Latent Distributions
The purpose of this section is to motivate why it is always possible to approximate any functional causal model $(G, F_{\mathcal V}, P_{\mathcal L})$ with another functional causal model $(G, \tilde F_{\mathcal V}, \tilde P_{\mathcal L})$ which has latent events $\lambda_{\mathcal L} \in \tilde\Omega_{\mathcal L}$ uniformly distributed. Unsurprisingly, an accurate approximation of this form will require an increase in the cardinality $|\tilde\Omega_{\mathcal L}| > |\Omega_{\mathcal L}|$ of the latent variables.
Definition 9 (Rational Distributions). A discrete probability distribution $P$ over $\Omega$ is rational if every probability assigned to events in $\Omega$ by $P$ is rational.
Definition 10 (Distance Metric for Distributions). Given two probability distributions $P, \tilde P$ over the same sample space $\Omega$, the distance $\Delta(P, \tilde P)$ between $P$ and $\tilde P$ is defined as, $\Delta(P, \tilde P) =$
In particular, the distance between $P_{\mathcal V}$ and $\tilde P_{\mathcal V}$ is bounded by, where $C = \max\{c_\ell \mid \ell \in \mathcal L\}$, $K = \min\{k_\ell \mid \ell \in \mathcal L\}$, and $L = |\mathcal L|$ is the number of latent variables.
Proof. The proof relies on Theorem 2 and can be found in Appendix C.
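The intuition behind replacing an arbitrary latent distribution with a uniform one of larger cardinality can be illustrated with a short Python sketch. It assumes a single latent variable with a rational distribution `p` and splits $K$ equally likely values into blocks, one block per original outcome, so that block $i$ has relative size close to `p[i]`. The largest-remainder rounding used here is an illustrative choice and is not claimed to be the construction of Appendix C; the function name is hypothetical.

```python
from fractions import Fraction


def uniform_block_sizes(p, K):
    """Split K equally likely latent values into blocks approximating p."""
    exact = [q * K for q in p]
    sizes = [int(x) for x in exact]  # floor, since every entry is non-negative
    leftovers = K - sum(sizes)       # values still unassigned after flooring
    by_remainder = sorted(range(len(p)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftovers]:
        sizes[i] += 1                # hand leftovers to the largest remainders
    return sizes


p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
for K in (7, 10, 100):
    sizes = uniform_block_sizes(p, K)
    error = max(abs(Fraction(n, K) - q) for n, q in zip(sizes, p))
    print(K, sizes, error)
```

Each block size is either the floor or the ceiling of `p[i] * K`, so every individual probability is matched to within $1/K$, and the approximation becomes exact whenever $K$ is a multiple of the common denominator of `p`.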

A Converging Hierarchy of Compatibility Tests
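In Section 5.1, we discussed how to take advantage of the symmetries of a possible worlds diagram and the superfluities within a set of functional parameters $F_{\mathcal V}$ in order to optimally search over functional models. In Section 5.2, we discussed how to approximate any functional causal model $(G, F_{\mathcal V}, P_{\mathcal L})$ using one with uniform latent probability distributions. Here we combine these insights into a hierarchy of probabilistic compatibility tests for the causal compatibility problem.
Definition 11. Given a causal structure $G$, and given cardinalities $k_{\mathcal L} = \{k_\ell = |\Omega_\ell| \mid \ell \in \mathcal L\}$ for the latent variables, define the uniformly induced distributions, denoted as $U^{(k_{\mathcal L})}_{\mathcal V}(G)$, as the set of all distributions $\tilde P_{\mathcal V} \in M_{\mathcal V}(G)$ which admit a uniform functional model $(G, F_{\mathcal V}, P_{\mathcal L})$ with cardinalities $k_{\mathcal L}$.
Recall that Section 5.1 demonstrates a method, using the possible worlds framework, for efficient generation of the entirety of $U^{(k_{\mathcal L})}_{\mathcal V}(G)$.
Lemma 4. For any $P_{\mathcal V} \in M_{\mathcal V}(G)$, there exists a uniformly induced distribution $\tilde P_{\mathcal V} \in U^{(k_{\mathcal L})}_{\mathcal V}(G)$ within a distance $\varepsilon$ of $P_{\mathcal V}$ (Equation 54), where $\varepsilon$ is a function of $K = \min\{k_\ell \mid \ell \in \mathcal L\}$, the number of latent variables $L = |\mathcal L|$, and $C = \max\{c_\ell \mid \ell \in \mathcal L\}$, where $c_\ell$ is the minimum upper bound placed on the cardinalities of the latent variable $\ell$ by Theorem 9.
Proof. Since $c_{\mathcal L} = \{c_\ell \mid \ell \in \mathcal L\}$ are minimum upper bounds placed on the cardinalities of the latent variables by Theorem 9, any $P_{\mathcal V} \in M_{\mathcal V}(G)$ must admit a functional causal model with cardinalities for the latent variables at most $c_{\mathcal L}$. Then by Theorem 3, there exists a uniform causal model producing $\tilde P_{\mathcal V} \in U^{(k_{\mathcal L})}_{\mathcal V}(G)$, within a distance $\varepsilon$ given by Equation 54.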
Lemma 4 forms the basis of the following compatibility test.
Theorem 5 (The Causal Compatibility Test of Order $K$). For a probability distribution $P_{\mathcal V}$ and a causal structure $G$, the causal compatibility test of order $K = \min\{k_\ell \mid \ell \in \mathcal L\}$ is defined as the following question: Does there exist a uniformly induced distribution $\tilde P_{\mathcal V} \in U^{(k_{\mathcal L})}_{\mathcal V}(G)$ within a distance $\varepsilon(K)$ of $P_{\mathcal V}$? As $K \to \infty$, the distance tends to zero, $\varepsilon(K) \to 0$, and the sensitivity of the test increases. If $P_{\mathcal V} \notin M_{\mathcal V}(G)$, then $P_{\mathcal V}$ will fail the test for some finite $K$. If $P_{\mathcal V} \in M_{\mathcal V}(G)$, then $P_{\mathcal V}$ will pass the test for all $K$. Moreover, for fixed $K$, the test can readily return the functional causal model behind the best approximation $\tilde P_{\mathcal V}$.
First notice that Theorem 5 achieves the same rate of convergence as [37]. Unlike the result of [37], Theorem 5 returns a functional model which approximates $P_{\mathcal V}$. It is interesting to remark that the distance bound $\varepsilon \in O(LC/K)$ in Equation 55 depends on $C = \max\{c_\ell \mid \ell \in \mathcal L\}$, where $c_\ell$ is the minimum upper bound placed on the cardinalities of the latent variable $\ell$ by Theorem 9. As conjectured in Appendix B, it is likely that there are tighter bounds that can be placed on these cardinalities for certain causal structures. Therefore, further research into lowering these bounds will improve the performance of Theorem 5.

Conclusion
In conclusion, this paper examined the abstract problem of causal compatibility for causal structures with latent variables. Section 3 introduced the framework of possible worlds in an effort to provide solutions to the causal compatibility problem. Central to this framework is the notion of a possible worlds diagram, which can be viewed as a hybrid between a causal structure and the functional parameters of a causal model. It does not, however, convey any information about the probability distributions over the latent variables.
In Section 4, we utilized the possible worlds framework to prove possibilistic incompatibility of a number of examples. In addition, we demonstrated the utility of our approach by resolving an open problem associated with one of Evans' [21] causal structures. Particularly, we have shown the causal structure in Figure 13 is incompatible with the distribution in Equation 38. Section 4 concluded with an algorithm for completely solving the possibilistic causal compatibility problem.
In Section 5, we discussed how to efficiently search through the observational equivalence classes of functional parameters using a possible worlds diagram as a data structure. Afterwards, we derived bounds on the distance between compatible distributions and uniformly induced ones. By combining these results, we provided a hierarchy of necessary tests for probabilistic causal compatibility which converge in the limit.

A Simplifying Causal Structures
A.1 Observational Equivalence
From an experimental perspective, a causal model $(G, P)$ has the ability to predict the effects of interventions; by manually tinkering with the configuration of a system, one can learn more about the underlying mechanisms than from observations alone [43]. When interventions become impossible, because experimentation is expensive or unethical for example, it becomes possible for distinct causal structures to admit the same set of compatible correlations. An important topic in the study of causal inference is the identification of observationally equivalent causal structures. Two causal structures $G$ and $G'$ are observationally equivalent, or simply equivalent, if they share the same set of compatible models, $M_{\mathcal V}(G) = M_{\mathcal V}(G')$. For example, the direct cause causal structure in Figure 17a is observationally equivalent to the common cause causal structure in Figure 17b. Identifying observationally equivalent causal structures is of fundamental importance to the causal compatibility problem; if a distribution $P_{\mathcal V}$ is known to satisfy the hypotheses of $G$, and $M_{\mathcal V}(G) = M_{\mathcal V}(G')$, then it will also satisfy the hypotheses of $G'$.

A.2 Exo-Simplicial Causal Structures
In general, other than being a directed acyclic graph, there are no restrictions placed on a causal structure with latent variables. Nonetheless, [21] demonstrated a number of transformations on causal structures which leave $M_{\mathcal V}(G)$ invariant. Two of these transformations are the subject of interest for this section. The first concerns itself with latent vertices that have parents while the second concerns itself with parent-less latent vertices that share children. Each will be taken in turn.
Definition 12 (See Defn. 3.6 [21]). Given a causal structure $G = (\mathcal V \cup \mathcal L, E)$ with latent vertex $\ell \in \mathcal L$, the exogenized causal structure $\mathrm{exo}_G(\ell)$ is formed by taking $E$ and (i) adding an edge $p \to c$ for every $p \in \mathrm{pa}_G(\ell)$ and $c \in \mathrm{ch}_G(\ell)$ if not already present, and (ii) deleting all edges of the form $p \to \ell$ where $p \in \mathrm{pa}_G(\ell)$.
Lemma 6. Given a causal structure $G$ with latent vertex $\ell \in \mathcal L$, the exogenized causal structure $\mathrm{exo}_G(\ell)$ is observationally equivalent to $G$, i.e. $M_{\mathcal V}(\mathrm{exo}_G(\ell)) = M_{\mathcal V}(G)$.
Proof. See proof of Lem. 3.7 from [21].
Figure 18a: A latent vertex with observable parents.
The concept of exogenization is best understood with an example.
Example 1. Consider the causal structure $G_{18a}$ in Figure 18a. In $G_{18a}$, the latent variable $\ell$ has parents $\mathrm{pa}(\ell) = \{v_1, v_2, v_3\}$ and children $\mathrm{ch}(\ell) = \{v_4, v_5\}$. Since the sample space $\Omega_\ell$ is unknown, its cardinality could be arbitrarily large or infinite. As a result, $\ell$ has an unbounded capacity to inform its children of the valuations of its parents, e.g. $v_4$ can have complete knowledge of $v_1$ through $\ell$, and therefore adding the edge $v_1 \to v_4$ has no observational impact. Applying similar reasoning to all parents of $\ell$, i.e. applying Lemma 6, one converts $G_{18a}$ to the observationally equivalent, exogenized causal structure $\mathrm{exo}_{G_{18a}}(\ell)$ depicted in Figure 19.
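The edge rewriting of Definition 12 is simple enough to state as code. The Python sketch below represents a causal structure as a set of directed edges and applies steps (i) and (ii) to a single latent vertex. The function name and the string labels are hypothetical, and the edge set contains only the edges of $G_{18a}$ that are named in Example 1; any further edges of Figure 18a are not modeled.

```python
def exogenize(edges, latent):
    """Apply Definition 12 to one latent vertex of a DAG given as (parent, child) pairs."""
    parents = {p for (p, c) in edges if c == latent}
    children = {c for (p, c) in edges if p == latent}
    kept = {(p, c) for (p, c) in edges if c != latent}     # (ii) delete edges p -> latent
    added = {(p, c) for p in parents for c in children}    # (i) add edges p -> c
    return kept | added


# Edges of Example 1: pa(l) = {v1, v2, v3} and ch(l) = {v4, v5}.
edges = {("v1", "l"), ("v2", "l"), ("v3", "l"), ("l", "v4"), ("l", "v5")}
for edge in sorted(exogenize(edges, "l")):
    print(edge)
```

After the transformation the latent vertex has no parents, and each of $v_1, v_2, v_3$ points directly at both $v_4$ and $v_5$. Because the result is a set, an edge $p \to c$ that was already present is not duplicated.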
Lemma 6 can be applied recursively to each latent variable $\ell \in \mathcal L$ in order to transform any causal structure $G$ into an observationally equivalent one wherein the latent variables have no parents (exogenous). Notice that the process of exogenization also works when latent vertices have latent parents, as is the case in Figure 18b. Also, when a latent vertex $\ell$ has no children, the process of exogenization disconnects $\ell$ from the rest of the causal structure, where it can be ignored with no observational impact due to Equation 7. The next observationally invariant transformation requires the exogenization procedure to have been applied first. In Figure 18d, $\ell_1$ and $\ell_2$ are exogenous latent variables where $\mathrm{ch}_{G_{18d}}(\ell_2) \subset \mathrm{ch}_{G_{18d}}(\ell_1)$. Therefore, because the sample space $\Omega_{\ell_1}$ is unspecified, it has the capacity to emulate any dependence that $v_3$ and/or $v_2$ might have on $\ell_2$. This idea is captured by Lemma 7.
An immediate corollary of Lemma 7 is that the latent variables $\{\ell \mid \ell \in \mathcal L\}$, which are isomorphic to their children $\{\mathrm{ch}(\ell) \mid \ell \in \mathcal L\}$, are isomorphic to the facets of a simplicial complex over the visible variables.
Definition 13. An (abstract) simplicial complex, $\Delta$, over a finite set $\mathcal V$ is a collection of non-empty subsets of $\mathcal V$ such that:
1. $\{v\} \in \Delta$ for all $v \in \mathcal V$; and
2. if $A \in \Delta$ and $\emptyset \neq B \subseteq A$, then $B \in \Delta$.
The maximal subsets with respect to inclusion are called the facets of the simplicial complex.
In [21], this concept led to the invention of mDAGs (or marginal directed acyclic graphs), a hybrid between a directed acyclic graph and a simplicial complex. In this work, we refrain from adopting the formalism of mDAGs and instead continue to consider causal structures as entirely directed acyclic graphs. Nevertheless, Lemmas 6 and 7 demonstrate that for the purposes of the causal compatibility problem, the latent variables of a causal structure can be assumed to be exogenous and to have children forming the facets of a simplicial complex. Causal structures which adhere to this characterization will be referred to as exo-simplicial causal structures. Figure 20 depicts four exo-simplicial causal structures respectively equivalent to the causal structures in Figure 18.
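As an illustration of the facet condition, the following Python sketch keeps only those exogenous latent variables whose sets of children are maximal with respect to inclusion, discarding any latent variable whose children are nested inside those of another. The function name and the example child sets are hypothetical and do not correspond to a figure in the paper; when two latent variables have identical children, the sketch keeps whichever is encountered first.

```python
def facets(children):
    """Keep latent variables whose child sets are maximal under inclusion.

    `children` maps each exogenous latent variable to the set of its children.
    """
    kept = {}
    # Visit larger child sets first, so any superset is seen before its subsets.
    for name, ch in sorted(children.items(), key=lambda item: -len(item[1])):
        if not any(ch <= other for other in kept.values()):
            kept[name] = ch
    return kept


children = {
    "l1": {"v1", "v2", "v3"},
    "l2": {"v2", "v3"},   # nested inside the children of l1, hence redundant
    "l3": {"v3", "v4"},
}
print(sorted(facets(children)))  # ['l1', 'l3']
```

The surviving child sets are exactly the facets of the simplicial complex generated by the original child sets.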

B Simplifying Causal Parameters
Recall that a causal model $(G, P)$ consists of a causal structure $G$ and causal parameters $P$. Appendix A simplified the causal compatibility problem by revealing that each causal structure $G$ can be replaced with an observationally equivalent exo-simplicial causal structure $G'$ such that $M_{\mathcal V}(G) = M_{\mathcal V}(G')$. The purpose of this section is to simplify the causal compatibility problem in three ways. Section B.1 demonstrates that the visible causal parameters $\{P_{v|\mathrm{pa}(v)} \mid v \in \mathcal V\}$ of a causal model can be assumed to be deterministic without observational impact. Section B.2 shows that if the observed distribution is finite (i.e. $|\Omega_{\mathcal V}| < \infty$), one only needs to consider finite probability distributions for the latent variables. Moreover, explicit upper bounds on the cardinalities of the latent variables can be computed.

B.1 Determinism
Lemma 8.If P V ∈ M V (G) and G is exo-simplicial (see Appendix A), then without loss of generality, the causal parameters P v|pa G (v) over the observed variables can be assumed to be deterministic, and consequently, Proof.Since P V ∈ M V (G), by definition, there exists a joint distribution P V∪L (or density dP V∪L ) admitting marginal P V via Equation 7. Since the joint distribution satisfies Equation 6, it is possible to associate to each observed variable X v an independent random variable E ev and measurable function f v : Ω vpa G (v) × Ω lpa G (v) × Ω ev such that for all v ∈ V, Therefore, by promoting each e v to the status of a latent variable in G and adding an edge e v → v to E, each X v becomes a deterministic function of its parents.Finally, making use of the fact that G is exo-simplicial, every error variable e v has its children ch G (e v ) = {v} nested inside the children of at least one other pre-existing latent variable.Therefore, by applying Lemma 7, e v is eliminated and one recovers the original G.
Essentially, Lemma 8 indicates that any non-determinism due to local noise variables E ev can be emulated by the behavior of the latent variables L.

B.2 The Finite Bound for Latent Cardinalities
In [50], it was shown that if the visible variables have finite cardinality (i.e. $k_{\mathcal V} = |\Omega_{\mathcal V}|$ is finite), then for a particular class of causal structures known as causal networks, the cardinalities of the latent variables could be assumed to be finite as well. A causal network is a causal structure where all latent variables have no parents (are exogenous) and all visible variables either have no parents or no children [37]. The purpose of this section is to generalize the results of [50] to the case of exo-simplicial causal structures. Although the proof techniques presented here are similar to those of [50], the best upper bounds placed on $k_{\mathcal L} = |\Omega_{\mathcal L}|$ depend more intimately on the form of $G$. It is also anticipated that the upper bounds presented here are suboptimal, much like those of [50]. It is also worth noting that the results presented here hold independently of whether or not Lemma 8 is applied.
Theorem 9. Let $(G, P)$ be a causal model with (possibly infinite) cardinalities $k_{\mathcal L} = \{k_\ell \mid \ell \in \mathcal L\}$ for the latent variables which produces the distribution $P_{\mathcal V}$. Then there exists a causal model $(G, P')$ reproducing $P_{\mathcal V}$ with cardinalities $k'_{\mathcal L} = \{k'_\ell \mid \ell \in \mathcal L\}$ where each $k'_\ell$ is finite.
Proof.The following proof considers each latent variable ξ ∈ L independently and obtains a value for k in each case.Let L = L − {ξ} denote the set of latent variables

Figure 1: The difference between a directed cyclic graph and a directed acyclic graph.

Figure 3: A causal structure $G_{3a}$ and the creation of the possible worlds diagram when $k_\mu = k_\nu = 2$. Panels: an example causal structure $G_{3a}$; the possible worlds picture for $G_{3a}$; identifying consistency constraints among possible worlds for $G_{3a}$; populating a possible worlds diagram with the deterministic functions $f_a, f_b, f_c$ in Equation 14.

Figure 4: A vertex of a possible worlds diagram dissected.

Figure 6: The possible worlds diagram for $G_5$ (Figure 5) is incompatible with $P^{(20)}_{abc}$.

Figure 9: The Bell causal structure has variables $a, b$ 'measuring' hidden variable $\rho$ with 'measurement settings' $x, y$ determined independently of $\rho$.

Figure 11: The Triangle structure $G_{11}$ involving three visible variables $\mathcal V = \{a, b, c\}$ each sharing a pair of latent variables from $\mathcal L = \{\mu, \nu, \rho\}$.

$\lambda^{(i)}_{\mathcal L} = \{i_\ell \mid \ell \in \mathcal L\}$, thus defining the latent sample space $\Omega_{\mathcal L}$. 3. Attempt to complete the possible worlds diagram $D$ initialized by the worlds wor$(\lambda^{(i)}_{\mathcal L})$.

Figure 15: A causal structure $G_{15a}$ with three visible variables $\mathcal V = \{a, b, c\}$ and two latent variables $\mathcal L = \{\mu, \nu\}$. A possible worlds diagram for $G_{15a}$. The crossed out vertex is excluded because it fails to satisfy the ancestral isomorphism property. The image of Figure 15b under the permutation $0_\mu \leftrightarrow 1_\mu$. The image of Figure 15b under the permutation $0_\nu \leftrightarrow 1_\nu$.

Definition 11. Given a causal structure $G$, and given cardinalities $k_{\mathcal L} = \{k_\ell = |\Omega_\ell| \mid \ell \in \mathcal L\}$ for the latent variables, define the uniformly induced distributions, denoted as $U^{(k_{\mathcal L})}_{\mathcal V}$
$\varepsilon$ is a function of $K = \min\{k_\ell \mid \ell \in \mathcal L\}$, the number of latent variables $L = |\mathcal L|$, and $C = \max\{c_\ell \mid \ell \in \mathcal L\}$, where $c_\ell$ is the minimum upper bound placed on the cardinalities of the latent variable $\ell$ by Theorem 9. Proof. Since $c_{\mathcal L} = \{c_\ell \mid \ell \in \mathcal L\}$ are minimum upper bounds placed on the cardinalities of the latent variables by Theorem 9, any $P_{\mathcal V} \in M_{\mathcal V}(G)$ must admit a functional causal model with cardinalities for the latent variables at most $c_{\mathcal L}$. Then by Theorem 3, there exists a uniform causal model producing $\tilde P_{\mathcal V} \in U^{(k_{\mathcal L})}_{\mathcal V}$

Figure 17: The causal structures of (a) and (b) are observationally equivalent. (a) A direct cause from $v_1$ to $v_2$. (b) A shared common cause between $v_1$ and $v_2$.

Figure 18: Examples of causal structures which are not exo-simplicial.

Figure 20: Examples of exo-simplicial causal structures which are observationally equivalent to their respective counterparts in Figure 18.

Figure 2: The causal structure $G_2$ in this figure encodes a causal hypothesis about the causal relationships between the visible variables $\mathcal V = \{v_1, v_2, v_3, v_4, v_5\}$ and the latent variables $\mathcal L = \{\ell_1, \ell_2, \ell_3\}$; e.g. $v_2$ experiences a direct causal influence from each of its parents, both visible $\mathrm{vpa}_{G_2}(v_2) = \{v_1, v_4\}$ and latent $\mathrm{lpa}_{G_2}(v_2) = \{\ell_1, \ell_2\}$. Throughout this paper, visible variables and edges connecting them are colored blue whereas all latent variables and all other edges are colored red.