## 1 Introduction

Causal relationships, unlike statistical dependences, support inferences about the effects of interventions and the truths of counterfactuals. While randomised controlled experiments can be used to determine causal relationships, they may not be available for various reasons: they could be prohibitively expensive, technologically infeasible, unethical (e.g., assessing the effect of smoking on lung cancer), or indeed physically impossible (e.g., for variables describing properties of distant astronomical bodies). Therefore, inferring causal relationships from uncontrolled statistical data is an important problem, with broad applicability across scientific disciplines. Over the past twenty-five years, there has been much progress in developing methods to solve this problem [1, 2, 3, 4, 5].

As has become standard practice, we formalize the notion of causal structure using directed acyclic graphs (DAGs) with random variables as nodes and arrows representing direct causal influence [1, 2]. A more refined description of causal dependences specifies not only what causes what, but also, for every variable, its functional dependence on its causal parents. We shall use the term *functional causal structure* to refer to the specification of the set of functions, which includes a specification of the DAG.

As is standard, the variables that are not observed are termed *latent*, and the DAG does not include any latent variables that act as causal mediaries, so that all the latent variables are parentless. We shall use the term *causal model* to describe the functional causal structure together with a specification, for each latent variable, of a probability distribution over its values. Each causal model associated to a given functional causal structure defines a possible joint probability distribution over the observed variables. We are interested in the set of possible joint distributions over the observed variables for a given functional causal structure, that is, those that can arise from *some* set of distributions on the latent variables.
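The way a causal model fixes a joint distribution over the observed variables can be made concrete with a small sketch. The code below assumes two binary observed variables, binary latent variables, and a purely illustrative functional causal structure (A = l1, B = l1 AND l2) that is not taken from the text. The names `joint_distribution`, `f_a`, `f_b` and `biases` are hypothetical, as is the convention that the first index of each key refers to A.

```python
from fractions import Fraction
from itertools import product

def joint_distribution(f_a, f_b, biases):
    """Return {(a, b): probability} for two binary observed variables.

    f_a, f_b : functions mapping a tuple of latent bits to 0 or 1
    biases   : biases[i] is the probability that latent bit i equals 1
    """
    dist = {(a, b): Fraction(0) for a in (0, 1) for b in (0, 1)}
    for bits in product((0, 1), repeat=len(biases)):
        # Each latent variable has its own distribution, so the weight of a
        # joint latent assignment is the product of the individual weights.
        weight = Fraction(1)
        for bit, q in zip(bits, biases):
            weight *= q if bit == 1 else 1 - q
        dist[(f_a(bits), f_b(bits))] += weight
    return dist

# Hypothetical functional causal structure: A = l1, B = l1 AND l2.
f_a = lambda bits: bits[0]
f_b = lambda bits: bits[0] & bits[1]
dist = joint_distribution(f_a, f_b, [Fraction(1, 2), Fraction(1, 3)])
```

Varying `biases` over all nontrivial values while keeping `f_a` and `f_b` fixed traces out the set of joint distributions associated with this one functional causal structure.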

We will say that two functional causal structures are *observationally equivalent* if they are characterized by the same set of distributions over the observed variables.^{1} Many causal inference algorithms, such as those of refs. [1] and [2], only make use of conditional independence relations among the observed variables. If two causal structures are such that the same set of conditional independence relations is faithful to them, then they are said to be Markov equivalent.

Note that Markov equivalence can be decided purely on the basis of the DAG (i.e., the causal structure), while the notion of observational equivalence of interest here depends on the functional dependences (i.e., the *functional causal structure*).

In the case of just two observed variables, which is the one we consider here, the set of all causal structures is partitioned into just two Markov equivalence classes: those wherein the variables are causally connected, and those wherein they are not. As we show, however, the joint distribution over the observed variables supports many more inferences about the functional causal structure, thereby providing a more fine-grained classification than is provided by Markov equivalence.

In recent years, several methods have been suggested that make use not only of conditional independences, but also of other properties of the joint statistical distribution over the observed variables [3, 4, 5, 6] (see also the works discussed in Section 6.2 and Section 6.3). These newer methods also have limitations, in the sense that they impose restrictions on the number of latent variables allowed in the underlying causal model and on the mechanisms by which these latent variables influence the observed ones.

In the present work, we restrict attention to the causal inference problem where there are just two observed variables, each of which is binary (that is, discrete with just two possible values). We allow any functional causal structure involving latent variables that are discrete (with a finite number of values), and we impose no restriction on the number of latent variables or the mechanisms by which these influence the observed ones.

We provide an inductive scheme for characterizing the observational equivalence classes of functional causal structures. This scheme has a few steps. First we show that, in each observational class, there is a functional causal structure wherein all of the latent variables are binary. Restricting ourselves to the latter sort of functional causal structure, we show that one can inductively build up any functional causal structure from a pair of others having fewer latent variables. Thus, starting with functional causal structures with no latent variables, we can recursively build up all functional causal structures, and therefore all observational equivalence classes of these, by applying our inductive scheme.

Using this scheme, we catalogue all observational equivalence classes generated by functional causal structures with four or fewer binary latent variables. We have evidence, but no proof yet, that our catalogue is complete in the sense that a functional causal structure with any number of binary latent variables – and hence, by the connection described above, any functional causal structure with discrete latent variables – belongs to one of the classes we have identified.

We also describe a procedure for deriving, for each class, the set of necessary and sufficient conditions on the joint distribution over observed variables for it to be possible to generate it from functional causal structures in this class. We call such a set of conditions a *feasibility test* for the class.

The procedure for deriving these is as follows. We start with a particular functional causal structure within the class, express the parameters in the joint probability distribution over the observed variables in terms of the parameters in the probability distributions over the latent variables, then eliminate the latter using techniques from algebraic geometry.
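As a minimal worked illustration of this elimination step, consider a hypothetical structure chosen here for simplicity (it is not taken from the catalogue): the observed bits are $A = \lambda$ and $B = \mu$, where the latent bits $\lambda$ and $\mu$ take the value 1 with probabilities $q_1$ and $q_2$ respectively. Writing $p_{ab}$ for the probability of $A=a$, $B=b$ (an assumed index convention), the parametrization and the result of eliminating $q_1$ and $q_2$ read:

$$
\begin{aligned}
p_{00} &= (1-q_1)(1-q_2), & p_{01} &= (1-q_1)\,q_2, \\
p_{10} &= q_1(1-q_2), & p_{11} &= q_1 q_2, \\
p_{00}\,p_{11} - p_{01}\,p_{10} &= q_1 q_2 (1-q_1)(1-q_2) - q_1 q_2 (1-q_1)(1-q_2) = 0 .
\end{aligned}
$$

For this hypothetical structure, the equality $p_{00}p_{11} = p_{01}p_{10}$, together with normalization and the conditions $0 < p_{ab} < 1$ inherited from $0 < q_1, q_2 < 1$, is both necessary and sufficient: any such distribution is the product of its marginals, which supply valid values $q_1 = p_{10} + p_{11}$ and $q_2 = p_{01} + p_{11}$.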

Finally, we consider applications to the problem of identifying causal parameters. For the parameters describing the probability distributions over the latent variables, we note that our technique allows one to find expressions for these in terms of the observational data for each of the observational equivalence classes that we have considered. For the parameters describing the functional relations, we note that the limits to what one can infer about these, which may be different for different points in the space of possible joint distributions over the observed variables, can be inferred from our feasibility tests.

## 2 Setting up the problem

Consider the causal model of Figure 1(a). From the DAG, it is clear that

The functional dependences are given by *additive noise model* (ANM) in refs [3, 4, 5, 6]. The values of

In Ref. [5], it was shown that one can distinguish between the causal model of Figure 1(a) and the causal models depicted in Figure 2(a) and Figure 2(c), except for special cases of the distributions over the noise variables, such as, for instance, when

To describe the correlations we adopt the following notational convention.

Let

This means

Note that if a latent variable were to take one of its values with probability 1, then it would be trivial and could be eliminated from the functional causal structure. We therefore consider only functional causal structures with nontrivial latent variables, that is, latent variables that have some statistical variation in their value, so that the probability of any value is bounded away from 0 and 1. In the present example, therefore,

For a general causal model we have

We can rewrite

So, if we fix the value of *fan*. Figure 2(b) and Figure 2(d) depict the set of joint distributions for the ANM where

Given some joint distribution,

This problem was solved for the example of Figure 1 in Ref. [5] using the following technique. First, it was noted that the DAG implies that

which can be rewritten as:

This equation, together with the *open-interval constraints*,

defines the fan in Figure 1(b). Using similar techniques, one can show that Figure 2(b) and Figure 2(d) are defined by equation

together with the open-interval constraint.

The question is: how can one find feasibility tests for generic causal models? In particular, how does one treat models where the noise is not additive? Consider, for instance, the causal model that has the same DAG as in Figure 1(a), but where the noise is multiplicative, that is,

It is also unclear how one can characterize the possibilities for the joint distribution when the causal model involves an arbitrary number of latent variables.

We will show that these questions can be answered using powerful tools from algebraic geometry, which we describe in the next section.

## 3 Deriving the feasibility tests

We begin with an introduction to some of the main concepts of algebraic geometry, following the presentation given in [7]. For a more detailed discussion, see Appendix A.

Denote the set of all polynomials in variables

When dealing with polynomials, we are mainly interested in the solution set of systems of polynomial equations. This leads us to the main geometrical objects studied in algebraic geometry, algebraic varieties and semi-algebraic sets.

An *algebraic variety*^{2}*basic semi-algebraic set* is defined to be the solution set of a system of polynomial equalities and inequalities, that is, ^{3} and where *semi-algebraic set* is formed by taking finite combinations of unions, intersections, or complements of basic semi-algebraic sets. For instance, the fan in Figure 1(b) is the semi-algebraic set that results from the intersection of the algebraic variety defined by the single polynomial equation

More generally, for *any* causal model, the set of possible joint distributions that can be generated by it are represented by a semi-algebraic set. It follows that two causal models are observationally equivalent if and only if they generate the same semi-algebraic set.

We now define *ideals*, the main algebraic object studied in algebraic geometry. A subset *ideal* if it satisfies: (1)

A natural example of an ideal is the ideal generated by a finite number of polynomials, defined as follows. Let *ideal generated by*

The polynomials *basis* of the ideal.

Studying the relations between certain ideals and varieties forms one of the main areas of study in algebraic geometry. One can even define the algebraic variety

Interestingly, it can also be shown that if *varieties are determined by ideals*.

We can now use the language of algebraic geometry to restate the question asked at the end of the last section. Let

where the *some* values of the *implicitization*. The second step is to determine which points in this algebraic variety can be extended to a solution of the equalities and inequalities of the original parametric characterization.

For example, consider the algebraic variety that is defined parametrically by the polynomial equations

We would like to characterize the semi-algebraic set that this variety defines on the observed variables

The problem can be solved by employing a specific choice of basis for the ideal generated by the system of polynomial equations that define the variety in eq. (1). The basis that achieves this feat is known as the *Groebner basis*.

Groebner bases simplify many calculations in algebraic geometry and they have many interesting properties [7]. There are efficient algorithms for calculating Groebner bases and many software packages that one can use to implement them.

We discovered in this section that the fan of Figure 1(b) is in fact the intersection of the algebraic variety defined by the ideal

^{4}of this ideal is found to be

Solutions to

which define our algebraic variety. Looking more closely at the Groebner basis we note that the variables

which, using the normalization condition, then gives us

On demanding *elimination theorem*, which provides us with a way of using Groebner bases to systematically eliminate certain variables from a system of polynomial equations and, thus, to solve the implicitization problem.

The general procedure for finding the semi-algebraic set is as follows. First, given the system of polynomial equations defining the implicitization problem, as in eq. (1), form the ideal generated by these polynomials and compute^{5} its Groebner basis. The elements of this basis that do not contain the variables

The equality constraints from the first step and the inequality constraints from the second step together characterize the semi-algebraic set on

These inequality constraints manifest themselves in different ways. We present an example of one such manifestation below and leave the remaining examples to Appendix B.

Consider the causal model of Figure 3(a). Defining

We begin by providing an intuitive account of the semi-algebraic set describing such joint distributions. Note first that

implying that it is the convex combination, with weight

It follows that the semi-algebraic set defined by

Reading off the expressions for

To implement the first step of the general procedure outlined above, we derive the Groebner basis for this ideal ^{6}:

Now

which, using the normalisation condition, results in

To implement the second step of our procedure, we begin by enforcing

None of the remaining constraints *only* nontrivial constraint. Together with the open-interval constraints

## 4 Characterizing the observational equivalence classes

In this section, we will provide a scheme for inductively characterizing all observational equivalence classes. As noted in the introduction, we consider only causal models where there is a pair of binary observed variables, which we denote by

### 4.1 Sufficiency of considering purely common-cause models

A causal model having no directed causal influences between the observed variables will be termed *purely common-cause*.

Every causal model wherein there is a directed causal influence between

The proof is as follows. Suppose that there is a directed causal influence

An explicit example serves to illustrate this equivalence. The causal model depicted in Figure 4(a), with functional dependences

as

### 4.2 Sufficiency of considering models with binary latents

We call a causal model where all the latent variables are binary a *causal model with binary latents*. If there are .

Consider the family of causal models where the latent variables are discrete and finite, but not necessarily binary. Every such model is observationally equivalent to one with binary latents. Equivalently, there is a causal model with binary latents in each observational equivalence class.

The proof is provided in Appendix C, but we now present a simple example which illustrates the main idea of the proof.

Consider the causal model of Figure 5(a), where

One can see that the distributions over

The trick to simulating this model using a

If we take

The key ingredient of the above example was that we were able to find a causal model which could – by appropriately varying over the distribution of its latent variables – generate any distribution over a given face of the tetrahedron, and hence any distribution on a trit. In the case of an

Theorem 4.2.1 implies that for the project of determining the observational equivalence classes, it suffices to consider models with binary latents, and so we restrict our attention to these henceforth.
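The reduction to binary latents can be illustrated for a single three-valued latent variable (a trit). The sketch below uses a sequential encoding of a trit into two independent bits. This particular encoding, and the names `trit_to_bit_biases`, `decode` and `simulated_trit_distribution`, are assumptions made for illustration and need not coincide with the construction used in the proof.

```python
from fractions import Fraction
from itertools import product

def trit_to_bit_biases(r0, r1, r2):
    """Biases (probability of 1) of two independent bits that simulate a trit."""
    if r0 + r1 + r2 != 1 or not all(0 < r < 1 for r in (r0, r1, r2)):
        raise ValueError("expected a nontrivial distribution over three values")
    # Bit 1 decides "value 0" versus "value 1 or 2"; bit 2 splits the latter.
    return 1 - r0, r2 / (r1 + r2)

def decode(b1, b2):
    """Map a pair of bits to a trit value; b2 is ignored when b1 == 0."""
    if b1 == 0:
        return 0
    return 1 if b2 == 0 else 2

def simulated_trit_distribution(q1, q2):
    """Distribution over the decoded trit for bit biases q1 and q2."""
    dist = [Fraction(0)] * 3
    for b1, b2 in product((0, 1), repeat=2):
        weight = (q1 if b1 else 1 - q1) * (q2 if b2 else 1 - q2)
        dist[decode(b1, b2)] += weight
    return dist

target = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
q1, q2 = trit_to_bit_biases(*target)
```

Any functional dependence on the trit then becomes a functional dependence on the two bits by composing it with `decode`, and both bit biases remain strictly between 0 and 1 whenever the trit distribution is nontrivial.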

### 4.3 Inductive scheme

Next, we define a scheme for composing pairs of *all* possible pairs of

Denote the

We assume that the first causal model is defined by parameters

This construction has been chosen such that

With these definitions, our composition result can be summarized as follows.

Consider the map that takes a pair of

The functional dependences of eq. (5) can equivalently be expressed as polynomials in

It now suffices to note that as one varies over all possible joint values for the variables in the set *pairs* of

We can therefore generate all causal models with binary latents by this inductive rule starting from the

### 4.4 Catalogue of observational equivalence classes

Recall that two causal models are observationally equivalent if they define the same semi-algebraic set. Thus, to characterize the observational equivalence classes, we proceed as follows. For each new causal model that we generate by the inductive scheme, we determine the corresponding semi-algebraic set. Every time we obtain a semi-algebraic set that has not appeared previously, we add it to the catalogue of observational equivalence classes.

Note that if a causal model has been obtained from two simpler models via our composition scheme, then the semi-algebraic set associated to it necessarily includes as subsets both of the semi-algebraic sets of the simpler models (note that this semi-algebraic set is generally *not* the convex hull of the semi-algebraic sets of the two simpler models).

It follows that if the semi-algebraic set of a given causal model is found to be the entire tetrahedron, then composing this model with any other will also yield the tetrahedron. In this case, there are no new observational equivalence classes to be found among the descendants of this causal model in the inductive scheme.

In particular, if it were to occur that at some level of the inductive scheme, every newly generated causal model could be shown either to reduce to a previously generated causal model or to yield a semi-algebraic set that is the entire tetrahedron, then one could conclude that one’s catalogue of the observational equivalence classes of causal models was complete in the sense that any

We have used our inductive scheme to construct all observational equivalence classes generated by causal models with four or fewer binary latent variables. We have also considered a large number of causal models with five binary latent variables and found no new observational equivalence classes. This suggests that our catalogue may already be complete, although we do not have a proof of this. Above, we noted circumstances in which our inductive scheme would terminate, which provides one strategy for attempting to settle the question.

Even in the absence of a proof of completeness, the inductive scheme presented here for classifying observational equivalence classes may be of independent interest to researchers in the field.

The observational equivalence classes of causal models that we have obtained (which cover all causal models with four or fewer binary latent variables) are presented in the table covering the next three pages. For each class, we depict the semi-algebraic set that defines the class, the feasibility test for the class, and a representative causal model from the class. Note that the open-interval constraints

The task of describing the catalogue is simplified by the fact that many of the observational equivalence classes are related to one another by simple symmetries. We therefore organize the classes into orbits, where an orbit is a set of classes whose elements are related to one another by a set of symmetry transformations. For one of the classes in the orbit (which we term the ‘fiducial’ class), we provide a full description, and below this description, we specify the set of symmetry transformations that must be applied to it to obtain the other elements of the orbit. Formally, this is a set of representatives of the *right cosets* of the subgroup of symmetries of the semi-algebraic set in the full symmetry group of the tetrahedron.

We express these representatives as compositions of the following set of symmetry transformations, which we define below:

Finally, a given observational equivalence class will be distinguished by a label of the form

| Class | Semi-algebraic set | Test for feasibility | Minimal causal model |
|---|---|---|---|
| unless stated otherwise | | | |
| Transformations: | | | |
| (3, 2, f)_{Id} | | | |
| (4, 2, a)_{Id} | p_{00}p_{11} > p_{01}p_{10}; p_{11}p_{10} > p_{00}p_{01} | | |
| (4, 2, b)_{Id} | p_{00}p_{11} > p_{01}p_{10}; p_{11}p_{10} > p_{00}p_{01}; p_{01}p_{00} > p_{00}p_{10} | | |

for the particular

The first few steps of our iterative procedure for the construction of causal models proceed as follows.

The semi-algebraic sets associated to the four

One finds that by composing these with one another into ; these correspond to the classes

Two of these correspond to models with a single latent bit acting as a common cause and their semi-algebraic sets are the , corresponding to the classes

Next, one constructs all of the

This set also includes the models of Figure 1(a) and Figure 2(a) where one of the latent bits acts as a common cause and whose semi-algebraic sets are the fans of Figure 1(b) and Figure 2(b), which touch the

They correspond to the pair of classes

There is also a second type of

which yield the semi-algebraic sets corresponding to the four faces of the tetrahedron. These are the four classes

When both of the latent variables act as common causes, one obtains semi-algebraic sets that are subsets of a face of the tetrahedron and which have the appearance of the StarFleet insignia from Star Trek, of which there are twelve in total. These are the classes labelled

The construction of

## 5 Identification of parameters in the causal model

Our results also have applications for the identification problem, that is, the problem of determining which parameters in a causal model can be identified or bounded using observational data.

Consider the problem of identifying the probability distributions over the latent variables (our *as an intermediate step* on the way to deriving our feasibility tests. Equation (3) is an example of such an identification formula.
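A minimal sketch of what such an identification formula looks like in practice is given below. It assumes a hypothetical structure (A = l1, B = l2, with independent latent bits l1 and l2) rather than any class from the catalogue; for that structure the feasibility test is the equality p00·p11 = p01·p10 together with 0 < p_ab < 1. The function name `identify_latent_biases` and the index convention (first index for A) are likewise assumptions.

```python
from fractions import Fraction

def identify_latent_biases(p00, p01, p10, p11):
    """Return (q1, q2) for the hypothetical structure A = l1, B = l2,
    or None when the joint distribution fails its feasibility test."""
    probs = (p00, p01, p10, p11)
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    feasible = all(0 < p < 1 for p in probs) and p00 * p11 == p01 * p10
    if not feasible:
        return None
    # Identification formulas: each latent bias is a marginal of the data.
    q1 = p10 + p11  # P(l1 = 1) = P(A = 1)
    q2 = p01 + p11  # P(l2 = 1) = P(B = 1)
    return q1, q2

biases = identify_latent_biases(
    Fraction(1, 3), Fraction(1, 6), Fraction(1, 3), Fraction(1, 6)
)
```

Exact rational arithmetic is used so that the equality constraint can be checked without a numerical tolerance.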

The other sort of parameter of a causal model that one may wish to identify is the nature of the functional dependences (assuming the model is indeed functional). For the sorts of models we consider, this problem is also solved by our results.

Consider the problem where the causal structure is given, but where there is uncertainty over the nature of the functional dependences thereon. For instance, suppose that it is known that the functional causal structure is *either* the minimal structure associated to the class ^{7}, it is clear that one can settle this decision problem on the basis of the observational data.

As another, more nuanced example, suppose that it is known that the functional causal structure is the minimal structure associated to one of the three classes

More generally, one might know *only* the causal structure. For instance, the set of possible functional causal structures might be the minimal ones in each of the classes in the set

## 6 Discussion

### 6.1 Future directions

The restriction to pairs of binary observed variables is a limitation of our analysis. In future work, we hope to extend our approach to cases where the observed variables have an arbitrary number of values and where the number of observed variables is also arbitrary. While the tools from algebraic geometry employed in this paper provide a procedure for deriving feasibility tests for such functional causal structures in principle, in practice it is unlikely that such procedures will be scalable. Indeed, calculating Groebner bases is an

It also remains an open problem to decide, for any given functional causal structure, which observational equivalence class it belongs to. That is, even if our catalogue of classes is complete, it merely establishes that every functional causal structure falls into one of these classes, but it does not provide a means of deciding, for a given functional causal structure, having an arbitrary number of latent variables and functional dependences, which class it is a member of. Of course, if one supplements a given functional causal structure with distributions over the latent variables, then one obtains a joint distribution over the observed variables and this can be subjected to the feasibility tests for different observational equivalence classes. It is likely, however, that there are better ways of solving the classification problem, for instance, by determining how the functional dependences can be simplified. Solving this classification problem would allow one to find common features of all of the functional causal structures in a given class, for instance, features of the topology of the causal structure.

We have here made the idealization that the uncontrolled statistical data is given as a joint distribution whereas in practice it is a finite sample from this distribution. To contend with this idealization, one should in practice evaluate causal models by considering how well the finite statistical data can be fit to them.
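One standard way of assessing fit with finite data, sketched here only for an equality constraint of the form p00·p11 = p01·p10 (the feasibility test of a hypothetical structure in which the two observed bits are independent), is the likelihood-ratio or G statistic. This is a swapped-in textbook technique, not a method proposed in this work, and the function name `g_statistic_independence` is hypothetical.

```python
import math

def g_statistic_independence(n00, n01, n10, n11):
    """Likelihood-ratio (G) statistic comparing the best-fitting product
    distribution with the empirical frequencies of a 2x2 table of counts."""
    counts = {(0, 0): n00, (0, 1): n01, (1, 0): n10, (1, 1): n11}
    total = sum(counts.values())
    row = {a: counts[(a, 0)] + counts[(a, 1)] for a in (0, 1)}
    col = {b: counts[(0, b)] + counts[(1, b)] for b in (0, 1)}
    g = 0.0
    for (a, b), n in counts.items():
        if n > 0:  # cells with zero count contribute nothing
            expected = row[a] * col[b] / total
            g += 2.0 * n * math.log(n / expected)
    return g

g_exact = g_statistic_independence(20, 10, 40, 20)
g_correlated = g_statistic_independence(40, 10, 10, 40)
```

A value near zero indicates that the sample is well fit by a distribution satisfying the equality, while a large value indicates a poor fit. Inequality constraints would need a separate treatment.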

### 6.2 Relevance to quantum foundations

One of the motivations of the current work was the prospect of new insights into the interplay between causal structure and observed correlations in quantum theory. In particular, for a pair of quantum systems – each subjected to one of a set of possible measurements – a *Bell inequality* [10, 11, 12] is a constraint on the joint probability distribution over the outcomes of each possible choice of the local measurements (that is, for every combination of local measurement *settings*). It has recently been noted [13, 14] that one can understand the assumptions required to derive a Bell inequality as the standard assumptions for causal inference together with a particular hypothesis about the underlying causal structure, namely, that each local outcome depends causally on the corresponding local setting and on a latent common cause between the two systems. This causal structure is illustrated in Figure 6, where

The problem considered in the current work is different from that of deriving the complete set of Bell inequalities in a couple of ways: (i) The observational input to our causal inference problem is different; there are no setting variables in our problem – that is to say, any variable distinct from the observed *set* of such distributions, one for each choice of the setting variables.

(ii) The hypotheses whose feasibility we are testing are different; while the set of all Bell inequalities provides a test of the feasibility of the causal structure illustrated in Figure 6, we here seek to assess the feasibility of a causal structure for a given choice of cardinalities for the latent variables appearing therein (e.g., whether a given latent variable consists of a single bit, two bits, etcetera) and for a given choice of the precise form of the functional dependence of the observed variables on the latent variables.

Consider the Bell scenario of Figure 6 where

Note that if some observed correlations violate an inequality derived in this fashion, it only establishes the infeasibility of a given classical functional causal structure. Violations of Bell inequalities, on the other hand, rule out the feasibility of the causal structure, regardless of the cardinality of the latent variables and the nature of the functional dependences. In this sense, deriving Bell inequalities is more challenging than deriving feasibility tests for functional causal structures. However, in another sense, deriving Bell inequalities is more straightforward because the semi-algebraic set defined by eq. (6) is a polytope, whereas for a general functional causal structure this is not the case. The mathematical tools that have been used to derive Bell-type inequalities – which include semi-definite [15] and linear programming [16] as well as Fourier-Motzkin elimination [17, 18] – are therefore quite different from those used here.

Bell inequalities are significant to the foundations of quantum theory because they are found to be violated in experiments on pairs of separated quantum systems, implying that the predictions of quantum theory cannot be explained by a classical causal model with the causal structure that one expects to hold for the experiment (that of Figure 6) without fine-tuning [13].

Researchers in the field of quantum foundations have now begun to apply insights obtained from the study of Bell inequalities to the problem of deriving constraints on observed correlations in more general causal scenarios [14, 19, 20, 21, 22, 23], and the current work constitutes another contribution in this direction.

More importantly, there are now a few proposals for how to generalize the standard notion of a causal model to the quantum realm. Reference [24], for instance, proposes a definition of a quantum causal model in terms of a noncommutative generalization of conditional probability, while refs. [19, 25, 26] follow a more operational approach.

With a notion of quantum causal model in hand, one can explore the problem of inferring facts about the quantum causal model from observed correlations. This is the problem of *quantum causal inference*.

In the case of Bell-type experiments, for instance, one expects a quantum causal model with the natural causal structure (that of Figure 6) to be feasible only if the observed correlations satisfy the so-called Cirel’son bound, which is a generalization of a Bell inequality [27]. A simple case of quantum causal inference that has been investigated recently is the problem of distinguishing a cause-effect relation from a common-cause relation.

Here, it has been shown that the quantum correlations can distinguish the two cases even in uncontrolled experiments, implying a quantum advantage for causal inference [28].

In quantum causal models, variables are replaced by systems, each represented by a Hilbert space, and one makes a distinction between observed systems, upon which a measurement is made, and latent systems. Sets of systems are described by joint quantum states (as opposed to the joint probability distributions that describe sets of variables), and the functional dependences are specified by unitary maps. A natural analogue of the classical causal inference problem is to make inferences about the causal structure and the parameters of the causal model given a *joint quantum state* on the observed systems. The natural analogue of the functional causal structures considered in this article are quantum causal structures together with a specification of the dimensions of the latent systems and the unitaries that describe the functional dependences. To derive a feasibility test for a functional causal structure, one must eliminate the real-valued parameters that specify the quantum state of the latent systems. For example, if a given latent system is 2-dimensional (the quantum analogue of a binary latent variable), there are *three* real-valued parameters needed to specify the state completely (as opposed to the one real parameter needed to completely specify a distribution over a classical bit). The expectation values of the three Pauli operators, for instance, suffice to do so. Nonetheless, one can still take advantage of the techniques from algebraic geometry employed in this work to eliminate these parameters and determine constraints on the quantum state of the observed systems. In this way, we ought to be able to derive feasibility tests for functional causal structures in the quantum sphere.

### 6.3 Related work

The extent to which the mathematical tools associated to quantifier elimination are well-suited to problems of causal inference has been previously emphasized by Geiger and Meek [29]. Many authors have noted, in particular, the applicability of quantifier elimination to the problem of deriving tests for the feasibility of a causal structure when the cardinality of the latent variables is known. Reference [29], for instance, used Cylindrical Algebraic Decomposition to derive equality and inequality constraints for a particular causal model. However, the computational complexity of such brute-force quantifier elimination (doubly exponential in the number of parameters) means that its applications are limited to very simple examples.

Many previous works have appealed to implicitization procedures using Groebner bases to obtain equality constraints for causal models.

Geiger and Meek [30], Garcia, Stillman and Sturmfels [31], and Garcia [32] have used implicitization to obtain the smallest algebraic variety that contains the semi-algebraic set of joint distributions over observed variables for various causal structures with known cardinalities of latent variables. This yields polynomial equalities on the joint distribution whose satisfaction is a necessary condition for compatibility with the causal structure. Kang and Tian [33] have also applied implicitization techniques to the problem of identifying polynomial equality constraints on observational and interventional distributions (using the framework supplied by refs. [34, 35]).

Our work goes beyond these treatments insofar as it uses implicitization as one step in an algorithm that finds the semi-algebraic set itself rather than the smallest algebraic variety containing it. The second step is to use the extension theorem (described in Appendix A) to find inequality constraints on the joint probability distribution over observed variables from knowledge of the Groebner basis. To illustrate the difference, consider the observational equivalence class labelled

One novel feature of our approach which distinguishes it from previous uses of implicitization is that we focus on deriving feasibility constraints for a causal structure with specific functional dependences. In previous approaches, the set of variables that needed to be eliminated included *both* the parameters describing the probability distributions for the root variables and the parameters describing the conditional probability distributions for each non-root variable. In our approach, the second sort of parameter is fixed and not in need of elimination. The restriction to binary variables ensures that the number of distinct possible functional dependences is relatively modest.
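To make the role of fixed functional dependences concrete, here is a minimal sketch for a hypothetical toy structure that does not appear in this article: two binary latent variables $\lambda$ and $\mu$ with $P(\lambda=1)=a$ and $P(\mu=1)=b$, and observed variables $X=\lambda$ and $Y=\lambda\oplus\mu$. Only $a$ and $b$ need to be eliminated, and doing so by hand gives the equality $P(0,0)P(1,0)=P(0,1)P(1,1)$. The code checks this with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def joint_distribution(a, b):
    """Push the latent distributions through the fixed functions
    X = lam and Y = lam XOR mu (a hypothetical toy structure)."""
    p_lam = {0: 1 - a, 1: a}
    p_mu = {0: 1 - b, 1: b}
    joint = {xy: Fraction(0) for xy in product((0, 1), repeat=2)}
    for lam, mu in product((0, 1), repeat=2):
        x, y = lam, lam ^ mu
        joint[(x, y)] += p_lam[lam] * p_mu[mu]
    return joint


def equality_constraint(p):
    """Polynomial that vanishes on every distribution of the toy model."""
    return p[(0, 0)] * p[(1, 0)] - p[(0, 1)] * p[(1, 1)]


# A distribution generated by the model satisfies the constraint exactly.
feasible = joint_distribution(Fraction(1, 3), Fraction(1, 5))

# A distribution chosen by hand that violates it.
infeasible = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 6),
              (1, 0): Fraction(1, 6), (1, 1): Fraction(1, 6)}
```

Because the functions are fixed, the parametrization has only two unknowns, which is what keeps the elimination problem small in this toy case.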

Finally, the use of Groebner bases in identifying or bounding parameters in a causal model has also been highlighted in previous work such as Garcia-Puente et al. [36].

After the completion of this work, we became aware of related independent works by Chaves [37] and Rosset et al. [38] which also derive nonlinear inequalities for determining the feasibility of certain causal structures. These authors consider structures which, like Bell scenarios, have multiple pairs of observed variables that are related as cause and effect (understood as setting-outcome pairs) but which, unlike Bell scenarios, can have more than one latent common cause acting on the outcome variables. Chaves simplifies the quantifier elimination problem that must be solved using a round of Fourier-Motzkin elimination, while Rosset et al. provide an inductive approach for deriving new inequalities from given inequalities for subgraphs of the causal network under consideration. Combining our approach with these other methods constitutes an interesting direction for future work.

This research began while CML was completing the Perimeter Scholars International Masters program at Perimeter Institute. Research at Perimeter Institute is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research and Innovation.

## References

- 2.↑
Spirtes P, Glymour C, Scheines R. Causation, prediction and search, 2nd ed. Cambridge: The MIT Press; 2000.

- 3.↑
Janzing D, Mooij J, Peters J, Scholkopf B. Identifying confounders using additive noise models. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. 2009.

- 4.↑
Hoyer P, Janzing D, Mooij J, Peters J, Scholkopf B. Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems 21. New York: Curran Associates, Inc.; 2009:689–696.

- 5.↑
Peters J, Janzing D, Scholkopf B. Causal inference on discrete data using additive noise models. IEEE TPAMI. 2011;33(12):2436–2450.

- 6.↑
Peters J, Mooij JM, Janzing D, Scholkopf B. Causal discovery with continuous additive noise models. J Mach Learn Res. 2014;15:2009–2053.

- 7.↑
Cox D, Little J, O’Shea D. Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra. New York: Springer Verlag; 2007.

- 10.↑
Bell JS. Speakable and unspeakable in quantum mechanics. Cambridge: Cambridge University Press; 1964.

- 11.↑
Clauser JF, Horne MA, Shimony A, Holt RA. Proposed experiment to test local hidden-variable theories. Phys Rev Lett. 1969;23:880.

- 12.↑
Brunner N, Cavalcanti D, Pironio S, Scarani V, Wehner S. Bell nonlocality. Rev Mod Phys. 2014;86:419.

- 13.↑
Wood CJ, Spekkens RW. The lesson of causal discovery algorithms for quantum correlations: Causal explanations of Bell-inequality violations require fine-tuning. New J Phys. 2015;17:033002.

- 15.↑
Ver Steeg G, Galstyan A. A sequence of relaxations constraining hidden variable models. Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, UAI. 2011.

- 16.↑
Bonet B. Instrumentality tests revisited. Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence. 2001.

- 17.↑
Budroni C, Cabello A. Bell inequalities from variable elimination methods. J Phys A Math Theor. 2012;45:385304.

- 18.↑
Fritz T, Chaves R. Entropic inequalities and marginal problems. IEEE Trans Inf Theory. 2013;59:803–817.

- 19.↑
Fritz T. Beyond Bell’s Theorem II: scenarios with arbitrary causal structure. Commun Math Phys. 2016;341(2):391–434.

- 20.↑
Chaves R, Kuen R, Brask J, Gross D. A unifying framework for relaxations of the causal assumptions in Bell’s theorem. Phys Rev Lett. 2015;114:140403.

- 21.↑
Chaves R, Majenz C, Gross D. Information-theoretic implications of quantum causal structures. Nat Commun. 2015;6:5766.

- 22.↑
Tavakoli A, Skrzypczyk P, Cavalcanti D, Acin A. Non-local correlations in the star-network configuration. Phys Rev A. 2014;90:062109.

- 23.↑
Branciard C, Rosset D, Gisin N, Pironio S. Bilocal versus nonbilocal correlations in entanglement swapping. Phys Rev A. 2012;85:032129.

- 24.↑
Leifer MS, Spekkens RW. Formulating quantum theory as a causally neutral theory of Bayesian inference. Phys Rev A. 2013;88:052130.

- 25.↑
Hensen J, Lal R, Pusey MF. Theory-independent limits on correlations from generalised Bayesian networks. New J Phys. 2014;16:113043.

- 26.↑
Pienaar J, Brukner C. A graph-separation theorem for quantum causal models. New J Phys. 2015;17:073020.

- 28.↑
Ried K, Agnew M, Vermeyden L, Janzing D, Spekkens RW, Resch K. A quantum advantage for inferring causal structure. Nat Phys. 2015;11:414.

- 29.↑
Geiger D, Meek C. Quantifier elimination for statistical problems. Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence. 1999.

- 30.↑
Geiger D, Meek C. Graphical models and exponential families. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. 1998.

- 31.↑
Garcia LD, Stillman M, Sturmfels B. Algebraic geometry of Bayesian networks. J Symbolic Comput. 2005;39(3–4):331–355.

- 32.↑
Garcia LD. Algebraic statistics in model selection. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. 2004.

- 33.↑
Kang C, Tian J. Polynomial constraints in causal Bayesian networks. Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence. 2007.

- 34.↑
Tian J, Pearl J. On the testable implications of causal models with hidden variables. Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence. 2002.

- 35.↑
Kang C, Tian J. Inequality constraints in causal models with hidden variables. Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence. 2006.

- 36.↑
Garcia-Puente LD, Spielvogel S, Sullivant S. Identifying causal effects with computer algebra. Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence. 2010.

- 38.↑
Rosset D, Branciard C, Barnea TJ, Pütz G, Brunner N, Gisin N. Nonlinear Bell inequalities tailored for quantum networks. Phys Rev Lett. 2016;116(1):010403.

We now introduce useful concepts and tools from algebraic geometry that we will make use of in solving the problem mentioned in Section 2 of the main text. We will follow the presentation given in Ref. [7].

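We define a monomial in $x_1,\dots,x_n$ to be a product of the form $x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n}$, where all of the exponents $\alpha_1,\dots,\alpha_n$ are non-negative integers. Writing $\alpha=(\alpha_1,\dots,\alpha_n)$, we abbreviate this monomial as $x^{\alpha}$.

We can now define a polynomial over a field $k$.

A polynomial $f$ in $x_1,\dots,x_n$ with coefficients in $k$ is a finite linear combination of monomials,

$$f=\sum_{\alpha}a_{\alpha}x^{\alpha},\qquad a_{\alpha}\in k,$$

where the sum is taken over a finite number of $n$-tuples $\alpha=(\alpha_1,\dots,\alpha_n)$.

The set of all polynomials in $x_1,\dots,x_n$ with coefficients in $k$ is denoted $k[x_1,\dots,x_n]$.

Let $k$ be a field and let $f_1,\dots,f_s$ be polynomials in $k[x_1,\dots,x_n]$. We set

$$V(f_1,\dots,f_s)=\{(a_1,\dots,a_n)\in k^n : f_i(a_1,\dots,a_n)=0\ \text{for all}\ 1\le i\le s\}.$$

We call $V(f_1,\dots,f_s)$ the *algebraic variety* (also called the *affine variety*) defined by $f_1,\dots,f_s$.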

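Thus, an algebraic variety is the set of all solutions of a system of polynomial equations. A *basic semi-algebraic set* is defined to be the solution set of a system of polynomial equalities and *in*equalities, that is:

A *basic semi-algebraic set* is defined by

$$\{x\in\mathbb{R}^n : f_1(x)=\dots=f_s(x)=0,\ g_1(x)>0,\dots,g_t(x)>0\},$$

where $f_i,g_j\in\mathbb{R}[x_1,\dots,x_n]$.^{8}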

Note that algebraic varieties are examples of basic semi-algebraic sets.

A *semi-algebraic set* is formed by taking finite combinations of unions, intersections, or complements of basic semi-algebraic sets.
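The following Python sketch illustrates how semi-algebraic sets can be represented as membership predicates built from basic ones. It assumes basic sets are given by polynomial equalities together with strict inequalities; the helper names and the half-circle example are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def basic_set(equalities, strict_inequalities):
    """Membership test for {p : f(p) = 0 for all f, g(p) > 0 for all g}."""
    def contains(point):
        return (all(f(*point) == 0 for f in equalities)
                and all(g(*point) > 0 for g in strict_inequalities))
    return contains


def union(s, t):
    return lambda point: s(point) or t(point)


def intersection(s, t):
    return lambda point: s(point) and t(point)


def complement(s):
    return lambda point: not s(point)


def circle(x, y):
    """Polynomial x^2 + y^2 - 1, which vanishes on the unit circle."""
    return x * x + y * y - 1


# Two basic semi-algebraic sets: the open right and left halves of the circle.
right_half = basic_set([circle], [lambda x, y: x])
left_half = basic_set([circle], [lambda x, y: -x])

# A semi-algebraic set that is not basic in this representation:
# the circle with the two points on the vertical axis removed.
punctured_circle = union(right_half, left_half)

on_right = (Fraction(3, 5), Fraction(4, 5))
on_left = (Fraction(-3, 5), Fraction(4, 5))
north_pole = (Fraction(0), Fraction(1))
```

Exact rational points are used so that the equality tests are not affected by floating-point rounding.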

For *any* causal model, the set of possible joint distributions that can be generated by it is represented by a semi-algebraic set. It follows that two causal models are observationally equivalent if and only if they generate the same semi-algebraic set.

We now introduce and define *ideals*, the main algebraic object studied in algebraic geometry.

A subset $I\subseteq k[x_1,\dots,x_n]$ is an *ideal* if it satisfies:

- $0\in I$,
- If $f,g\in I$, then $f+g\in I$,
- If $f\in I$ and $h\in k[x_1,\dots,x_n]$, then $hf\in I$.

A natural example of an ideal is the ideal generated by a finite number of polynomials.

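Let $f_1,\dots,f_s$ be polynomials in $k[x_1,\dots,x_n]$. We set

$$\langle f_1,\dots,f_s\rangle=\left\{\sum_{i=1}^{s}h_i f_i : h_1,\dots,h_s\in k[x_1,\dots,x_n]\right\}.$$

It is not hard to show that $\langle f_1,\dots,f_s\rangle$ is an ideal. We call it the *ideal generated by* $f_1,\dots,f_s$, and we call $f_1,\dots,f_s$ a *basis* of the ideal.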

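Studying the relations between certain ideals and varieties forms one of the main areas of study in algebraic geometry. One can even define the algebraic variety $V(I)$ associated to an ideal $I\subseteq k[x_1,\dots,x_n]$ as

$$V(I)=\{(a_1,\dots,a_n)\in k^n : f(a_1,\dots,a_n)=0\ \text{for all}\ f\in I\}.$$

The proof that $V(I)$ is an algebraic variety, and that $V(\langle f_1,\dots,f_s\rangle)$ coincides with the variety defined by $f_1,\dots,f_s$, can be found in Ref. [7]. In this sense, *varieties are determined by ideals*. This will have interesting consequences for us, as we will see shortly.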

To find a general solution to the implicitization problem introduced in the main text we need to introduce monomial orderings and Groebner bases.

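First, note that we can reconstruct the monomial $x^{\alpha}=x_1^{\alpha_1}\cdots x_n^{\alpha_n}$ from the $n$-tuple of exponents $\alpha=(\alpha_1,\dots,\alpha_n)\in\mathbb{Z}_{\ge 0}^{n}$. This establishes a one-to-one correspondence between the monomials in $k[x_1,\dots,x_n]$ and $\mathbb{Z}_{\ge 0}^{n}$, so any ordering $>$ on $\mathbb{Z}_{\ge 0}^{n}$ induces an ordering on monomials: if $\alpha>\beta$, we say that $x^{\alpha}>x^{\beta}$.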

Now, we want the induced ordering to be ‘compatible’ with the algebraic structure of the polynomial ring that our monomials live in. This requirement leads us to the following definition.

A *monomial ordering* on $k[x_1,\dots,x_n]$ is any relation $>$ on $\mathbb{Z}_{\ge 0}^{n}$ satisfying:

- $>$ is a total ordering on $\mathbb{Z}_{\ge 0}^{n}$. That is to say that, for every $\alpha,\beta\in\mathbb{Z}_{\ge 0}^{n}$, either $\alpha>\beta$, $\beta>\alpha$, or $\alpha=\beta$.
- If $\alpha>\beta$ and $\gamma\in\mathbb{Z}_{\ge 0}^{n}$, then $\alpha+\gamma>\beta+\gamma$.
- $>$ is a well ordering on $\mathbb{Z}_{\ge 0}^{n}$. This means that every non-empty subset of $\mathbb{Z}_{\ge 0}^{n}$ has a smallest element under $>$.

The main monomial ordering we will make use of here is the *lexicographic order*, which we define as follows.

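[Lexicographic order] Let $\alpha=(\alpha_1,\dots,\alpha_n)$ and $\beta=(\beta_1,\dots,\beta_n)$ be elements of $\mathbb{Z}_{\ge 0}^{n}$. We say that $\alpha>_{\mathrm{lex}}\beta$ if the leftmost non-zero entry of the vector difference $\alpha-\beta\in\mathbb{Z}^{n}$ is positive. We write $x^{\alpha}>_{\mathrm{lex}}x^{\beta}$ if $\alpha>_{\mathrm{lex}}\beta$.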
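A minimal sketch of the lexicographic order on exponent tuples follows, assuming the standard definition in which $\alpha>_{\mathrm{lex}}\beta$ when the leftmost non-zero entry of $\alpha-\beta$ is positive. The function name and the sample monomials are illustrative only.

```python
def lex_greater(alpha, beta):
    """Return True if alpha >_lex beta for exponent tuples of equal length."""
    if len(alpha) != len(beta):
        raise ValueError("exponent tuples must have the same length")
    for a, b in zip(alpha, beta):
        if a != b:
            # The leftmost differing entry decides the comparison.
            return a > b
    return False  # alpha == beta


def add_exponents(alpha, gamma):
    """Exponent tuple of the product x^alpha * x^gamma."""
    return tuple(a + g for a, g in zip(alpha, gamma))


# Exponent tuples of x*y^2, y^3*z^4, x^3*y^2*z^4 and x^3*y^2*z in k[x, y, z].
monomials = [(1, 2, 0), (0, 3, 4), (3, 2, 4), (3, 2, 1)]

# Python compares equal-length tuples lexicographically, so sorting in
# descending order lists the monomials from largest to smallest under lex.
ordered = sorted(monomials, reverse=True)
```

Note that $xy^2>_{\mathrm{lex}}y^3z^4$ even though the second monomial has higher total degree, since lex order looks at the exponent of $x$ first.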

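Once we fix a monomial order, each non-zero polynomial $f\in k[x_1,\dots,x_n]$ has a unique leading term $\mathrm{LT}(f)$, namely the term of $f$ whose monomial is largest with respect to the order. For an ideal $I$, we write $\langle\mathrm{LT}(I)\rangle$ for the ideal generated by the leading terms of the non-zero elements of $I$. This allows us to define *Groebner bases*.

Fix a monomial ordering. A finite subset $G=\{g_1,\dots,g_t\}$ of an ideal $I$ is said to be a *Groebner basis* if

$$\langle\mathrm{LT}(g_1),\dots,\mathrm{LT}(g_t)\rangle=\langle\mathrm{LT}(I)\rangle.$$

More informally, a set $\{g_1,\dots,g_t\}\subseteq I$ is a Groebner basis of $I$ if and only if the leading term of any element of $I$ is divisible by one of the $\mathrm{LT}(g_i)$. The property of Groebner bases that matters most for our purposes is the *elimination theorem*, which provides us with a way of using Groebner bases to systematically eliminate certain variables from a system of polynomial equations and thereby solve the implicitization problem. We will state the elimination theorem shortly. First, we require the following definition.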

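Given $I=\langle f_1,\dots,f_s\rangle\subseteq k[x_1,\dots,x_n]$, the $l$-th elimination ideal $I_l$ is the ideal of $k[x_{l+1},\dots,x_n]$ defined by

$$I_l=I\cap k[x_{l+1},\dots,x_n].$$

Thus $I_l$ consists of all consequences of $f_1=\dots=f_s=0$ that do not involve the variables $x_1,\dots,x_l$.

[Elimination theorem] Let $I\subseteq k[x_1,\dots,x_n]$ be an ideal and let $G$ be a Groebner basis of $I$ with respect to the lex order with $x_1>x_2>\dots>x_n$. Then, for every $0\le l\le n$, the set $G_l=G\cap k[x_{l+1},\dots,x_n]$ is a Groebner basis of the $l$-th elimination ideal $I_l=I\cap k[x_{l+1},\dots,x_n]$.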
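As a small worked illustration of elimination, consider a hypothetical toy parametrization that is not taken from the main text: $x=t^2$, $y=t^4$, with the parameter $t$ to be eliminated. The steps below use lex order with $t>x>y$ and rely on the standard fact that two polynomials with coprime leading monomials form a Groebner basis of the ideal they generate.

```latex
% Hypothetical toy ideal: eliminate t from x = t^2, y = t^4.
\begin{align*}
I &= \langle\, t^2 - x,\; t^4 - y \,\rangle \subseteq k[t,x,y],
   \qquad t > x > y \ \text{(lex)},\\
t^4 - y &= (t^2 + x)(t^2 - x) + (x^2 - y)
   \;\Longrightarrow\; I = \langle\, t^2 - x,\; x^2 - y \,\rangle,\\
G &= \{\, t^2 - x,\; x^2 - y \,\}
   \quad \text{(leading terms $t^2$ and $x^2$ are coprime)},\\
G_1 &= G \cap k[x,y] = \{\, x^2 - y \,\}.
\end{align*}
```

The single element of $G_1$ is the implicit equation $y=x^2$ satisfied by every point of the parametrized curve.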

So, in our example with the fan depicted in Figure 1(b) – discussed in the main text –

How do we know that we can extend solutions from the

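[Extension theorem] Let $I=\langle f_1,\dots,f_s\rangle\subseteq\mathbb{C}[x_1,\dots,x_n]$ and let $I_1=I\cap\mathbb{C}[x_2,\dots,x_n]$ be the first elimination ideal of $I$. For each $1\le i\le s$, write $f_i$ in the form

$$f_i=g_i(x_2,\dots,x_n)\,x_1^{N_i}+\text{terms in which } x_1 \text{ has degree} <N_i,$$

where $N_i\ge 0$ and $g_i\in\mathbb{C}[x_2,\dots,x_n]$ is non-zero. Suppose that we have a partial solution $(a_2,\dots,a_n)\in V(I_1)$. If $(a_2,\dots,a_n)\notin V(g_1,\dots,g_s)$, then there exists $a_1\in\mathbb{C}$ such that $(a_1,a_2,\dots,a_n)\in V(I)$.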
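The sketch below illustrates, for a hypothetical toy case that is not from the main text, why extending partial solutions over the reals can produce inequality constraints. It assumes the parametrization $x=t^2$, $y=t^4$, for which eliminating $t$ gives $y=x^2$ (since $y=(t^2)^2$). A point on this parabola extends to a real value of $t$ only when $x\ge 0$.

```python
import math


def on_variety(x, y, tol=1e-12):
    """Check the implicit equation y = x^2 obtained by eliminating t."""
    return abs(y - x * x) <= tol


def real_extensions(x, y):
    """Real values of t with t^2 = x and t^4 = y (toy parametrization).

    Returns an empty list when the partial solution (x, y) does not extend.
    """
    if not on_variety(x, y) or x < 0:
        return []
    root = math.sqrt(x)
    return [0.0] if root == 0 else [root, -root]


# (4, 16) lies on the parabola and extends to t = 2 or t = -2.
extends = real_extensions(4.0, 16.0)

# (-2, 4) also lies on the parabola, but no real t satisfies t^2 = -2.
does_not_extend = real_extensions(-2.0, 4.0)
```

In this toy case the image of the parametrization is the semi-algebraic set $\{y=x^2,\ x\ge 0\}$, which is strictly smaller than the variety $y=x^2$ that contains it.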

When we work over

We can apply the above theorem to our example to see that, indeed, the equation

depicted in Figure 1(b) in the main text.

Consider the functional causal structure of Figure 7(a). The joint distributions that can arise from it are of the form

The semi-algebraic set defined by *StarFleet insignia*. The Groebner basis for the ideal

The equation

So in order to ensure that

Consider the functional causal structure of Figure 8(a). The joint distributions that can arise from it are of the form

The semi-algebraic set defined by

The Groebner basis for the ideal

with respect to the usual lex order, is given by

The equation

Using

Combining these two inequalities we get

where

Examining this inequality more closely, we see that setting

These examples cover all the different situations one may encounter while using algebraic geometry techniques to derive tests for feasibility of the causal models we are considering in this work. The remaining tests are derived in an analogous fashion.

We here present the proof of Theorem 4.2.1.

The example presented in Section 4.2 suggests a general procedure for replacing an *substitute* variables – the analogues of

We now describe a procedure for replacing an

Recall that for a 3-valued variable, we can take

For a 5-valued variable, we can take

We now construct a causal model underlying

In particular, we can take the first model to be the fiducial model from class

By increasing the number of values that

To construct such a model, we apply the switch-variable construction to a pair of simpler models, one of which has a semi-algebraic set corresponding to an

## Footnotes

^{1}

This should not be confused with the notion of observational equivalence as applied to DAGs [1].

^{2}

Also called an *affine variety* or an *algebraic set*.

^{3}

Note that one can replace the real field $\mathbb{R}$ with any *ordered* field.

^{4}

with respect to the lex order

^{5}

with respect to the lexicographic order

^{6}

with respect to the lex order

^{7}

Recall our convention of demanding the probabilities for latent variables to be bounded away from 0 and 1, so that all of our semi-algebraic sets are confined to the interior of the tetrahedron.

^{8}

Note that one can replace the real field $\mathbb{R}$ with any *ordered* field.