The Inflation Technique Completely Solves the Causal Compatibility Problem

The causal compatibility question asks whether a given causal structure graph -- possibly involving latent variables -- constitutes a genuinely plausible causal explanation for a given probability distribution over the graph's observed variables. Algorithms predicated on merely necessary constraints for causal compatibility typically suffer from false negatives, i.e. they admit incompatible distributions as apparently compatible with the given graph. In [arXiv:1609.00672], one of us introduced the inflation technique for formulating useful relaxations of the causal compatibility problem in terms of linear programming. In this work, we develop a formal hierarchy of such causal compatibility relaxations. We prove that inflation is asymptotically tight, i.e., that the hierarchy converges to a zero-error test for causal compatibility. In this sense, the inflation technique fulfills a longstanding desideratum in the field of causal inference. We quantify the rate of convergence by showing that any distribution which passes the $n^{th}$-order inflation test must be $O\left(n^{-1/2}\right)$-close in Euclidean norm to some distribution genuinely compatible with the given causal structure. Furthermore, we show that for many causal structures, the (unrelaxed) causal compatibility problem is faithfully formulated already by either the first or second order inflation test.

The causal compatibility question asks whether a given causal structure graph -- possibly involving latent variables -- constitutes a genuinely plausible causal explanation for a given probability distribution over the graph's observed categorical variables. Algorithms predicated on merely necessary constraints for causal compatibility typically suffer from false negatives, i.e. they admit incompatible distributions as apparently compatible with the given graph. In DOI:10.1515/jci-2017-0020, one of us introduced the inflation technique for formulating useful relaxations of the causal compatibility problem in terms of linear programming. In this work, we develop a formal hierarchy of such causal compatibility relaxations. We prove that inflation is asymptotically tight, i.e., that the hierarchy converges to a zero-error test for causal compatibility. In this sense, the inflation technique fulfills a longstanding desideratum in the field of causal inference. We quantify the rate of convergence by showing that any distribution which passes the $n^{th}$-order inflation test must be $O\left(n^{-1/2}\right)$-close in Euclidean norm to some distribution genuinely compatible with the given causal structure. Furthermore, we show that for many causal structures, the (unrelaxed) causal compatibility problem is faithfully formulated already by either the first or second order inflation test.

INTRODUCTION
A Bayesian network or causal structure is a directed acyclic graph (DAG) where vertices represent random variables, each of which is generated by a non-deterministic function depending on the values of its parents. Nowadays, causal structures are commonly used in bioinformatics, medicine, image processing, sports betting, risk analysis, and experiments of quantum nonlocality. In this work we consider causal structures with two distinct types of vertices: categorical variables which may be directly observed, and variables which cannot be observed, referred to as latent variables. (Pearl [1] refers to such graphical models as "latent structures".) We make no assumption whatsoever on the state spaces of the latent variables; they can be discrete or continuous. Nevertheless, every causal structure encodes a possible hypothesis of causal explanation for statistics over its observed variables.
Naturally, understanding how different causal structures give rise to different sets of compatible distributions is a fundamental goal within the field of causal inference. Many prior works are ultimately concerned with the causal discovery problem, which asks to enumerate (or graphically characterize) all legitimate hypotheses of causal structure which are capable of explaining some observed probability distribution [2][3][4][5][6][7][8][9]. For computational tractability, practical causal discovery algorithms typically exclude causal explanations which are unfaithful (fine-tuned). Fundamentally, however, the faithfulness assumption is not an essential criterion for causal discovery. Demanding faithfulness can be thought of as a second filtering step, where the fundamental filtering of causal discovery is the exclusion of any causal structure which cannot explain the observed probability distribution, even granting fine-tuning. In this manuscript, therefore, causal discovery refers to the foundational problem of returning all causal structures compatible with the given distribution. Selecting a single "best" causal model -- or even scoring the quality of the different causal explanations [8,9] -- constitute refinements to the causal discovery problem which we do not address here.
We must note that the causal characterization problem has also been tackled in scenarios where the state-spaces of the latent variables are prescribed [35][36][37][38]. Critically, Ref. [39] provides upper bounds on the cardinalities of a causal structure's latent variables without any loss of generality (whenever the observed variables have discrete state spaces). Consequently, the set of multi-variate categorical distributions compatible with any given causal structure is always a semi-algebraic set, admitting characterization in terms of a finite number of polynomial equality and inequality constraints. Nevertheless, identifying the full set of causal compatibility constraints via exploiting the constrained state-spaces of the latent variables is often intractable [35][36][37][38]. The inflation technique considered herein, by contrast, has no dependence on the latent variables' state-spaces. Hereafter, therefore, we consider only causal structures with unconstrained latent variables.
Causal discovery relates a single distribution to many structures; causal characterization relates many distributions to a single structure. Both such efforts, therefore, are oracle-wise equivalent, and hinge fundamentally on the causal compatibility problem (CCP), which simply asks a yes-or-no question: Is the given distribution compatible with the given causal structure? The inflation technique [40] is a way of relating approximations of the causal compatibility problem to linear programming (LP) problems. Every LP satisfaction problem can be dualized and recast as an equivalent optimization problem. Inspired by LP duality, we will formulate a dual notion of causal compatibility, through which we will be able to rigorously upper-bound the error introduced by approximating the CCP as an LP problem via inflation. Our main result here is that this error asymptotically tends to zero when inflation is expressed as a hierarchy of ever-higher-order tests of causal compatibility. This implies that the inflation criterion -- far from being a relaxation -- is meta-equivalent to the causal compatibility problem, and hence constitutes an alternative way of understanding general causal structures.
In contrast with Ref. [40], in this paper we define the inflation technique as a hierarchy of causal compatibility tests applicable exclusively to the special class of causal structures (introduced by Fritz [19]) called correlation scenarios. At the same time, however, we also introduce a graphical preprocessing which precisely recasts the general causal compatibility problem in terms of causal compatibility with correlation scenarios, such that there is no loss of generality in our approach. This paper is organized as follows: In Section 2 we introduce the concept of a correlation scenario and define primal and dual notions of the causal compatibility problem, together with their approximations. In Section 3 we review the inflation technique as a means for approximately solving either form of the causal compatibility problem. In Section 4 we state our main theorems concerning the convergence of inflation for correlation scenarios, though the formal proofs are deferred to the appendices. In Section 5 we build upon existing causal inference techniques to describe a natural graphical preprocessing which maps general causal structures into correlation scenarios. This preprocessing -- fairly useful in its own right -- implies the universal applicability of the inflation technique as defined here. Finally, Section 6 presents our conclusions.

PRELIMINARY DEFINITIONS
The graphical models we study are fully general causal structures. A causal structure is represented by a directed acyclic graph imbued with some distinction among the vertices to clarify whether a node in the graph represents an observable or a latent variable. In this work, we use a pink color and subscripts of the letter "U" ("U" from "Unobserved") to indicate the latent variables in a graph. We follow the convention of Refs. [14,18,41] and depict exogenous (i.e., root) observable variables as square-shaped nodes in their graphs. Correlation scenarios are a special type of causal structure. The graph of a correlation scenario has just two layers: a bottom layer of independently distributed latent random variables $\{U_1, U_2, \ldots, U_L\}$ and a top layer of observable random variables $\{A_1, A_2, \ldots, A_m\}$, see Fig. 1.
Here (and in the following) the notation $\vec{v}_S$, where $\vec{v}$ is a vector with $N$ entries and $S \subset \{1, \ldots, N\}$, will represent the vector with entries $v_s$, $s \in S$. Readers familiar with d-separation may appreciate that although the implication [...] does not hold for general causal structures, it is true for correlation scenarios. Later on in Section 5, we will relate distributions over general causal structures to distributions over some correlation scenarios associated with them. As we will see, correlation scenarios are the atomic constituents upon which the inflation hierarchy acts.

The Causal Compatibility Problem, its dual and their approximate versions
A distribution $P$ over the observable variables of a causal structure $G$ is said to be compatible with $G$ if $P$ is the observable marginal of some distribution $P'$ over all the variables of $G$, where $P'$ can be factored into a product of singleton-variable conditional probability distributions associated with every individual vertex in $G$ (conditioned on all of the vertex's parents, if any). A diverse vocabulary of phrases synonymous with "$P$ is compatible with $G$" can be found in conventional literature, such as "$P$ can be realized in $G$", "$P$ can arise from $G$", "$G$ gives rise to $P$", "$G$ explains $P$", "$G$ can simulate $P$", and "$G$ is a model for $P$".
Consider, for example, the correlation scenario dubbed the triangle scenario, with $m = L = 3$, see [...].
For causal structures which are not correlation scenarios, however, the non-deterministic functions giving rise to the observed variables will also depend on other observed variables. An example is given by the instrumental scenario [33,42] (Fig. 4, top), where $X$ and $U$ are, respectively, a free observable and a latent variable, and the observed variables [...].
Per Ref. [39], the set of distributions $P$ compatible with a given causal structure $G$ is a semi-algebraic set whenever the observable random variables are categorical, i.e., when they have finite cardinality. This implies that it can be characterized in terms of a finite number of polynomial inequalities. Unfortunately, the computational complexity of deriving such a characterizing set of inequalities makes the problem intractable already for networks of a very small size [43]. Furthermore, within the context of quantum foundations, there exist fairly natural causal structures for which the total number of such inequalities grows exponentially with the dimensionality of $P$ [44]. We must thus resort to partial characterizations of the original set of distributions. This notion is better formalized by the following problem.

Problem 1. Approximate Causal Compatibility
Input: $\epsilon > 0$, a causal structure $G$ and a particular probability distribution $P$ over the observed variables.
Output: If there does not exist a probability distribution $\tilde{P}$ over the observed variables such that $\|P - \tilde{P}\|_2 \leq \epsilon$ and $\tilde{P}$ is compatible with $G$, then return a function $F$ such that $F(P) < 0$ and $F(\tilde{P}) \geq 0$ for all distributions $\tilde{P}$ compatible with $G$.
Objective: Determine if $P$ is "approximately compatible" with $G$; if not, provide a witness $F$ to prove incompatibility of $P$.
Note that, if $P$ is not approximately compatible with $G$, the function $F$ witnessing its incompatibility is not required to be universal. Namely, there could exist other distributions $P'$ incompatible with $G$ such that $F(P') \geq 0$.
The goal of this paper is to provide a solution to this problem. Note that, since for any $G$ the set of compatible distributions is closed [39], it follows that any distribution $P$ that is $\epsilon$-compatible for all $\epsilon > 0$ must be compatible with $G$. The analogous problem for $\epsilon = 0$ will be simply referred to as Causal Compatibility.
A related problem that we will also solve is the following:

Problem 2. Approximate Causal Optimization
Input: $\epsilon > 0$, a causal structure $G$ and a real function $F$ of the probabilities of the observed events.

Output: A [...]
This problem is dual to Approximate Causal Compatibility, and it is interesting in its own right. In quantum optics experiments, we test and quantify non-classicality via the violation of inequalities of the form $F(P) \geq K$. Identifying values of $K$ for which the above holds for all distributions $P$ compatible with the considered causal structure $G$ is a must before any experiment is actually carried out. As before, for $\epsilon = 0$, we name the analogous problem Causal Optimization.
Coming back to the triangle scenario, an instance of Approximate Causal Optimization would be minimizing [...] compatible with the triangle scenario. In any experimental setup where bipartite optical sources play the role of $U_1, U_2, U_3$ in the triangle scenario, any observed distribution $P(A, B, C)$ for which the value of (3) is smaller than the lower bound $f$ provided by Approximate Causal Optimization evidences the presence of quantum effects.
There exist a number of algorithms which provide outer approximations for the set of distributions compatible with a given causal structure $G$ [3,14,25,26,41,45,46]. By minimizing functions over such outer approximations, existing algorithms can provide lower bounds on the true minimum and thus solve Approximate Causal Optimization, as long as $\epsilon$ exceeds some threshold (determined by the mismatch between the aforementioned relaxations and the original set of compatible distributions). As we will see, the Inflation Technique can be used to solve both Approximate Causal Compatibility and Approximate Causal Optimization for arbitrarily small values of $\epsilon$.

Some examples
Let $P(A, B, C)$ be a distribution realizable in the triangle scenario, and suppose that we generate $n$ independently distributed copies of [...]. Then we could define the random variables [...]. The causal structure associated with the independently distributed copies of $U_1, U_2, U_3$ and their observable children [...]. The inflation graph of a correlation scenario is also a correlation scenario; as an example, Fig. 5 depicts the $n = 2$ inflation graph for the triangle scenario. These observable variables follow a probability distribution [...] for all permutations of $n$ elements $\pi, \pi', \pi''$. Expanded out for $n = 2$, Eq. (5) becomes [...]. We treat with special distinction the diagonal variables $\{A^{ii}, B^{ii}, C^{ii}\}_i$. Given the global distribution $Q_n$, we denote by $Q_n^g$ the marginal distribution of the diagonal variables with indices up to $g \leq n$, i.e. [...].
In the following, we call $Q_n^g$ the diagonal marginal of degree $g$.
A related concept is the degree-$g$ lifting of a distribution $P$, consisting of the statistics of $g$ independent and identically distributed copies of $P$, that is, $P^{\otimes g}(a_1, b_1, c_1, \ldots, a_g, b_g, c_g) \equiv \prod_{i=1}^{g} P(a_i, b_i, c_i)$. Taking the random variables in the inflation graph to arise per Eq. (4) implies that the diagonal marginals associated with the inflation graph must be related to the lifted distributions of the original distribution over observed variables per
$$Q_n^g = P^{\otimes g}, \quad \text{for } g = 1, \ldots, n. \tag{7}$$
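As an illustration of the lifting operation just described, the following minimal Python sketch builds $P^{\otimes g}$ from a distribution stored as a dictionary. The function name `lifting`, the dictionary representation and the toy distribution are assumptions made for this example only; they are not part of the original construction.

```python
from fractions import Fraction
from itertools import product


def lifting(p, g):
    """Return the degree-g lifting of p, i.e. g i.i.d. copies of p.

    p maps outcomes (any hashable object, e.g. a tuple (a, b, c)) to
    probabilities. The result maps g-tuples of outcomes to the product
    of the individual probabilities.
    """
    lifted = {}
    for outcomes in product(p, repeat=g):
        weight = Fraction(1)
        for outcome in outcomes:
            weight *= p[outcome]
        lifted[outcomes] = weight
    return lifted


# Hypothetical toy distribution over (a, b, c): three perfectly correlated bits.
P = {(0, 0, 0): Fraction(1, 2), (1, 1, 1): Fraction(1, 2)}
P2 = lifting(P, 2)  # degree-2 lifting, a distribution over pairs of outcomes
```

The lifting is defined for any distribution, whether or not it is compatible with the scenario under study; compatibility only enters once the lifting is compared against the diagonal marginals of an inflation.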
Expanded out for $g = n = 2$, Eq. (7) identifies the diagonal marginal in this scenario [...]. Note that there exist additional relations between $Q_n$ and the original distribution $P(A, B, C)$, some of which involve polynomials of the probabilities $P(A, B, C)$ with degree greater than $n$. For instance, [...]. In this paper we will not exploit such higher degree relations, though they are quite useful in practical implementations.
Figure 5. The second-order inflation graph of the triangle scenario.
Given an arbitrary distribution $P(A, B, C)$, the inflation technique consists in demanding that the degree-$n$ lifting of $P(A, B, C)$ be the degree-$n$ diagonal marginal of a distribution $Q_n$ over the inflated variables satisfying (5). When condition (7) is met, we call the associated distribution $Q_n$ an $n^{th}$-order inflation of $P$. Clearly, if $P(A, B, C)$ does not admit an $n^{th}$-order inflation for some $n$, then it cannot be realized in the triangle scenario. Deciding if the degree-$n$ lifting of $P(A, B, C)$ is a member of the set of degree-$n$ diagonal marginals can be cast as a linear program [47].
If the linear program is infeasible, i.e., if no $n^{th}$-order inflation exists for $P(A, B, C)$, then the program will find a witness to detect its incompatibility. Such a witness will be of the form [...], where $\mathbf{F}$ is a real vector and the minimum on the right hand side is taken over all distributions $Q_n$ satisfying Eq. (5). Call $F$ the $n^{th}$-degree polynomial such that $F(Q) = \mathbf{F} \cdot Q^{\otimes n}$ for all $Q$. For some distributions $P$, the inflation technique will thus output a polynomial witness of incompatibility $F$, hence solving the corresponding (Approximate) Causal Compatibility problem. Note that, for $n \geq n'$, any distribution $P$ admitting an $n^{th}$-order inflation also admits an $(n')^{th}$-order inflation. This suggests that we might be able to detect the incompatibility of a distribution $P$ via the inflation technique just by taking the order $n$ high enough.
Since any polynomial of a probability distribution can be lifted to a linear function acting on $g$-liftings, we can also use the inflation technique to attack Approximate Causal Optimization, as long as the function $F$ to minimize happens to be a polynomial. Suppose that this is the case and that $F$ has degree $g$. We wish to minimize $F(P)$ over all distributions compatible with the triangle scenario. Our first step would be to express $F$ as a vector $\mathbf{F}$, such that $F(P) = \mathbf{F} \cdot P^{\otimes g}$, for all distributions $P$. Our second step consists in solving the linear program
$$f_n \equiv \min_{Q_n} \mathbf{F} \cdot Q_n^g, \tag{11}$$
where $Q_n^g$ is defined by Eq. (6), and such that $Q_n$ is a distribution satisfying condition (5).
Since the $g$-lifting of any distribution $P$ compatible with the triangle scenario can be viewed as the diagonal marginal of degree $g$ of a distribution $Q_n$ satisfying (5), we thus have that $f_n \leq f = \min_P F(P)$, for all $n$, just as in the definition of Approximate Causal Optimization. Moreover, $f_n \geq f_{n'}$, for $n \geq n'$, i.e., as we increase the order $n$ of the inflation, we should expect to obtain increasingly tighter lower bounds on $f$. If, by whatever means, we were to obtain an upper bound $f^+$, then we would have solved Approximate Causal Optimization for $\epsilon = f^+ - f_n$. In the triangle scenario, the inflation technique can therefore be used to tackle both Approximate Causal Compatibility and Approximate Causal Optimization.
For further elucidation, consider another correlation scenario. In the three-on-line scenario, Fig. 2, we again have three random variables $A, B, C$ which are defined, respectively, via the non-deterministic functions $A(U_1)$, $B(U_1, U_2)$, $C(U_2)$. As always, the exogenous latent variables $\{U_1, U_2\}$ are independently distributed. The $n = 2$ inflation graph for the three-on-line scenario is depicted in Fig. 6.
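In this scenario, an $n^{th}$-order inflation corresponds to a distribution $Q_n$ over the variables $\{A^i\}, \{B^{jk}\}, \{C^l\}$, where $i, j, k, l$ range from 1 to $n$. $Q_n$ must satisfy the linear constraints:
$$Q_n\left(\{A^{i}=a^{i}\}, \{B^{jk}=b^{jk}\}, \{C^{l}=c^{l}\}\right) = Q_n\left(\{A^{i}=a^{\pi(i)}\}, \{B^{jk}=b^{\pi(j)\pi'(k)}\}, \{C^{l}=c^{\pi'(l)}\}\right) \tag{12}$$
for all permutations of $n$ elements $\pi, \pi'$. Expanded out for $n = 2$, Eq. (12) becomes
$$Q_{n=2}(a^1, a^2, b^{11}, b^{12}, b^{21}, b^{22}, c^1, c^2) = Q_{n=2}(a^2, a^1, b^{21}, b^{22}, b^{11}, b^{12}, c^1, c^2), \tag{13a}$$
$$Q_{n=2}(a^1, a^2, b^{11}, b^{12}, b^{21}, b^{22}, c^1, c^2) = Q_{n=2}(a^1, a^2, b^{12}, b^{11}, b^{22}, b^{21}, c^2, c^1). \tag{13b}$$
Additionally, relating the degree-$g$ liftings of $P$ to the diagonal marginal in this scenario requires
$$Q_n^{g}\left(\{A^{i}=a^{i}, B^{ii}=b^{ii}, C^{i}=c^{i}\}_{i=1}^{g}\right) = \prod_{i=1}^{g} P(a^{i}, b^{ii}, c^{i}) \tag{14}$$
for any choice of integer $g$ such that $g \leq n$. Expanded out for $g = n = 2$, Eq. (14) identifies the diagonal marginal for this scenario, $Q_{n=2}^{g=2} = \sum_{B^{12}, B^{21}} Q_{n=2}$, as
$$Q_{n=2}^{g=2}(a^1, a^2, b^{11}, b^{22}, c^1, c^2) = P(a^1, b^{11}, c^1)\, P(a^2, b^{22}, c^2). \tag{15}$$
Figure 6. The second-order inflation graph of the three-on-line scenario.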
The above ideas are easy to generalize to arbitrary correlation scenarios (remember, though, that correlation scenarios are just a special class of causal structures).

Inflation of an Arbitrary Correlation Scenario
To set up the $n^{th}$-order inflation of an arbitrary correlation scenario, first imagine $n$ independent copies of all the latent variables, and then consider all the observable variables which are children of these, following the prescription of the original correlation scenario. Each observable variable in the inflation graph has as many superindices as latent variables it depends on. Then, one must impose symmetry restrictions on the total probability distribution $Q_n$, demanding that it be invariant under any relabeling-permutations applied to the index of any one latent variable, i.e., [...]. The central object we consider, then, is the set of all diagonal marginals consistent with such an $n^{th}$-order inflation. We denote such a generic diagonal marginal by [...]. The compatibility conditions require the degree-$g$ lifting of $P$ to be consistent with such a degree-$g$ diagonal marginal.
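The construction of the inflated observable variables can be sketched in a few lines of Python. The representation below (a dictionary from each observable to the tuple of latent indices it depends on) and the names `inflation_variables` and `is_diagonal` are assumptions chosen for illustration; the example instance is the three-on-line scenario, with $A(U_1)$, $B(U_1, U_2)$, $C(U_2)$.

```python
from itertools import product


def inflation_variables(parents, n):
    """List the observable variables of the n-th order inflation graph.

    parents maps each observable name to the tuple of latent indices it
    depends on. Each inflated variable is returned as (name, copies),
    where copies holds one copy index in 1..n per latent parent.
    """
    variables = []
    for name, latents in parents.items():
        for copies in product(range(1, n + 1), repeat=len(latents)):
            variables.append((name, copies))
    return variables


def is_diagonal(variable):
    """A variable is diagonal when all of its copy indices coincide."""
    _, copies = variable
    return len(set(copies)) == 1


# Three-on-line scenario: A depends on U1, B on (U1, U2), C on U2.
three_on_line = {"A": (1,), "B": (1, 2), "C": (2,)}
inflated = inflation_variables(three_on_line, 2)
diagonal = [v for v in inflated if is_diagonal(v)]
```

For $n = 2$ this yields eight inflated variables, of which only the two off-diagonal copies of $B$ are summed out when forming the degree-2 diagonal marginal.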
Notice that any distribution $Q_n$ subject to the constraints (16)-(19) must be such that the marginals associated with relabellings of the indices of the diagonal variables obey the same compatibility conditions as the canonical diagonal marginals do, i.e., [...] for all $\vec{\pi}$. Actually, the original description of the inflation technique in Ref. [40] imposes the constraints (20) rather than (16)-(19) over the distribution $Q_n$, as demanding the existence of a distribution $Q_n$ satisfying condition (20) can be shown to enforce over $P(a_1, \ldots, a_m)$ exactly the same constraints as demanding the existence of a distribution satisfying (16)-(19). Indeed, as noted in Ref. [40, App. C], any distribution $Q_n$ satisfying (20) can be twirled or symmetrized (see Appendix) to a distribution $\tilde{Q}_n$ satisfying Eqs. (16)-(19). For convenience, from now on we will just refer to the formulation of the inflation technique involving the symmetries (16). This formulation has the added advantage that the symmetry constraints can be exploited to reduce the time and memory complexity of the corresponding linear program, see for instance Ref. [48].
It isn't hard to see how this general notion of inflation can also be used to tackle Approximate Causal Compatibility and Approximate Causal Optimization in general correlation scenarios G:

Problem 3. Inflation for Causal Compatibility
Input: A positive integer n, a causal structure G and a particular probability distribution over the observed variables P .
Primal Linear Program: [...] (21)
where $P$ relates to $Q_n$ by Eqs. (18,19), and such that $Q_n$ satisfies conditions (16,17).
Dual Linear Program: [...] (22)
where $Q_n^g$ is defined by Eq. (18), and such that $Q_n$ satisfies conditions (16,17).
Summary: If the degree-$n$ lifting of $P$ is not in the set of degree-$n$ diagonal marginals consistent with an $n^{th}$-order inflation of $G$, then the returned dual variable $\mathbf{F}$ will witness the incompatibility of $P$ per $\mathbf{F} \cdot P^{\otimes n} < 0$ while $\mathbf{F} \cdot \tilde{P}^{\otimes n} \geq 0$ for all distributions $\tilde{P}$ compatible with $G$.
Similarly,

Problem 4. Inflation for Causal Optimization
Input: A positive integer n, a causal structure G and a degree-g polynomial function F of the probabilities of the observed events.
Linear Program:
$$f_n \equiv \min_{Q_n} \mathbf{F} \cdot Q_n^g, \tag{23}$$
where $\mathbf{F}$ is the vector representing $F$, i.e., $F(P) = \mathbf{F} \cdot P^{\otimes g}$ for all distributions $P$, where $Q_n^g$ is defined by Eq. (18), and such that $Q_n$ satisfies conditions (16,17).
Summary: The program returns a degree-$g$ diagonal marginal, consistent with an $n^{th}$-order inflation of $G$, which minimizes the input function $F$. Since such diagonal marginals contain all degree-$g$ liftings of distributions compatible with $G$, it follows that $f_n$ is a lower bound on the minimum value of $F$ over all distributions compatible with $G$.
In certain practical cases, we may not know the full probability distribution of the observable variables, but only the probabilities of a restricted set $E$ of observable events. As we will see in Section 5, this often happens when we map the causal compatibility problem from a general causal structure to a correlation scenario. To apply the inflation technique to those cases, rather than fixing the value of all probability products, like in Eq. (19), we will impose the constraints
$$\sum_{\bar{a}_1 \in e_1, \ldots, \bar{a}_n \in e_n} Q_n^n(\bar{a}_1, \ldots, \bar{a}_n) = \prod_{i=1}^{n} P(e_i) \tag{24}$$
for all $e_1, \ldots, e_n \in E$. Any distribution $Q_n$ satisfying both (16) and (24) will be dubbed an $n^{th}$ order inflation of the distribution of observable events.
For example, consider again the three-on-line scenario (Fig. 2), and assume that our experimental setup just allows us to detect events of the form $e(a) \equiv \{(A, B, C) : A = B = C = a\}$. Then our set of observable events is $E = \cup_a \{e(a)\}$ and the input of the causal inference problem is the distribution $\{P(e), e \in E\}$. An $n^{th}$ order inflation $Q_n$ of $P(e)$ would satisfy Eq. (12) and the linear conditions [...].

CONVERGENCE OF INFLATION
The main result of this article is that the inflation technique can be used to solve Approximate Causal Compatibility and Approximate Causal Optimization for arbitrarily small values of $\epsilon$, just by taking the order $n$ of the inflation high enough. Depending on which of the two problems we wish to solve and which causal structures are involved, we will have either finite-order convergence or asymptotic convergence.

On finite-order convergence
Even at low orders, the Inflation Technique has been shown to provide very good outer approximations to the set of distributions compatible with the triangle scenario [40]. Furthermore, for certain correlation scenarios, a second-order inflation can be shown to fully characterize the set of compatible distributions.
Consider, for instance, the three-on-line scenario (Fig. 2), whose second-order inflation was depicted in Fig. 6. Note that condition (15) implies that
$$Q_{n=2}(A^1{=}a, C^2{=}c) = P(A{=}a)\, P(C{=}c), \tag{26}$$
and condition (13b) implies that [...]. From the last condition, it follows that $Q_{n=2}(A^1{=}a, C^1{=}c) = Q_{n=2}(A^1{=}a, C^2{=}c)$.
Invoking condition (26), we thus have that
$$Q_{n=2}(A^1{=}a^1, C^1{=}c^1) = P(A{=}a^1)\, P(C{=}c^1). \tag{28}$$
This is sufficient to ensure that $Q_{n=2}^{g=1}$ is realizable in the three-on-line scenario, since then $P(A, B, C) = P(A, C) P(B|A, C) = P(A) P(C) P(B|A, C)$. This last expression represents a realization of $P(A, B, C)$ in the three-on-line scenario, where the hidden variables $U_1, U_2$ are, respectively, $A$ and $C$.
This example can be generalized to prove convergence at order $n = 2$ of any star-shaped correlation scenario. Star-shaped scenarios with $N$ observable variables have the defining property that, in some subset of $N-1$ observable variables, every pair of variables share no latent parents [28,29], see Figs. 2 and 7. (This definition assumes that every set of variables in a correlation scenario all of which have the same set of latent parents are implicitly merged into a single vector-valued variable.) Given an arbitrary star-shaped correlation scenario with $N$ random variables, call $B_1, \ldots, B_{N-1}$ any set of $N-1$ random variables without a common ancestor; and $A$, the remaining variable. Using the same trick as before, one can prove that, for any $i \neq j$, $P(B_i, B_j) = P(B_i) P(B_j)$. Similarly, one can group variables $B_i, B_j$ and argue that, for any $l \neq i, j$, $P(B_i, B_j, B_l) = P(B_i, B_j) P(B_l)$. Iterating this argument, we show that $P(B_1, \ldots, B_{N-1})$ factors into a product of $N-1$ terms. Analogously, it is proven that $P(A, B_{i_1}, \ldots, B_{i_m}) = P(A) P(B_{i_1}, \ldots, B_{i_m})$, for any set of indices $i_1, \ldots, i_m$ such that $B_{i_1}, \ldots, B_{i_m}$ do not share parents with $A$. This is enough to prove compatibility.
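For the three-on-line scenario, the argument above reduces compatibility to the single factorization $P(A, C) = P(A) P(C)$. A minimal Python sketch of that check follows; the function names and the two toy distributions are assumptions introduced for illustration, and exact `Fraction` arithmetic is used so that the equality test is meaningful.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(p, keep):
    """Marginalize a distribution over outcome tuples onto positions in keep."""
    out = defaultdict(Fraction)
    for outcome, prob in p.items():
        out[tuple(outcome[i] for i in keep)] += prob
    return dict(out)


def passes_three_on_line_test(p):
    """Check P(A, C) = P(A) P(C) for p given over outcomes (a, b, c)."""
    p_ac = marginal(p, (0, 2))
    p_a = marginal(p, (0,))
    p_c = marginal(p, (2,))
    return all(
        p_ac.get((a, c), 0) == pa * pc
        for (a,), pa in p_a.items()
        for (c,), pc in p_c.items()
    )


# Toy example 1: A and C are independent uniform bits and B = A xor C.
xor_dist = {(a, a ^ c, c): Fraction(1, 4) for a in (0, 1) for c in (0, 1)}
# Toy example 2: three perfectly correlated bits, so A and C are not independent.
correlated = {(0, 0, 0): Fraction(1, 2), (1, 1, 1): Fraction(1, 2)}
```

With floating-point probabilities the exact comparison would have to be replaced by a tolerance, which is a design choice left outside this sketch.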
In these examples, using the inflation technique is overkill, as compatibility can be determined solely by checking the satisfaction of all independence relations. There are many correlation scenarios, however, where distribution compatibility is also determined by inequality constraints. Examples of such "interesting" correlation scenarios (here we use "interesting" in precisely the meaning of Refs. [31,32]) include the triangle scenario, as well as the four-on-line scenario depicted in Fig. 8. And actually, in the former scenario, the problem of compatibility of distributions is not completely solved by second-order inflation.
Figure 8. The four-on-line correlation scenario.
Indeed, all binary variable distributions of the form [...]. This bound is strictly tighter than bounds implied by the Finner inequality [49] or the semidefinite causal compatibility constraints involving covariances of Refs. [25,50] throughout the parameter region $0.0283 \lesssim r \leq q$.
For arbitrary correlation scenarios $G$ with observed variables of specified cardinality, we inquire whether some finite-order inflation is always sufficient to characterize the set of compatible distributions. Are there causal structures for which inflation converges only asymptotically? Could the triangle scenario be such an example?
Open Question. For any correlation scenario $G$, does there exist $n$ such that $n^{th}$-order inflation solves exact Causal Compatibility?
Interestingly, we can prove that, when used to solve Approximate Causal Optimization, the inflation technique does not converge, in general, in a finite number of steps. Indeed, consider the trivial correlation scenario consisting of a single observable variable $A$ and its single latent parent $U$. We wish to use the inflation technique to minimize the function $-P(A{=}0) P(A{=}1)$. Clearly the solution of this problem is $-\frac{1}{4}$. An $n^{th}$-order function inflation assessment (starting at $n \geq 2$), however, would effectively reduce this problem to the LP
$$\min_{Q_n} \; -Q_n^{g=2}(0, 1) \equiv -\sum_{a_3, \ldots, a_n} Q_n(0, 1, a_3, \ldots, a_n) \quad \text{s.t.} \quad Q_n(a_1, \ldots, a_n) = Q_n(a_{\pi(1)}, \ldots, a_{\pi(n)}) \text{ for all permutations } \pi.$$
For $n = 2n'$, consider the symmetric probability distribution $Q_n$ given by randomly choosing without replacement $n$ bits $a_1, \ldots, a_n$ from a pool of $n'$ 0's and $n'$ 1's. Then it can be verified that
$$-\sum_{a_3, \ldots, a_n} Q_n(0, 1, a_3, \ldots, a_n) = -\frac{1}{2}\, \frac{n'}{2n' - 1},$$
overshooting the magnitude of the true minimum for all $n'$. Nonetheless, note that the above quantity converges to the correct result of $-\frac{1}{4}$ asymptotically as $O(1/n)$.
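The value attained by this sampling-without-replacement distribution can be checked directly. The sketch below, with function names that are assumptions for this example, computes the probability that the first two drawn bits are $(0, 1)$ both from the closed form and by brute-force enumeration of all orderings of the pool, and then tabulates how far the objective lies below $-\frac{1}{4}$.

```python
from fractions import Fraction
from itertools import permutations


def q01_formula(n_prime):
    """Probability that the first two bits drawn without replacement are (0, 1)."""
    pool = 2 * n_prime
    return Fraction(n_prime, pool) * Fraction(n_prime, pool - 1)


def q01_brute_force(n_prime):
    """Same probability, obtained by enumerating every ordering of the pool."""
    pool = [0] * n_prime + [1] * n_prime
    orderings = list(permutations(pool))
    hits = sum(1 for ordering in orderings if ordering[:2] == (0, 1))
    return Fraction(hits, len(orderings))


# Cross-check the closed form on small pools.
for n_prime in (1, 2, 3):
    assert q01_formula(n_prime) == q01_brute_force(n_prime)

# Objective value and its excess below the true minimum -1/4.
for n_prime in (1, 2, 5, 50):
    n = 2 * n_prime
    objective = -q01_formula(n_prime)
    excess = Fraction(-1, 4) - objective  # algebraically equal to 1 / (4 (n - 1))
    print(n, objective, excess)
```

The excess shrinks like $1/(4(n-1))$, in line with the $O(1/n)$ rate stated above, while remaining strictly positive at every finite order.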

Asymptotic convergence
In this section we will prove that, for any correlation scenario $G$, the inflation technique characterizes the set of compatible correlations asymptotically. More precisely, we will show that any distribution $P$ admitting an $n^{th}$ order inflation is $O(1/\sqrt{n})$-close in Euclidean norm to a compatible distribution $\tilde{P}$. Similarly, we will show that $f_n$, as defined in Eq. (11), satisfies $f - f_n \leq O(1/n)$. In order to solve Approximate Causal Compatibility and Approximate Causal Optimization for a given value of $\epsilon$, we just have to use the Inflation Technique up to orders $O(1/\epsilon^2)$, $O(1/\epsilon)$, respectively. Since the set of compatible distributions is closed [39], this implies that, for any incompatible distribution $P$, there exists $n$ such that $P$ does not admit an $n^{th}$ order inflation.
Before we proceed with the proof, a note on the scope of our results is in order. The inflation technique is fairly expensive in terms of time and memory resources. At order $n$, it involves optimizing over probability distributions of $\sum_{i=1}^{m} n^{|L_i|}$ variables (we remind the reader that $L_i \subset \{1, \ldots, L\}$ denotes the set of indices $j$ such that the hidden variable $U_j$ influences $A_i$). If each of these variables can take $d$ possible values, then the number of free variables in the corresponding linear program is $d^{\sum_{i=1}^{m} n^{|L_i|}}$. That is, the memory resources required by the inflation technique are superexponential in $n$. Add to this the fact that the best LP solvers on the market have a running time of $O(N^3)$ [51], and you will come to the conclusion that a brute-force implementation of the inflation technique in the triangle scenario is already unrealistic for $n = 4$, even in the simplest case of $d = 2$. What is the relevance, then, of proving asymptotic convergence?
For us, it is a matter of principle. Even at low orders, the inflation technique has proven itself very useful at identifying non-trivial constraints on observable probability distributions. It is therefore natural to ask whether the inflation technique provides just a partial characterization of compatibility or whether, on the contrary, it reflects an alternative way of comprehending the latter. Our work settles this question completely: by proving that any infeasible distribution must violate one of the inflation conditions, we refute the first hypothesis and validate the second one.
The key to deriving the asymptotic convergence of the Inflation Technique is the following theorem, proven in the Appendix.
Theorem 1. Let $G$ be a correlation scenario with $L$ latent variables, and let $Q_n^g$ be the degree-$g$ diagonal marginal of a distribution $Q_n$ satisfying the symmetry conditions (16). Then, there exist normalized probability distributions $P_\mu$ compatible with $G$ and probabilities $p_\mu \ge 0$, $\sum_\mu p_\mu = 1$, such that
$$D\left(Q_n^g, \sum_\mu p_\mu P_\mu^{\otimes g}\right) \le O\left(\frac{L g^2}{n}\right),$$
where $D(q, r) = \sum_x |q(x) - r(x)|$ denotes the total variation distance between the probability distributions $q(X)$, $r(X)$.
This theorem can be regarded as an extension of the finite de Finetti theorem [52], which states that the marginal $P(a_1, \ldots, a_g)$ of a symmetric distribution $P(a_1, \ldots, a_n)$ is $O(g^2/n)$-close in total variation distance to a convex combination of degree-$g$ liftings.
The solvability of Approximate Causal Optimization through the Inflation Technique follows straightforwardly from Theorem 1. Let $F$ be a polynomial of degree $g$, with $f = \min_P F(P)$, and let $Q_n$ be the symmetric distribution achieving the value $f^n$ in Eq. (11). Then, by the previous theorem, we have that
$$f^n = F \cdot Q_n^g \ge \sum_\mu p_\mu\, F(P_\mu) - O\left(\frac{L g^2}{n}\right) \ge f - O\left(\frac{L g^2}{n}\right),$$
where $F \cdot$ denotes the linear functional satisfying $F \cdot q^{\otimes g} = F(q)$ for all distributions $q$. It follows that
$$f - f^n \le O\left(\frac{L g^2}{n}\right).$$
Proving the analogous result for Approximate Causal Compatibility is only slightly more complicated. Let $P$ be a probability distribution over the observed variables, and suppose that $P$ admits an $n^{th}$-order inflation $Q_n$. Define the second-degree polynomial $N(R) = \sum_{\bar{a}} (R(\bar{a}) - P(\bar{a}))^2$, and let $N$ also denote a linear functional such that $N \cdot q^{\otimes 2} = N(q)$ for all distributions $q$. Note that, due to conditions (19), $N \cdot Q_n^2 = N(P) = 0$. Thus the minimum value $f^n$ of $N \cdot Q_n^2$ over all diagonal marginals of degree 2 of a distribution $Q_n$ satisfying the symmetry conditions (16) is such that $f^n \le 0$. Invoking Eq. (35) for $g = 2$, we have that
$$f \le f^n + O\left(\frac{L}{n}\right) \le O\left(\frac{L}{n}\right),$$
where $f$ is the minimum value of $N(Q)$ over all compatible distributions $Q$. This implies that there exists a compatible distribution $\tilde{P}$ such that
$$\sum_{\bar{a}} \left(\tilde{P}(\bar{a}) - P(\bar{a})\right)^2 \le O\left(\frac{L}{n}\right).$$
This proof of convergence easily extends to scenarios where we only know the probabilities of a set $E$ of observable events. Indeed, choosing the polynomial $N$ such that $N(R) = \sum_{e \in E} (P(e) - R(e))^2$, and following the same derivation as in Eq. (36), we conclude that a distribution of observable events admitting an $n^{th}$-order inflation is $O\left(\sqrt{L/n}\right)$-close in Euclidean norm to a compatible distribution.

UNPACKING CAUSAL STRUCTURES
So far we have just been referring to correlation scenarios, i.e., those causal structures where all observed variables only depend on a number of independent latent variables. However, in a general causal model, the value of a given variable can depend not only on latent variables, but also on the values of other observed variables. In the following, we define procedures called exogenization and unpacking which cumulatively map the problem of causal compatibility with an arbitrary causal structure to problems of causal compatibility with the structure's implicit constituent correlation scenarios. Consequently, these procedures enable application of the inflation technique to general causal structures via preprocessing into correlation scenarios.
Suppose $G$ is a DAG with latent variables. If $U$ is an endogenous (non-root) latent variable in $G$, one can exogenize $U$ by first adding all possible directed edges originating from a parent of $U$ and terminating at a child of $U$, and then deleting from $G$ all directed edges which terminate at $U$. The resulting graph admits precisely the same set of feasible observed distributions as $G$, per Ref. [53, Sec. 3.2]. Hereafter, therefore, we restrict our attention to causal compatibility problems involving causal structures where all latent variables are exogenous.
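The two steps of exogenization can be sketched directly on an edge list. Representing the DAG as a set of (parent, child) pairs, the function name `exogenize`, and the toy graph are all illustrative assumptions, not constructs from the paper.

```python
def exogenize(edges, latent):
    """Return a new edge set in which `latent` is a root node.

    `edges` is a set of (parent, child) pairs describing a DAG.
    """
    parents = {u for (u, v) in edges if v == latent}
    children = {v for (u, v) in edges if u == latent}
    # Step 1: connect every parent of the latent node to every child of it.
    bypass = {(p, c) for p in parents for c in children}
    # Step 2: drop all edges that terminate at the latent node.
    kept = {(u, v) for (u, v) in edges if v != latent}
    return kept | bypass


# Hypothetical toy graph: X -> U -> {A, B}, with U latent.
toy = {("X", "U"), ("U", "A"), ("U", "B")}
print(sorted(exogenize(toy, "U")))
```

In the toy graph, the edge from X into U is removed and replaced by direct edges from X to A and from X to B, while U keeps its outgoing edges.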
In addition, we will always consider distributions as implicitly conditional on the values of any exogenous observable variables. Of course, this mapping from raw probability distributions to conditional probability distributions only makes sense if the distribution of exogenous observable variables factorizes, i.e., if all exogenous observable variables are independent from each other. As an example, the sorts of distributions we consider for the Bell scenario depicted in Fig. 9 are of the form P (A, B|X, Y ), as opposed to P (A, B, X, Y ).
To go from general causal structures to correlation scenarios, we introduce counterfactual variable sets, in which we consider all the different ways a variable can respond to its observable parents as distinct variables. We call the procedure for eliminating all dependencies between observed variables unpacking. Unpacking is related to, but distinct from, the single-world intervention graphs introduced in Ref. [17] and the e-separation method developed in Ref. [16]. As quantum physicists, we understand unpacking as a manifestation of counterfactual definiteness, which is a natural assumption mysteriously inconsistent with quantum theory [54][55][56]. Since counterfactual definiteness does hold in the "classical" causal models considered in this paper, we promote maximally exploiting this assumption as a first step towards resolving any causal compatibility problem.
By way of example, consider the structure $G_1$ depicted in Fig. 10. The correlation scenario which results from unpacking $G_1$, assuming that all observable variables are binary with values in $\{0, 1\}$, is shown in Fig. 11 and denoted $G_2$. The unpacked scenario can be thought of as having either seven binary-valued variables $\{A_{X=0}, A_{X=1}, B, C_{A=0,B=0}, C_{A=0,B=1}, C_{A=1,B=0}, C_{A=1,B=1}\}$ or simply three variables, two of which are vector valued. We use the latter interpretation for the visualization of the unpacked scenario, but the former interpretation is convenient to explicitly relate the packed distributions to the unpacked distributions. A distribution over the observable variables in $G_1$ (conditioned on the exogenous observable variable $X$) is compatible with $G_1$ iff there exists another distribution compatible with $G_2$ (over $G_2$'s observable variables) such that the first distribution is recovered via suitably varying marginals of the latter. Explicitly,
$$P_{G_1}(A=a, B=b, C=c \mid X=x) = P_{G_2}(A_{X=x}=a,\ B=b,\ C_{A=a,B=b}=c).$$
Which marginal of the unpacked distribution is selected depends on the values of $x$, $a$ and $b$; it does not depend on the value of $c$, however.
Figure 10. The example structure $G_1$. (Not a correlation scenario.)
Figure 11. The unpacking of the example structure $G_1$, which we denote by $G_2$.
We now describe how to unpack an arbitrary causal structure. Let $A$ be an observed variable. We denote the observable parents of $A$ as paOBS[$A$]. Unpacking replaces $A$ by a set of variables, one for each joint value of paOBS[$A$], each of which retains only the latent parents of $A$. The probabilities of the observed variables in $G$ can be obtained from the probabilities of a set of measurable events in the associated correlation scenario $G'$. To be clear, let $\bar{A}$ ($\bar{X}$) denote all the observable endogenous (exogenous) variables in $G$. Then,
$$P_G(\bar{A}=\bar{a} \mid \bar{X}=\bar{x}) = P_{G'}\Big(\bigwedge_i \big[(A_i)_{\mathrm{paOBS}[A_i]=\{\bar{a},\bar{x}\}_{A_i}} = a_i\big]\Big),$$
where $\{\bar{a},\bar{x}\}_{A_i}$ denotes selecting those elements out of the set $\bar{a} \cup \bar{x}$ which correspond to the values of paOBS[$A_i$].
The original Approximate Causal Compatibility (Approximate Causal Optimization) problem in $G$ is thus mapped to an Approximate Causal Compatibility (Approximate Causal Optimization) problem in the correlation scenario $G'$, with a non-trivial set of observable events. The inflation technique can then be applied on $G'$ to solve either problem on the original structure $G$ up to arbitrary precision $\epsilon$.
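For the example structure of Fig. 10, the step from an unpacked distribution back to the packed conditional distribution can be sketched as follows. The sketch assumes binary variables and reads the observable parents off the seven unpacked variables listed above (X for A; A and B for C). The tuple ordering, the dictionary layout and the name `packed_probability` are illustrative assumptions.

```python
from itertools import product

# Order of the seven binary unpacked variables (an illustrative convention):
# (A_X0, A_X1, B, C_00, C_01, C_10, C_11), where C_ab means C given A=a, B=b.


def packed_probability(unpacked, a, b, c, x):
    """P(A=a, B=b, C=c | X=x) as a marginal of the unpacked distribution."""
    total = 0.0
    for outcome, prob in unpacked.items():
        a_x0, a_x1, b_val, c00, c01, c10, c11 = outcome
        a_resp = (a_x0, a_x1)[x]                 # response of A to X=x
        c_resp = ((c00, c01), (c10, c11))[a][b]  # response of C to A=a, B=b
        if a_resp == a and b_val == b and c_resp == c:
            total += prob
    return total


# Toy unpacked distribution: uniform over all 2**7 outcomes.
uniform = {o: 1 / 2**7 for o in product((0, 1), repeat=7)}
print(packed_probability(uniform, a=0, b=1, c=1, x=0))
```

Each packed probability is a sum over unpacked outcomes, so the packed distribution is linear in the unpacked one; this is what allows the existence of an unpacking to be posed as a linear program.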
Note that unpacking can be valuable even without further inflation. For instance, unpacking the instrumental scenario of Fig. 4 leads to an associated correlation scenario which is trivial, as depicted in Fig. 12. Any distribution over the four variables $\{A_{X=0}, A_{X=1}, B_{A=0}, B_{A=1}\}$ is compatible with Fig. 12. Nevertheless, demanding that the distributions $P(A, B|X)$ admit such an unpacking leads to nontrivial constraints. We can formulate the admission of an unpacked distribution as a linear program, via Eq. (39). For the instrumental scenario, this linear program asks whether there exists a probability distribution $P'$ over the four unpacked variables such that
$$P(A=a, B=b \mid X=x) = P'(A_{X=x}=a,\ B_{A=a}=b) \quad \text{for all } a, b, x.$$
This linear program leads to the famous instrumental inequalities [42], such as
$$P(A=0, B=0 \mid X=0) + P(A=0, B=1 \mid X=1) \le 1. \quad (40)$$
This example substantially generalizes. Unpacking alone also completely solves the causal compatibility problem for any single-district causal structure (see [18] for a definition) containing at most one latent variable. This includes all Bell scenarios and the entire hierarchy of their relaxations as described in Ref. [57].
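The instrumental inequality (40) can be checked without an LP solver. The sketch below is not the linear program itself but a simpler vertex enumeration: every distribution over the four binary unpacked variables is a mixture of 16 deterministic assignments, and the left-hand side of (40) is linear in that distribution, so its maximum is attained at one of them. Binary variables and the function names are assumptions made for illustration.

```python
from itertools import product


def packed_from_vertex(vertex, a, b, x):
    """P(A=a, B=b | X=x) for a deterministic unpacked assignment."""
    a_x0, a_x1, b_a0, b_a1 = vertex
    a_resp = (a_x0, a_x1)[x]  # value A takes when X=x
    b_resp = (b_a0, b_a1)[a]  # value B takes when A=a
    return 1 if (a_resp == a and b_resp == b) else 0


def instrumental_lhs(vertex):
    """Left-hand side of P(A=0,B=0|X=0) + P(A=0,B=1|X=1) <= 1."""
    return (packed_from_vertex(vertex, a=0, b=0, x=0)
            + packed_from_vertex(vertex, a=0, b=1, x=1))


# Every unpacked distribution is a mixture of these 16 deterministic
# assignments, and the left-hand side is linear, so checking the vertices
# bounds it for all unpacked distributions.
vertices = list(product((0, 1), repeat=4))
worst = max(instrumental_lhs(v) for v in vertices)
print(worst)
```

The bound holds because both terms refer to the same unpacked variable $B_{A=0}$, which cannot equal 0 and 1 at once.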
Furthermore, one can take advantage of known results concerning observationally equivalent causal structures. We say that $G$ and $G'$ are observationally equivalent whenever both structures admit precisely the same set of compatible distributions over their observed variables. Prop. 5 in Ref. [53] is a prescription for replacing latent variables with sets of directed edges while preserving observational equivalence. We encourage aggressive application of that prescription in order to convert (unpacked) correlation scenarios into observationally equivalent structures which can be unpacked further. For instance, it can be invoked to convert the four-on-line correlation scenario of Fig. 8 into the observationally equivalent Bell scenario of Fig. 9, to convert the three-on-line correlation scenario of Fig. 2 into the observationally equivalent graph A → B ← C with no latent variables, or to replace all latent variables in star scenarios such as Fig. 7 with inwards-pointing directed edges. Interestingly, all the challenging causal structures collected in Fig. 14 of Ref. [53] unpack to the four-on-line correlation scenario. One can readily demonstrate the non-saturation of those structures by converting their unpacked forms to the Bell scenario à la Prop. 5 of Ref. [53], and then unpacking a second time.
Figure 12. The unpacking of the instrumental scenario.
Of course, unpacking supplemented with inflation is far more powerful than unpacking alone. Unpacking and inflation are both naturally formulated as linear programs, and hence can be easily combined into a single composite linear program to solve Causal Compatibility or Causal Optimization (over polynomials of conditional distributions).

CONCLUSIONS
The inflation technique was first proposed by Wolfe et al. [40] as a means to obtain strong causal compatibility constraints for arbitrary causal structures. Here, we have formulated inflation as a formal hierarchy of problems for assessing causal compatibility relative to correlation scenarios. We have proven the inflation hierarchy to be complete, in the sense that any distribution incompatible with a given correlation scenario will be detected as incompatible by inflation. More quantitatively, we showed that any distribution $P$ passing the $n^{th}$-order inflation test is $O\left(1/\sqrt{n}\right)$-close in Euclidean norm to some other distribution which can be realized within the considered scenario.
The inflation technique is fully applicable to any causal structure, since unpacking allows one to map any causal assessment problem (for either distributions or functions) to an equivalent assessment problem relative to a correlation scenario. The observed distribution in the original structure is mapped to probabilities pertaining to restricted sets of measurable events in the unpacked correlation scenario. Since, however, our proof of the convergence of inflation allowed for restricted sets of measurable events, the convergence theorems are still applicable when using inflation to assess compatibility relative to general causal structures.
We have therefore shown that the inflation technique is much more than a useful machinery to derive statistical limits; it is an alternative way to define causal compatibility! For the purpose of practical causal discovery, we envision the inflation technique being used as a final refinement. That is, inflation (and unpacking) should be employed as a postprocessing step, after first filtering the set of candidate causal explanations by means of computationally cheaper but less sensitive algorithms. Indeed, our attitude concerning the primacy of single-district graphs reflects our implicit assumption that all the kernels of a multi-district graph will have been identified. In other words, whatever distribution is being assessed for causal compatibility via inflation, we are presuming that it has already been verified to satisfy the nested Markov property (NMP) relative to the considered graph [14,41]. Thus, we envision testing for compatibility via inflation only after first testing for compatibility via NMP algorithms. This is not strictly necessary, as our results here imply that the inflation technique alone can recover all the constraints implied by NMP, though we imagine it is relatively inefficient to impose NMP only indirectly through inflation.
Alternatively, inflation could be used to estimate the distances of a distribution P from the sets of distributions compatible with various causal structures. We speculate that such distances could prove valuable in helping compute scores for the ranked causal discovery problem [8,9], though we defer further analysis to future research.