Open Access (CC BY 4.0). Published by De Gruyter, November 5, 2019

Unifying Gaussian LWF and AMP Chain Graphs to Model Interference

  • Jose M. Peña

Abstract

An intervention may have an effect on units other than those to which it was administered. This phenomenon is called interference and it usually goes unmodeled. In this paper, we propose to combine Lauritzen-Wermuth-Frydenberg and Andersson-Madigan-Perlman chain graphs to create a new class of causal models that can represent both interference and non-interference relationships for Gaussian distributions. Specifically, we define the new class of models, introduce global, local and pairwise Markov properties for them, and prove their equivalence. We also propose an algorithm for maximum likelihood parameter estimation for the new models, and report experimental results. Finally, we show how to compute the effects of interventions in the new models.

1 Motivation

Graphical models are among the most studied and used formalisms for causal inference. Some of the reasons for their success are that they make explicit and testable causal assumptions, and there exist algorithms for causal effect identification, counterfactual reasoning, mediation analysis, and model identification from non-experimental data [16], [20], [23]. However, in causal inference in general and graphical models in particular, one assumes more often than not that there is no interference, i. e. an intervention has no effect on units other than those to which the intervention was administered [28]. This may be unrealistic in some domains. For instance, vaccinating the mother against a disease may have a protective effect on her child, and vice versa. A notable exception is the work by Ogburn and VanderWeele [11], who distinguish three causal mechanisms that may give rise to interference and show how to model them with directed acyclic graphs (DAGs). In this paper, we focus on what the authors call interference by contagion: One individual’s treatment does not affect another individual’s outcome directly but via the first individual’s outcome. Ogburn and VanderWeele [11] also argue that interference by contagion typically involves feedback among different individuals’ outcomes over time and, thus, it may be modeled by a DAG over the random variables of interest instantiated at different time points. Sometimes, however, the variables are observed at one time point only, e. g. at the end of a season. The observed variables may then be modeled by a Lauritzen-Wermuth-Frydenberg chain graph, or LWF CG for short [5], [7]: Directed edges represent direct causal effects, and undirected edges represent causal effects due to interference. Some notable papers on LWF CGs for modeling interference by contagion are Ogburn et al. [12], Shpitser [21],[1] Shpitser et al. [22], and Tchetgen et al. [27].
For instance, the previous mother-child example may be modeled with the LWF CG in Figure 1 (a), where V1 and V2 represent the dose of the vaccine administered to the mother and the child, and D1 and D2 represent the severity of the disease. This paper only deals with Gaussian distributions, which implies that the relations between the random variables are linear. Note that the edge D1 − D2 represents a symmetric relationship but has some features of a causal relationship, in the sense that a change in the severity of the disease for the mother causes a change in severity for the child and vice versa. This seems to suggest that we can interpret the undirected edges in LWF CGs as feedback loops. That is, that every undirected edge can be replaced by two directed edges in opposite directions. However, this is not correct in general, as explained at length by Lauritzen and Richardson [8].

Figure 1 Models of the mother-child example: (a, b) LWF CGs, and (c) UCG.

In some domains, we may need to model both interference and non-interference. This is the problem that we address in this paper. In our previous example, for instance, the mother and the child may have a gene that makes them healthy carriers, i. e. the higher the expression level of the gene the less the severity of the disease but the load of the disease agent (e. g., a virus) remains unaltered and, thus, so does the risk of infecting the other. We may model this situation with the LWF CG in Figure 1 (b), where G1 and G2 represent the expression level of the healthy carrier gene in the mother and the child. However, this model is not correct: In reality, the mother’s healthy carrier gene protects her but has no protective effect on the child, and vice versa. This non-interference relation is not represented by the model. In other words, LWF CGs are not expressive enough to model both interference relations (e. g., intervening on V1 must have an effect on D2) and non-interference relations (e. g., intervening on G1 must have no effect on D2). To remedy this problem, we propose to combine LWF CGs with Andersson-Madigan-Perlman chain graphs, or AMP CGs for short [1]. We call these new models unified chain graphs (UCGs). As we will show, it is possible to describe how UCGs exactly model interference in the Gaussian case. Works such as Ogburn et al. [12], Shpitser [21], Shpitser et al. [22], and Tchetgen et al. [27] use LWF CGs to model interference and compute some causal effect of interest. However, they do not describe how interference is exactly modeled.

The rest of the paper is organized as follows. Section 2 introduces some notation and definitions. Section 3 defines UCGs formally. Sections 4 and 5 define global, local and pairwise Markov properties for UCGs and prove their equivalence. Section 6 proposes an algorithm for maximum likelihood parameter estimation for UCGs, and reports experimental results. Section 7 considers causal inference in UCGs. Section 8 discusses identifiability of LWF and AMP CGs. Section 9 closes the paper with some discussion. The formal proofs of all the results are contained in Appendix A.

2 Preliminaries

In this paper, set operations like union, intersection and difference have the same precedence. When several of them appear in an expression, they are evaluated left to right unless parentheses are used to indicate a different order. Unless otherwise stated, the graphs in this paper are defined over a finite set of nodes V. Each node represents a random variable. We assume that the random variables are jointly normally distributed. The graphs contain at most one edge between any pair of nodes. The edge may be undirected or directed. We consider two types of directed edges: Solid (→) and dashed (⇢).

The parents of a set of nodes X in a graph G is the set Pa(X) = {A | A → B or A ⇢ B is in G with B ∈ X}. The neighbors of X is the set Ne(X) = {A | A − B is in G with B ∈ X}. The adjacents of X is the set Ad(X) = {A | A ∗ B or B ∗ A is in G with B ∈ X}, where ∗ means →, ⇢ or −. The non-descendants of X are Nd(X) = {B | neither A → ⋯ → B nor A ⇢ ⋯ ⇢ B is in G with A ∈ X}. Moreover, the subgraph of G induced by X is denoted by G_X. A route between a node V_1 and a node V_n in G is a sequence of (not necessarily distinct) nodes V_1, …, V_n such that V_i ∈ Ad(V_{i+1}) for all 1 ≤ i < n. Moreover, V_j, …, V_{j+k} and V_{j+k}, …, V_j with 1 ≤ j ≤ n − k are called subroutes of the route. The route is called undirected if V_i − V_{i+1} is in G for all 1 ≤ i < n. If V_n = V_1, then the route is called a cycle. A cycle is called semidirected if it is of the form V_1 → V_2 ⋯ V_n or V_1 ⇢ V_2 ⋯ V_n, where the remaining edges are undirected or directed in the direction of the cycle. A route of distinct nodes is called a path. A chain graph (CG) is a graph with (possibly) directed and undirected edges, and without semidirected cycles. A set of nodes of a CG G is connected if there exists an undirected route in G between every pair of nodes in the set. A chain component of G is a maximal connected set. Note that the chain components of G can be sorted topologically, i. e. for every edge A → B or A ⇢ B in G, the component containing A precedes the component containing B. The chain components of G are denoted by Cc(G). CGs without dashed directed edges are known as LWF CGs, whereas CGs without solid directed edges are known as AMP CGs.
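The definitions above can be made concrete in code. The following sketch (ours, with hypothetical helper names) computes the chain components of a CG as the connected components of its undirected part, and sorts them topologically along the directed edges; the example at the end assumes Figure 1 (a) contains the edges V1 → D1, V2 → D2 and D1 − D2:

```python
# Sketch (ours): chain components and a topological ordering of them.
from collections import defaultdict, deque

def chain_components(nodes, undirected):
    """Chain components = connected components of the undirected part.
    undirected: set of frozenset node pairs."""
    adj = defaultdict(set)
    for e in undirected:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        comp, queue = {v}, deque([v])
        seen.add(v)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(frozenset(comp))
    return comps

def topological_order(comps, directed):
    """Order the components so that for every directed edge (A, B), the
    component of A precedes the component of B (possible iff the graph
    has no semidirected cycle)."""
    of = {v: c for c in comps for v in c}
    succ, indeg = defaultdict(set), {c: 0 for c in comps}
    for a, b in directed:
        ca, cb = of[a], of[b]
        if ca != cb and cb not in succ[ca]:
            succ[ca].add(cb)
            indeg[cb] += 1
    order, queue = [], deque(c for c in comps if indeg[c] == 0)
    while queue:
        c = queue.popleft()
        order.append(c)
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    return order

# LWF CG of Figure 1 (a), assuming edges V1 -> D1, V2 -> D2 and D1 - D2.
nodes = ["V1", "V2", "D1", "D2"]
comps = chain_components(nodes, {frozenset({"D1", "D2"})})
order = topological_order(comps, {("V1", "D1"), ("V2", "D2")})
```

In this example the components are {V1}, {V2} and {D1, D2}, and the component {D1, D2} comes last in every topological ordering.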

We now recall the interpretation of LWF CGs.[2] Given a route ρ in a LWF CG G, a section of ρ is a maximal undirected subroute of ρ. A section C_1 − ⋯ − C_n of ρ is called a collider section if A → C_1 − ⋯ − C_n ← B is a subroute of ρ. We say that ρ is Z-open with Z ⊆ V when (i) all the collider sections in ρ have some node in Z, and (ii) all the nodes that are outside the collider sections in ρ are outside Z. We now recall the interpretation of AMP CGs.[3] Given a route ρ in an AMP CG G, C is a collider node in ρ if ρ has a subroute A ⇢ C ⇠ B or A ⇢ C − B. We say that ρ is Z-open with Z ⊆ V when (i) all the collider nodes in ρ are in Z, and (ii) all the non-collider nodes in ρ are outside Z. Let X, Y and Z denote three disjoint subsets of V. When there is no Z-open route in a LWF or AMP CG G between a node in X and a node in Y, we say that X is separated from Y given Z in G and denote it as X ⊥_G Y | Z. Moreover, we represent by X ⊥_p Y | Z that X and Y are conditionally independent given Z in a probability distribution p. We say that p satisfies the global Markov property with respect to G when X ⊥_p Y | Z for all X, Y, Z ⊆ V such that X ⊥_G Y | Z. When X ⊥_p Y | Z if and only if X ⊥_G Y | Z, we say that p is faithful to G. Two LWF CGs or AMP CGs are Markov equivalent if the set of distributions that satisfy the global Markov property with respect to each CG is the same. Characterizations of Markov equivalence exist. See Frydenberg [5, Theorem 5.6] for LWF CGs, and Andersson et al. [1, Theorem 5] for AMP CGs. The following theorem gives an additional characterization.

Theorem 1

Two LWF CGs or AMP CGs G and H are Markov equivalent if and only if they represent the same separations.

Finally, let X, Y, W and Z be disjoint subsets of V. A probability distribution p that satisfies the following five properties is called a graphoid: Symmetry X ⊥_p Y | Z ⇒ Y ⊥_p X | Z, decomposition X ⊥_p Y ∪ W | Z ⇒ X ⊥_p Y | Z, weak union X ⊥_p Y ∪ W | Z ⇒ X ⊥_p Y | Z ∪ W, contraction X ⊥_p Y | Z ∪ W ∧ X ⊥_p W | Z ⇒ X ⊥_p Y ∪ W | Z, and intersection X ⊥_p Y | Z ∪ W ∧ X ⊥_p W | Z ∪ Y ⇒ X ⊥_p Y ∪ W | Z. If p also satisfies the following property, then it is called a compositional graphoid: Composition X ⊥_p Y | Z ∧ X ⊥_p W | Z ⇒ X ⊥_p Y ∪ W | Z. It is known that every Gaussian distribution is a compositional graphoid [26, Lemma 2.1, Proposition 2.1 and Corollary 2.4].

3 Unified chain graphs

In this section, we introduce our causal models to represent both interference and non-interference relationships. Let p denote a Gaussian distribution that satisfies the global Markov property with respect to a LWF or AMP CG G. Then,

(1) p(V) = ∏_{K ∈ Cc(G)} p(K | Pa(K))

because K ⊥_G K_1 ∪ ⋯ ∪ K_n ∖ Pa(K) | Pa(K), where K_1, …, K_n denote the chain components of G that precede K in an arbitrary topological ordering of the components. Assume without loss of generality that p has zero mean vector. Moreover, let Σ and Ω denote respectively the covariance and precision matrices of p(K, Pa(K)). Each matrix is then of dimension (|K| + |Pa(K)|) × (|K| + |Pa(K)|). Bishop [2, Section 2.3.1] shows that the conditional distribution p(K | Pa(K)) is Gaussian, with covariance matrix Λ_K and with a mean vector that is a linear function of Pa(K) with coefficients β_K. Then, β_K is of dimension |K| × |Pa(K)|, and Λ_K is of dimension |K| × |K|. Moreover, β_K and Λ_K can be expressed in terms of Σ and Ω as follows:

(2) K | Pa(K) ∼ N(β_K Pa(K), Λ_K)

with

(3) β_K = Σ_{K,Pa(K)} Σ_{Pa(K),Pa(K)}^{−1} = −Ω_{K,K}^{−1} Ω_{K,Pa(K)}

and

(4) Λ_K = Σ_{K|Pa(K)} = Σ_{K,K} − Σ_{K,Pa(K)} Σ_{Pa(K),Pa(K)}^{−1} Σ_{Pa(K),K} = Ω_{K,K}^{−1}

where Σ_{K,Pa(K)} denotes the submatrix of Σ with rows K and columns Pa(K), and Σ_{Pa(K),Pa(K)}^{−1} denotes the inverse of Σ_{Pa(K),Pa(K)}. Moreover, (Λ_K^{−1})_{i,j} = 0 for all i, j ∈ K such that i − j is not in G, because i ⊥_G j | K ∖ {i, j} ∪ Pa(K) [7, Proposition 5.2].
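As a numerical sanity check (ours, not from the paper), the two expressions for β_K in Equation 3 and for Λ_K in Equation 4 can be verified on a small precision matrix, using only a hand-rolled Gauss-Jordan inverse. We assume a joint Gaussian over nodes (1, 2, 3) with K = {2, 3} and Pa(K) = {1}:

```python
# Sanity check (ours) of Equations 3-4. For Sigma = Omega^{-1}:
#   beta_K   = Sigma_{K,Pa} Sigma_{Pa,Pa}^{-1} = -Omega_{K,K}^{-1} Omega_{K,Pa}
#   Lambda_K = Sigma_{K,K} - Sigma_{K,Pa} Sigma_{Pa,Pa}^{-1} Sigma_{Pa,K}
#            = Omega_{K,K}^{-1}

def inverse(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def sub(rows, cols, m):
    return [[m[i][j] for j in cols] for i in rows]

# Precision matrix of p(K, Pa(K)) with K = {2, 3} and Pa(K) = {1}.
Omega = [[2.0, -0.5, 0.0],
         [-0.5, 2.0, -0.4],
         [0.0, -0.4, 2.0]]
Sigma = inverse(Omega)
K, Pa = [1, 2], [0]  # row/column indices of K and Pa(K)

beta_sigma = matmul(sub(K, Pa, Sigma), inverse(sub(Pa, Pa, Sigma)))
beta_omega = [[-x for x in row]
              for row in matmul(inverse(sub(K, K, Omega)), sub(K, Pa, Omega))]
schur = matmul(matmul(sub(K, Pa, Sigma), inverse(sub(Pa, Pa, Sigma))),
               sub(Pa, K, Sigma))
lam = [[s - t for s, t in zip(r1, r2)]
       for r1, r2 in zip(sub(K, K, Sigma), schur)]
lam_omega = inverse(sub(K, K, Omega))
```

Both pairs of expressions agree up to floating-point error, as the standard Gaussian conditioning identities guarantee.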

Furthermore, if G is an AMP CG, then (β_K)_{i,j} = 0 for all i ∈ K and j ∈ Pa(K) ∖ Pa(i), because i ⊥_G Pa(K) ∖ Pa(i) | Pa(i). Therefore, the directed edges of an AMP CG are suitable for representing non-interference. On the other hand, the directed edges of a LWF CG are suitable for representing interference. To see it, note that if G is a LWF CG, then Ω_{i,j} = 0 for all i ∈ K and j ∈ Pa(K) ∖ Pa(i), because i ⊥_G j | K ∖ {i} ∪ Pa(K) ∖ {j}. However, (β_K)_{i,j} is not identically zero. For instance, if G = 1 → 3 − 4 ← 2 and K = {3, 4}, then

(β_K)_{4,1} = −(Ω_{K,K}^{−1})_{4,3} (Ω_{K,Pa(K)})_{3,1} − (Ω_{K,K}^{−1})_{4,4} (Ω_{K,Pa(K)})_{4,1} = −(Ω_{K,K}^{−1})_{4,3} (Ω_{K,Pa(K)})_{3,1}

because (Ω_{K,Pa(K)})_{4,1} = 0 as 1 → 4 is not in G. In other words, if there is a path in a LWF CG G from j to i through nodes in K, then (β_K)_{i,j} is not identically zero. Actually, (β_K)_{i,j} can be written as a sum of path weights over all such paths. This follows directly from Equation 3 and the result by Jones and West [6, Theorem 1], who show that (Ω_{K,K}^{−1})_{i,l} can be written as a sum of path weights over all the paths in G between l and i through some (but not necessarily all) nodes in K. Specifically,

(5) (Ω_{K,K}^{−1})_{i,l} = ∑_{ρ ∈ π_{i,l}} (−1)^{|ρ|+1} (|(Ω_{K,K})_ρ| / |Ω_{K,K}|) ∏_{n=1}^{|ρ|−1} (Ω_{K,K})_{ρ_n,ρ_{n+1}}

where π_{i,l} denotes the set of paths in G between i and l through nodes in K, |ρ| denotes the number of nodes in a path ρ, ρ_n denotes the n-th node in ρ, and (Ω_{K,K})_ρ is the matrix with the rows and columns corresponding to the nodes in ρ omitted. Moreover, the determinant of a zero-dimensional matrix is taken to be 1. As a consequence, a LWF CG G does not impose zero restrictions on β_K, because K is connected by definition of chain component. Previous works on the use of LWF CGs to model interference (e. g., [12], [21], [22], [27]) focus on developing methods for computing some causal effects of interest, and do not give many details on how interference is really being modeled. In the case of Gaussian LWF CGs, Equations 3 and 5 shed some light on this question. For instance, consider extending the mother-child example in Section 1 to a group of friends. The vaccine for individual j has a protective effect on individual j developing the disease, which in turn has a protective effect on her friends, which in turn has a protective effect on her friends’ friends, and so on. Moreover, the protective effect also works in the reverse direction, i. e. the vaccine for the latter has a protective effect on individual j. In other words, the protective effect of the vaccine for individual j on individual i is the result of all the paths through which the vaccine’s effect can reach individual i via the network of friends. This is exactly what Equations 3 and 5 tell us (assuming that the neighbors of an individual’s disease node in G are her friends’ disease nodes). Appendix B gives an alternative decomposition of the covariance in terms of path coefficients, which may give further insight into Equation 5.
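Equation 5 can be checked numerically on a small example (ours, not from the paper). Take a chain component K = {1, 2, 3} with undirected edges 1 − 2 − 3, so that (Ω_{K,K})_{1,3} = 0 and there is exactly one path between each pair of nodes; the path sums then coincide with the inverse entries computed by cofactors:

```python
# Check (ours) of the path-sum formula in Equation 5 for a 3-node chain 1-2-3.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

Omega = [[2.0, -0.6, 0.0],
         [-0.6, 2.0, -0.5],
         [0.0, -0.5, 2.0]]
d = det3(Omega)

# Inverse entries via cofactors (ground truth).
inv_13 = (Omega[0][1] * Omega[1][2] - Omega[0][2] * Omega[1][1]) / d
inv_12 = -(Omega[0][1] * Omega[2][2] - Omega[0][2] * Omega[2][1]) / d

# Equation 5 for (i, l) = (1, 3): one path rho = (1, 2, 3) with |rho| = 3;
# removing its nodes leaves the zero-dimensional matrix, whose determinant is 1.
path_13 = ((-1) ** (3 + 1)) * (1.0 / d) * Omega[0][1] * Omega[1][2]

# Equation 5 for (i, l) = (1, 2): one path rho = (1, 2) with |rho| = 2;
# removing its nodes leaves the 1 x 1 matrix containing Omega_{3,3}.
path_12 = ((-1) ** (2 + 1)) * (Omega[2][2] / d) * Omega[0][1]
```

Both path sums match the corresponding inverse entries up to floating-point error.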

The discussion above suggests a way to model both interference and non-interference by unifying LWF and AMP CGs, i. e. by allowing →, ⇢ and − edges as long as no semidirected cycle exists. We call these new models unified chain graphs (UCGs). The lack of an edge − imposes a zero restriction on the elements of ΩK,K, as in LWF and AMP CGs. The lack of an edge ⇢ in an UCG imposes a zero restriction on the elements of βK, as in AMP CGs. Finally, the lack of an edge → imposes a zero restriction on the elements of ΩK,Pa(K) but not on the elements of βK, as in LWF CGs. Therefore, edges ⇢ may be used to model non-interference, whereas edges → may be used to model interference. For instance, the mother-child example in Section 1 may be modeled with the UCG in Figure 1 (c).

Every UCG is a CG, but the opposite is not true due to the following requirement. We divide the parents of a set of nodes X in an UCG G into mothers and fathers as follows. The fathers of X are Fa(X) = {B | B → A is in G with A ∈ X}. The mothers of X are Mo(X) = {B | B ⇢ A is in G with A ∈ X}. We require that Fa(K) ∩ Mo(K) = ∅ for all K ∈ Cc(G). Therefore, the lack of an edge ⇢ imposes a zero restriction on the elements of β_K corresponding to the mothers, and the lack of an edge → imposes a zero restriction on the elements of Ω_{K,Pa(K)} corresponding to the fathers. In other words, G imposes the following zero restrictions:

(6) (β_K)_{i,j} = 0 for all i ∈ K and j ∈ Mo(K) ∖ Mo(i),
(7) Ω_{i,j} = 0 for all i ∈ K and j ∈ Fa(K) ∖ Fa(i), and
(8) (Λ_K^{−1})_{i,j} = 0 for all i ∈ K and j ∈ K ∖ Ne(i) ∖ {i}.

The reason why we require that Fa(K) ∩ Mo(K) = ∅ is to avoid imposing contradictory zero restrictions on β_K. For example, suppose that the edge j → i is in G with i ∈ K. This excludes the edge j ⇢ i by definition of CG, so if j were also a mother of some other node in K, then Equation 6 would imply that (β_K)_{i,j} is identically zero, whereas the edge j → i implies the opposite. In other words, without this constraint, UCGs would reduce to AMP CGs. The following lemma formalizes this statement.
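The zero restrictions of Equations 6-8 can be enumerated mechanically. The sketch below (ours, with a hypothetical helper; we assume Figure 1 (c) consists of V1 → D1 − D2 ← V2 with G1 ⇢ D1 and G2 ⇢ D2) lists the parameters that an UCG forces to zero:

```python
# Sketch (ours): enumerate the zero restrictions of Equations 6-8 from the
# mothers, fathers and neighbours of each node in a chain component K.

def zero_restrictions(K, Mo, Fa, Ne):
    """Mo, Fa, Ne map each i in K to its mothers, fathers and neighbours."""
    Mo_K = set().union(*(Mo[i] for i in K))
    Fa_K = set().union(*(Fa[i] for i in K))
    assert not (Fa_K & Mo_K), "UCGs require Fa(K) and Mo(K) to be disjoint"
    beta_zeros = [(i, j) for i in K for j in sorted(Mo_K - Mo[i])]       # Eq. 6
    omega_zeros = [(i, j) for i in K for j in sorted(Fa_K - Fa[i])]      # Eq. 7
    lambda_zeros = [(i, j) for i in K for j in sorted(K - Ne[i] - {i})]  # Eq. 8
    return beta_zeros, omega_zeros, lambda_zeros

# Assumed UCG of Figure 1 (c): V1 -> D1 - D2 <- V2, G1 --> D1, G2 --> D2.
K = {"D1", "D2"}
Mo = {"D1": {"G1"}, "D2": {"G2"}}
Fa = {"D1": {"V1"}, "D2": {"V2"}}
Ne = {"D1": {"D2"}, "D2": {"D1"}}
b, o, l = zero_restrictions(K, Mo, Fa, Ne)
```

Here (β_K)_{D1,G2} and (β_K)_{D2,G1} are zeroed (non-interference of the genes), Ω_{D1,V2} and Ω_{D2,V1} are zeroed, and no element of Λ_K^{−1} is zeroed since D1 − D2 is present.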

Lemma 2

Without the requirement that Fa(K) ∩ Mo(K) = ∅ for all K ∈ Cc(G), the UCG G imposes the same zero restrictions on β_K and Λ_K^{−1} as the AMP CG resulting from removing the edges → from G.

The lemma above implies that the UCG and the AMP CG can model the same Gaussian distributions. Lastly, note that the zero restrictions associated with an UCG (i. e., Equations 6-8) induce a Gaussian distribution by Equation 1 and Bishop [2, Section 2.3.3].

4 Global Markov property

In this section, we present a separation criterion for UCGs. Given a route ρ in an UCG, C is a collider node in ρ if ρ has a subroute A ⇢ C ⇠ B, A ⇢ C ← B or A ⇢ C − B. A section of ρ is a maximal undirected subroute of ρ. A section C_1 − ⋯ − C_n of ρ is called a collider section if A → C_1 − ⋯ − C_n ← B is a subroute of ρ. We say that ρ is Z-open with Z ⊆ V when (i) all the collider nodes in ρ are in Z, (ii) all the collider sections in ρ have some node in Z, and (iii) all the nodes that are outside the collider sections in ρ and are not collider nodes in ρ are outside Z. Let X, Y and Z denote three disjoint subsets of V. When there is no Z-open route in G between a node in X and a node in Y, we say that X is separated from Y given Z in G and denote it as X ⊥_G Y | Z. Note that this separation criterion unifies the criteria for LWF and AMP CGs reviewed in Section 2. Finally, we say that a probability distribution p satisfies the global Markov property with respect to an UCG G if X ⊥_p Y | Z for all X, Y, Z ⊆ V such that X ⊥_G Y | Z.

The next theorem states the equivalence between the global Markov property and the zero restrictions associated with an UCG.

Theorem 3

A Gaussian distribution p satisfies Equations 1 and 6-8 with respect to an UCG G if and only if it satisfies the global Markov property with respect to G.

We have mentioned before that Jones and West [6] prove that the covariance between two nodes i and j can be written as a sum of path weights over the paths between i and j in a certain undirected graph (recall Equation 5 for the details). Wright [29] proves a similar result for DAGs. The following theorem generalizes both results.

Theorem 4

Given a Gaussian distribution that satisfies the global Markov property with respect to an UCG G, the covariance between two nodes i and j of G can be written as a sum of products of weights over the edges of the open paths between i and j in G.

5 Block-recursive, pairwise and local Markov properties

In this section, we present block-recursive, pairwise and local Markov properties for UCGs, and prove their equivalence to the global Markov property for Gaussian distributions. Equivalence means that every Gaussian distribution that satisfies any of these properties with respect to an UCG also satisfies the global Markov property with respect to the UCG, and vice versa. The relevance of these results stems from the fact that checking whether a distribution satisfies the global Markov property can now be performed more efficiently: We do not need to check that every global separation corresponds to a statistical independence in the distribution, it suffices to do so for the local (or pairwise or block-recursive) separations, which are typically considerably fewer. This is the approach taken by most learning algorithms based on hypothesis tests, such as the PC algorithm for DAGs [23], and its subsequent extensions to LWF CGs [24] and AMP CGs [14]. The results in this section pave the way for developing a similar learning algorithm for UCGs, something that we postpone to a future article.

We say that a probability distribution p satisfies the block-recursive Markov property with respect to G if for any chain component K ∈ Cc(G) and vertex i ∈ K

  1. i ⊥_p Nd(K) ∖ K ∖ Fa(K) ∖ Mo(i) | Fa(K) ∪ Mo(i),

  2. i ⊥_p Nd(K) ∖ K ∖ Fa(i) ∖ Mo(K) | K ∖ {i} ∪ Fa(i) ∪ Mo(K) and

  3. i ⊥_p j | K ∖ {i, j} ∪ Pa(K) for all j ∈ K ∖ {i} ∖ Ne(i).

Theorem 5

A Gaussian distribution p satisfies Equations 1 and 6-8 with respect to an UCG G if and only if it satisfies the block-recursive Markov property with respect to G.

We say that a probability distribution p satisfies the pairwise Markov property with respect to an UCG G if for any non-adjacent vertices i and j of G with i ∈ K and K ∈ Cc(G)

  1. i ⊥_p j | Nd(i) ∖ K ∖ {j} if j ∈ Nd(i) ∖ K ∖ Fa(K) ∖ Mo(i), and

  2. i ⊥_p j | Nd(i) ∖ {i, j} if j ∈ Nd(i) ∖ {i} ∖ Fa(i) ∖ Mo(K).

Theorem 6

The pairwise and block-recursive Markov properties are equivalent for graphoids.

We say that a probability distribution p satisfies the local Markov property with respect to an UCG G if for any vertex i of G with i ∈ K and K ∈ Cc(G)

  1. i ⊥_p Nd(i) ∖ K ∖ Fa(K) ∖ Mo(i) | Fa(K) ∪ Mo(i) and

  2. i ⊥_p Nd(i) ∖ {i} ∖ Pa(i) ∖ Ne(i) ∖ Mo(Ne(i)) | Pa(i) ∪ Ne(i) ∪ Mo(Ne(i)).

Theorem 7

The local and pairwise Markov properties are equivalent for Gaussian distributions.

Theorems 3 and 5-7 imply the following corollary, which summarizes our results.

Corollary 8

The global, block-recursive, local and pairwise Markov properties are all equivalent for Gaussian distributions.

6 Maximum likelihood parameter estimation

In this section, we introduce a procedure for computing maximum likelihood estimates (MLEs) of the parameters of an UCG G, i. e. the MLEs of the non-zero entries of (β_K)_{K,Mo(K)}, Ω_{K,K} and Ω_{K,Fa(K)} for every chain component K of G. Specifically, we adapt the procedure proposed by Drton and Eichler [4] for computing MLEs of the parameters of an AMP CG. The procedure combines generalized least squares and iterative proportional fitting. Suppose that we have access to a data matrix D whose column vectors are the data instances and whose rows are indexed by V. Moreover, let D_X denote the data over the variables X ⊆ V, i. e. the rows of D corresponding to X. Note that thanks to Equation 1, the MLEs of the parameters corresponding to each chain component K of G can be obtained separately. Moreover, recall that K | Pa(K) ∼ N(β_K Pa(K), Ω_{K,K}^{−1}). Therefore, we may compute the MLEs of the parameters corresponding to the component K by iterating between computing the MLEs Ω̂_{K,K} and Ω̂_{K,Fa(K)} and thus (β̂_K)_{K,Fa(K)} while fixing (β̂_K)_{K,Mo(K)}, and computing the MLEs (β̂_K)_{K,Mo(K)} while fixing (β̂_K)_{K,Fa(K)}. Specifically, we initialize (β̂_K)_{K,Mo(K)} to zero and then iterate through the following steps until convergence:

  1. Compute Ω̂_{K,K} and Ω̂_{K,Fa(K)} from the data D_{Fa(K)} and the residuals D_K − (β̂_K)_{K,Mo(K)} D_{Mo(K)}.

  2. Compute (β̂_K)_{K,Fa(K)} from Ω̂_{K,K} and Ω̂_{K,Fa(K)}.

  3. Compute (β̂_K)_{K,Mo(K)} from the data D_{Mo(K)} and the residuals D_K − (β̂_K)_{K,Fa(K)} D_{Fa(K)}.

The first step corresponds to computing the MLEs of the parameters of a LWF CG and, thus, it is solved by running the iterative proportional fitting procedure as indicated in Lauritzen [7]. This procedure optimizes the likelihood function iteratively over different sections of the parameter space. Specifically, each iteration adjusts the covariance matrix for one clique marginal. The second step above is solved by Equation 3. The third step corresponds to computing the MLEs of the parameters of an AMP CG and, thus, it is solved analytically by Equation 13 in Drton and Eichler [4]. Note that to guarantee convergence to the MLEs, the first and third step should be solved jointly. Therefore, our procedure is expected to converge to a local rather than global maximum of the likelihood function. As Drton and Eichler [4] note, this is also the case for their procedure, upon which ours builds. As for convergence, note that our procedure consists in interleaving the iterative proportional fitting procedure in the first step, and the analytical solution to generalized least squares in the third step. Drton and Eichler [4, Proposition 1] prove that each of these steps increases the likelihood function, which implies convergence since the likelihood function is bounded. See also Lauritzen [7, Theorem 5.4]. The experiments in the next section confirm that our procedure converges to satisfactory estimates within a few iterations.
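The alternating scheme above can be sketched in a much-simplified setting (ours; the authors' implementation is in R and uses iterative proportional fitting for the first step): a single child with one father F and one mother M, so that each step reduces to a scalar least-squares regression on residuals:

```python
# Simplified sketch (ours) of the alternating estimation scheme: one child K,
# one father F, one mother M, so every step is a scalar regression.
import random

random.seed(0)
n = 20000
b_fa_true, b_mo_true = 1.5, -0.8
F = [random.gauss(0, 1) for _ in range(n)]
M = [random.gauss(0, 1) for _ in range(n)]
K = [b_fa_true * f + b_mo_true * m + random.gauss(0, 0.5)
     for f, m in zip(F, M)]

def ols(y, x):
    """Least-squares slope of y on x through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

b_mo = 0.0  # initialize the mothers' coefficient to zero, as above
for _ in range(20):
    # Steps 1-2 stand-in: fit the fathers' part on the mothers' residuals.
    b_fa = ols([k - b_mo * m for k, m in zip(K, M)], F)
    # Step 3 stand-in: fit the mothers' part on the fathers' residuals.
    b_mo = ols([k - b_fa * f for k, f in zip(K, F)], M)
```

Starting from the zero initialization, the two coefficients settle near their true values within a handful of iterations, mirroring the convergence behavior reported in the experiments below.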

6.1 Experimental evaluation

First, we generate 1000 UCGs as follows. Each UCG consists of five mothers, five fathers and 10 children. The edges are sampled independently and with probability 0.2 each. If the edges sampled do not satisfy the following two constraints, the sampling process is repeated: (i) There must be an edge from every parent to some child, i. e. the parents are real parents, and (ii) the children must be connected by an undirected path, i. e. they form a chain component.

Then, we parameterize each of the 1000 UCGs generated above as follows. The 10 children are denoted by K, and the 10 parents by Pa(K). The non-zero elements of βK corresponding to the mothers are sampled uniformly from the interval [−3,3]. The non-zero elements of ΩK,K and ΩK,Fa(K) are sampled uniformly from the interval [−3,3], with the exception of the diagonal elements of ΩK,K which are sampled uniformly from the interval [0,30]. If the sampled ΩK,K is not positive definite then the sampling process is repeated. The reason why the diagonal elements are sampled from a wider interval is to avoid having to repeat the sampling process too many times.

Finally, note that each of the 1000 parameterized UCGs generated above specifies one probability distribution p(K|Pa(K)). The goal of our experiments is to evaluate the accuracy of the algorithm presented before to estimate the parameter values corresponding to p(K|Pa(K)) from a finite sample of p(K,Pa(K)). To generate these learning data, we sample first p(Pa(K)) and then p(K|Pa(K)). Each of the 1000 probability distributions p(Pa(K)) is constructed as follows. The off-diagonal elements of ΩPa(K),Pa(K) are sampled uniformly from the interval [−3,3], whereas the diagonal elements are sampled uniformly from the interval [0,30]. As before, we repeat the sampling process if the resulting matrix is not positive definite. Note that no element of ΩPa(K),Pa(K) is identically zero. For the experiments, we consider samples of size 500, 2500 and 5000.
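The rejection sampling of positive definite precision matrices used above can be sketched as follows (ours; the paper's code is in R), using a Cholesky factorization as the positive definiteness test:

```python
# Sketch (ours): sample a symmetric precision matrix with off-diagonal entries
# uniform in [-3, 3] and diagonal entries uniform in [0, 30], re-sampling until
# it is positive definite.
import random

def is_positive_definite(m):
    """Cholesky-based positive definiteness test for a symmetric matrix."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

def sample_precision(n, rng):
    while True:
        m = [[0.0] * n for _ in range(n)]
        for i in range(n):
            m[i][i] = rng.uniform(0, 30)
            for j in range(i):
                m[i][j] = m[j][i] = rng.uniform(-3, 3)
        if is_positive_definite(m):
            return m

Omega = sample_precision(5, random.Random(1))
```

The wide diagonal interval keeps the rejection rate low, which is exactly the reason given above for sampling the diagonal from [0, 30].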

For each of the UCGs and corresponding samples generated above, we run the parameter estimation procedure until the MLEs do not change in two consecutive iterations or 100 iterations are performed. We then compute the relative difference between each true parameter value θ and the MLE θ̂, which we define as abs((θ − θ̂)/θ). We also compute the residual difference, which we define as the residual with the parameter estimates minus the residual with the true parameter values. The procedure is implemented in R and the code is available at https://www.dropbox.com/s/b9vmqgf99da3qxm/UCGs.R?dl=0.

The results of the experiments are reported in Table 1. The difference between the quartiles Q1 and Q3 is not too big, which suggests that the column Median is a reliable summary of most of the runs. We therefore focus on this column for the rest of the analysis. Note however that some runs are exceptionally good and some others exceptionally bad, as indicated by the columns Min and Max. To avoid a bad run, one may consider a more sophisticated initialization of the parameter estimation procedure, e. g. multiple restarts.

Table 1

Results of the experiments with edge probability 0.2.

Sample size | Performance criterion | Min | Q1 | Median | Mean | Q3 | Max
 | Number of edges → | 5.00 | 9.00 | 11.00 | 11.12 | 13.00 | 19.00
 | Number of edges ⇢ | 5.00 | 9.00 | 11.00 | 11.19 | 13.00 | 20.00
 | Number of edges − | 9.00 | 10.00 | 12.00 | 11.80 | 13.00 | 19.00
500 | Number of iterations | 4.00 | 8.00 | 10.00 | 17.04 | 17.00 | 101.00
 | Ω_{K,K} relative diff. | 0.00 | 0.05 | 0.13 | 0.93 | 0.41 | 912.64
 | Ω_{K,Fa(K)} relative diff. | 0.00 | 0.11 | 0.27 | 2.34 | 0.65 | 8219.97
 | (β_K)_{K,Fa(K)} relative diff. | 0.00 | 0.20 | 0.50 | 16.40 | 1.20 | 506006.30
 | (β_K)_{K,Mo(K)} relative diff. | 0.00 | 0.01 | 0.02 | 0.12 | 0.05 | 131.21
 | Residual diff. | −1743.25 | −5.07 | −2.70 | −10.80 | −1.57 | 43.04
2500 | Number of iterations | 5.00 | 8.00 | 10.00 | 16.75 | 16.00 | 101.00
 | Ω_{K,K} relative diff. | 0.00 | 0.02 | 0.06 | 0.64 | 0.18 | 4202.19
 | Ω_{K,Fa(K)} relative diff. | 0.00 | 0.05 | 0.12 | 1.02 | 0.28 | 2662.50
 | (β_K)_{K,Fa(K)} relative diff. | 0.00 | 0.08 | 0.21 | 2.04 | 0.56 | 21299.60
 | (β_K)_{K,Mo(K)} relative diff. | 0.00 | 0.00 | 0.01 | 0.06 | 0.02 | 107.23
 | Residual diff. | −3137.71 | −4.95 | −2.61 | −13.56 | −1.64 | 104.26
5000 | Number of iterations | 4.00 | 8.00 | 10.00 | 16.68 | 16.00 | 101.00
 | Ω_{K,K} relative diff. | 0.00 | 0.02 | 0.04 | 0.32 | 0.13 | 1088.67
 | Ω_{K,Fa(K)} relative diff. | 0.00 | 0.03 | 0.08 | 0.70 | 0.20 | 1674.42
 | (β_K)_{K,Fa(K)} relative diff. | 0.00 | 0.05 | 0.15 | 2.10 | 0.40 | 26507.95
 | (β_K)_{K,Mo(K)} relative diff. | 0.00 | 0.00 | 0.01 | 0.04 | 0.02 | 19.88
 | Residual diff. | −1131.10 | −5.11 | −2.68 | −10.13 | −1.59 | 131.23

The sample size has a clear impact on the accuracy of the MLEs, as indicated by the decrease in relative difference. For instance, half of the MLEs are less than 27 %, 12 % and 8 % away from the true values for the sample sizes 500, 2500 and 5000, respectively. The effect of the sample size on the accuracy of the MLEs can also be appreciated from the fact that the residual difference does not grow with the sample size. The parameters (β_K)_{K,Mo(K)} seem easier to estimate than (β_K)_{K,Fa(K)}, as indicated by the smaller relative difference of the former. This is not surprising since the latter may accumulate the errors in the estimation of Ω_{K,K} and Ω_{K,Fa(K)} (recall Equation 3).

Next, we repeat the experiments with an edge probability of 0.5 instead of 0.2, in order to consider denser UCGs. The results of these experiments are reported in Table 2. They lead to essentially the same conclusions as before. When comparing the two tables, we can see that the MLE accuracy is slightly worse for the denser UCGs. This is expected because the denser UCGs have more parameters to estimate from the same amount of data. However, the residual difference is better for the denser UCGs. This is again expected because the denser UCGs impose fewer constraints. All in all, both tables show that the quartile Q3 of the residual difference is negative, which indicates that the MLEs induce a better fit of the data than the true parameter values. We therefore conclude that the parameter estimation procedure proposed works satisfactorily.

Table 2

Results of the experiments with edge probability 0.5.

Sample size | Performance criterion | Min | Q1 | Median | Mean | Q3 | Max
 | Number of edges → | 14.00 | 22.00 | 25.00 | 24.89 | 27.00 | 36.00
 | Number of edges ⇢ | 13.00 | 23.00 | 25.00 | 25.15 | 28.00 | 36.00
 | Number of edges − | 13.00 | 20.00 | 22.00 | 22.24 | 24.00 | 32.00
500 | Number of iterations | 7.00 | 11.00 | 16.00 | 26.21 | 28.00 | 101.00
 | Ω_{K,K} relative diff. | 0.00 | 0.07 | 0.19 | 1.68 | 0.53 | 3293.27
 | Ω_{K,Fa(K)} relative diff. | 0.00 | 0.12 | 0.31 | 2.87 | 0.73 | 18323.37
 | (β_K)_{K,Fa(K)} relative diff. | 0.00 | 0.10 | 0.28 | 1.80 | 0.73 | 2701.20
 | (β_K)_{K,Mo(K)} relative diff. | 0.00 | 0.01 | 0.02 | 0.18 | 0.06 | 1298.98
 | Residual diff. | −2570.64 | −11.48 | −6.30 | −16.99 | −4.15 | 577.87
2500 | Number of iterations | 6.00 | 11.00 | 15.00 | 25.75 | 28.00 | 101.00
 | Ω_{K,K} relative diff. | 0.00 | 0.03 | 0.09 | 0.74 | 0.23 | 1880.50
 | Ω_{K,Fa(K)} relative diff. | 0.00 | 0.05 | 0.13 | 0.98 | 0.32 | 3092.89
 | (β_K)_{K,Fa(K)} relative diff. | 0.00 | 0.05 | 0.12 | 0.81 | 0.33 | 1138.60
 | (β_K)_{K,Mo(K)} relative diff. | 0.00 | 0.00 | 0.01 | 0.06 | 0.02 | 65.53
 | Residual diff. | −1409.83 | −10.80 | −6.09 | −7.53 | −4.16 | 3157.38
5000 | Number of iterations | 6.00 | 11.00 | 15.00 | 25.66 | 27.25 | 101.00
 | Ω_{K,K} relative diff. | 0.00 | 0.02 | 0.06 | 0.50 | 0.17 | 1236.54
 | Ω_{K,Fa(K)} relative diff. | 0.00 | 0.04 | 0.10 | 0.94 | 0.23 | 4179.08
 | (β_K)_{K,Fa(K)} relative diff. | 0.00 | 0.03 | 0.09 | 0.57 | 0.24 | 969.71
 | (β_K)_{K,Mo(K)} relative diff. | 0.00 | 0.00 | 0.01 | 0.07 | 0.02 | 644.40
 | Residual diff. | −2423.18 | −11.15 | −6.35 | −2.24 | −4.22 | 5801.54

Finally, we conduct a sanity check aimed at evaluating the behavior of the parameter estimation procedure proposed when the UCG contains spurious or superfluous edges, i. e. edges whose associated parameters are zero and thus may be removed from the UCG. To this end, we repeat the previous experiments with a slight modification. As before, the elements of (β_K)_{K,Mo(K)}, Ω_{K,Fa(K)} and Ω_{K,K} associated with the edges in the UCG are sampled uniformly from the interval [−3,3]. However, we now set 25 % of these parameters to zero. We expect the estimates for these zeroed parameters to get closer to zero as the sample size grows. The results of these experiments with an edge probability of 0.5 are reported in Table 3. The results for an edge probability of 0.2 are similar. The first three rows of the table report the number of edges. Each edge has associated one parameter, and 25 % of these parameters have been zeroed in the experiments. In the next rows, the table reports the absolute difference between the zeroed parameters and their estimates, i. e. the absolute values of the estimates. As expected, the larger the sample size is the closer to zero the MLEs of the zeroed parameters become. For instance, 75 % of the MLEs of the zeroed parameters take a value smaller than 0.76, 0.33 and 0.24 for the sample sizes 500, 2500 and 5000, respectively. Note that these numbers are much lower for the zeroed (β_K)_{K,Mo(K)} parameters (specifically 0.06, 0.03, 0.02) because, recall, our parameter estimation procedure initializes (β̂_K)_{K,Mo(K)} to zero. Table 3 also shows the residual difference, which is comparable to that in Table 2; this suggests that the existence of spurious edges does not hinder fitting the data. We therefore conclude that the parameter estimation procedure proposed behaves as it should in this sanity check.

Table 3

Results for the zeroed parameters with edge probability 0.5.

Sample size   Performance criterion          Min        Q1      Median   Mean     Q3      Max
              Number of edges →              14.00      22.00   25.00    24.89    27.00   36.00
              Number of edges ⇢              13.00      23.00   25.00    25.15    28.00   36.00
              Number of edges −              13.00      20.00   22.00    22.32    25.00   33.00
500           Number of iterations           7.00       11.00   16.00    25.84    28.00   101.00
              ΩK,K absolute diff.            0.00       0.15    0.35     0.50     0.70    5.71
              ΩK,Fa(K) absolute diff.        0.00       0.17    0.39     0.54     0.76    10.11
              (βK)K,Mo(K) absolute diff.     0.00       0.01    0.03     0.05     0.06    1.21
              Residual diff.                 −3664.56   −11.18  −6.13    −19.06   −4.15   691.96
2500          Number of iterations           6.00       11.00   15.00    25.30    27.00   101.00
              ΩK,K absolute diff.            0.00       0.06    0.15     0.21     0.29    6.36
              ΩK,Fa(K) absolute diff.        0.00       0.07    0.17     0.24     0.33    7.37
              (βK)K,Mo(K) absolute diff.     0.00       0.01    0.01     0.02     0.03    0.53
              Residual diff.                 −1859.05   −11.04  −6.13    −13.03   −4.09   3222.82
5000          Number of iterations           6.00       11.00   15.00    25.66    27.25   101.00
              ΩK,K absolute diff.            0.00       0.04    0.11     0.16     0.22    5.62
              ΩK,Fa(K) absolute diff.        0.00       0.06    0.12     0.17     0.24    8.21
              (βK)K,Mo(K) absolute diff.     0.00       0.00    0.01     0.01     0.02    0.56
              Residual diff.                 −1256.74   −10.93  −6.18    −6.18    −4.22   6708.29

7 Causal inference

In this section, we show how to compute the effects of interventions in UCGs. Intervening on a set of variables X ⊆ V modifies the natural causal mechanism of X, as opposed to (passively) observing X. For simplicity, we only consider interventions that set X to constant values. We represent an intervention that sets X = x as do(X = x). Given a chain component K of an UCG G, we have that

K | Pa(K) ∼ N(βK Pa(K), ΛK)

as discussed at length in Section 3, or equivalently

K = βK Pa(K) + ϵK

where ϵK ∼ N(0, ΛK). We interpret the last equation as a structural equation, i. e. it defines the natural causal mechanism of the variables in K. Specifically, the natural causal mechanism of Vi ∈ K is given by the following structural equation:

Vi = ∑Vj∈Pa(K) βi,j Vj + ϵi

where ϵi denotes an error variable representing the unmodelled causes of Vi. Then, the intervention do(Vi = a) amounts to replacing the last equation with the structural equation Vi = a, which we dub the interventional causal mechanism of Vi. The resulting system of equations represents the behavior of the phenomenon being modelled under the intervention do(Vi = a), i. e. p(V ∖ {Vi} | do(Vi = a)). Moreover, p(V ∖ {Vi} | do(Vi = a)) satisfies the global Markov property with respect to GV∖{Vi}. To see this, note that every {Vi}-open route in G that does not include Vi is a {Vi}-open route in GV∖{Vi} too. However, every {Vi}-open route in G that includes Vi must contain a subroute of the form A → Vi ← B or A → Vi ⇠ B or A ⇢ Vi ⇠ B or A → V1 − ⋯ − Vi − ⋯ − Vn ← B. Note that all of these subroutes are part of the natural causal mechanism of Vi, which has been replaced and, thus, they are inactive, i. e. they are not really {Vi}-open.[4]

The single variable interventions described in the paragraph above can be generalized to sets by simply replacing the corresponding equation for each variable in the set.
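The replacement of a single structural equation can be sketched numerically. The following is a minimal sketch (not code from the paper; the two-variable chain component {D1, D2}, the parent V1 and all coefficients are hypothetical choices of ours) of how do(D1 = a) replaces the natural causal mechanism of D1 while the equation of D2, and hence the interference through the undirected edge D1 − D2, is left intact:

```python
import random

# Hedged sketch: a two-variable chain component K = {D1, D2} with
# interference D1 - D2, and a parent V1 of D1. Structural equations
# (the coefficients 0.8 and 0.5 are arbitrary choices of ours):
#   D1 = 0.8*V1 + 0.5*D2 + e1
#   D2 = 0.5*D1 + e2
# do(D1 = a) replaces the first equation by D1 = a; the second equation
# is kept, so the intervention interferes with D2 through D1.

def simulate(n, do_d1=None, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v1 = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        if do_d1 is None:
            # Natural mechanism: solve the two linear equations jointly.
            d1 = (0.8 * v1 + 0.5 * e2 + e1) / (1.0 - 0.5 * 0.5)
            d2 = 0.5 * d1 + e2
        else:
            d1 = do_d1            # interventional mechanism replaces the equation
            d2 = 0.5 * d1 + e2    # natural mechanism of D2 is untouched
        samples.append((v1, d1, d2))
    return samples

def mean(xs):
    return sum(xs) / len(xs)

obs = simulate(20000)
intv = simulate(20000, do_d1=2.0)
e_d2_obs = mean([s[2] for s in obs])   # close to 0 with no intervention
e_d2_do = mean([s[2] for s in intv])   # close to 0.5 * 2 = 1 under do(D1 = 2)
```

Under the natural mechanism the two equations are solved jointly; under do(D1 = 2) the expectation of D2 moves to 0.5 · 2 = 1, i. e. the intervention interferes with D2.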

Graphically, we can represent the natural and interventional causal mechanisms of the variables in an UCG G by adding a new parent to each variable. We denote the resulting UCG by G′. The new parent of a variable Vi is a variable FVi that has the same domain as Vi plus a state labeled idle: FVi = a represents the intervention do(Vi = a), whereas FVi = idle represents no intervention on Vi. In other words, FVi = a represents that the natural causal mechanism is inactive and the interventional causal mechanism is active, whereas FVi = idle represents the opposite. See Pearl [16, Section 3.2.2] for further details. In order to decide whether to augment G with FVi → Vi or FVi ⇢ Vi, we have to study the interventional causal mechanism. This is unlike previous works, where the value set by the intervention is all that matters. Specifically, if FVi = idle then the interventional causal mechanism is inactive and, thus, it does not matter whether we add FVi → Vi or FVi ⇢ Vi. We may even decide not to add FVi at all. On the other hand, if FVi = a then the interventional causal mechanism is active and, thus, the type of arrow added matters: If the interventional causal mechanism is such that the intervention delivered may affect (i. e., interfere with) the rest of the variables in K, then we augment G with FVi → Vi; otherwise, we augment it with FVi ⇢ Vi. We illustrate this with the mother-child example from Section 1. An intervention that immunizes the mother against the disease (i. e., do(D1 = 0)) may or may not protect the child (i. e., interfere with D2), depending on how the intervention is delivered: It protects the child when the interventional causal mechanism is the intake of some medication (different from the vaccination V1 but comparable), but it does not protect the child when the interventional causal mechanism is a gene therapy (different from the healthy carrier genotype G1 but comparable). Then, the former case should be modeled as FD1 → D1, whereas the latter should be modelled as FD1 ⇢ D1.

As mentioned, our goal is to compute or identify a causal effect p(Y|do(X=x)) with the help of a given UCG. That is, we would like to leverage the UCG to transform the causal effect of interest into an expression that contains neither the do operator nor latent variables, so that it can be estimated from observational data. Pearl’s do-calculus consists of three rules whose repeated application together with standard probability manipulations do the job for acyclic directed mixed graphs [16, Section 3.4]. We show next that the calculus carries over into UCGs. Specifically, the calculus consists of the following three rules:

  1. Rule 1 (insertion/deletion of observations):

    p(Y | do(X), Z ∪ W) = p(Y | do(X), W) if Y ⊥(GV∖X)′ Z | W.
  2. Rule 2 (intervention/observation exchange):

    p(Y | do(X ∪ Z), W) = p(Y | do(X), Z ∪ W) if Y ⊥(GV∖X)′ FZ | W ∪ Z.
  3. Rule 3 (insertion/deletion of interventions):

    p(Y | do(X ∪ Z), W) = p(Y | do(X), W) if Y ⊥(GV∖X)′ FZ | W.

Theorem 9

Rules 1–3 are sound for UCGs.

Corollary 10

Consider an intervention on a set of variables X in an UCG G. If the interventional causal mechanism of each variable Vi ∈ X implies interference (i. e., it is modelled by augmenting G with the edge FVi → Vi), then

p(V ∖ X | do(X)) = ∏K∈Cc(G) p(K ∖ X | Pa(K) ∪ (K ∩ X)).

Note that the corollary above implies that the causal effect of any intervention that involves interference is identifiable in a parametrized UCG. The parameter values may be provided by an expert or estimated from data as shown in Section 6. In the latter case, all the variables in the model are assumed to be measured in the data. When the model contains latent variables, we may still perform causal effect identification via rules 1–3. It is also worth mentioning that the corollary above is a generalization of causal effect identification in LWF CGs as proposed by Lauritzen and Richardson [8], which in turn is a generalization of causal effect identification in DAGs [16]. This may come as a surprise because undirected edges in UCGs represent interference relationships whereas, in the work by Lauritzen and Richardson [8], they represent dependencies in the equilibrium distribution of a dynamic system with feedback loops. However, it has been suggested that interference is nothing but dependencies in an equilibrium distribution [11], [12], [21].

8 Identifiability of LWF and AMP CGs

This section proves that LWF and AMP CGs are identifiable when the error variables have equal variance. The error variable ϵA associated with a variable A ∈ V represents the unmodelled causes of A. Given a probability distribution p that is faithful to a LWF or AMP CG G, we can identify the Markov equivalence class of G (recall Theorem 1) from p by running, for instance, the learning algorithm developed by Studený [24] for LWF CGs or by Peña [14] for AMP CGs. We prove below that we can actually identify G itself from p if the error variables have equal variance. We discuss this assumption at the end of the section. Our result generalizes a similar result reported by Peters and Bühlmann [18] for DAGs.

Specifically, let G denote a LWF or AMP CG. Assume that the non-zero entries of βK and ΛK⁻¹ in Equation 2 have been selected at random. This implies that p is faithful to G with probability almost 1 by Peña [13, Theorems 1 and 2] and Levitz et al. [10, Theorem 6.1].[5] We therefore assume faithfulness hereinafter. Moreover, we rewrite Equation 2 as

(9)  K = βK Pa(K) + ϵK

in distribution, where ϵK ∼ N(0, ΛK). For any Vi ∈ V, we have then that

(10)  Vi = ∑Vj∈Pa(K) βi,j Vj + ϵi

in distribution, where K denotes the chain component of G that contains Vi, and ϵi denotes an error variable representing the unmodelled causes of Vi. All such error variables are jointly denoted by ϵ, which is distributed according to N(0, Λ), where Λ is a block diagonal matrix whose blocks correspond to the covariance matrices ΛK for all K ∈ Cc(G). Moreover, we assume that the errors ϵi have equal but unknown variance λ². Note that if the error variances are unequal and unknown but have the form Λi,i = λi² λ² for some known ratios λi², then we can satisfy the equal error variance assumption by rescaling each variable Vi, i. e. dividing it by λi: Vi ← Vi/λi. This implies that the linear coefficients and the errors get rescaled as βi,j ← βi,j λj/λi and ϵi ← ϵi/λi. The following lemma proves that, after the rescaling, the error covariance matrix is still positive definite and keeps all the previous (in)dependencies, which implies that, after the rescaling, p is still faithful to G with probability almost 1.

Lemma 11

Consider the rescaling ϵi ← ϵi/λi for all i. Then, the error covariance matrix represents the same independences before and after the rescaling. Moreover, the error covariance matrix is positive definite after the rescaling if and only if it was so before the rescaling.
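The lemma can be checked numerically. In the sketch below (the 3 × 3 precision matrix and the scales λi are hand-picked by us, not taken from the paper), we use that the rescaling Λ′ = D⁻¹ΛD⁻¹ with D = diag(λ1, λ2, λ3) turns the precision matrix Λ⁻¹ into DΛ⁻¹D, i. e. entry (i, j) is multiplied by λiλj, so zeros (independences) are preserved; positive definiteness is checked via leading principal minors:

```python
# Numeric sanity check of the lemma with hand-picked numbers (ours).

def det(m):
    """Determinant by Laplace expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def leading_minors(m):
    """Determinants of all upper-left submatrices."""
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

# Precision matrix with a zero at entry (1, 3): e1 is independent of e3 given e2.
prec = [[2.0, 1.0, 0.0],
        [1.0, 2.0, 1.0],
        [0.0, 1.0, 2.0]]
scales = [2.0, 0.5, 3.0]

# Rescaled precision: entry (i, j) becomes prec[i][j] * li * lj.
prec2 = [[prec[i][j] * scales[i] * scales[j] for j in range(3)]
         for i in range(3)]
```

The zero at entry (1, 3) survives the rescaling, and all leading principal minors stay positive, i. e. the rescaled matrix is still positive definite.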

We are now ready to state formally the main result of this section.

Theorem 12

Let p be a Gaussian distribution generated by Equation 10 with equal error variances. Then, G is identifiable from p.

Note that the theorem above implies that two LWF CGs or AMP CGs that represent the same separations are not Markov equivalent under the constraint of equal error variances, i. e. Theorem 1 does not hold under this constraint. The suitability of the equal error variances assumption should be assessed on a per domain basis. However, it may not be unrealistic to assume that it holds when the variables correspond to a relatively homogeneous set of individuals. For instance, in the case of a contagious disease, the error variable represents the unmodelled causes of an individual developing the disease, e. g. environmental factors. We may assume that these factors are the same for all the individuals, given their homogeneity. Therefore, we may assume equal error variances. We conjecture that the theorem above also holds for UCGs. However, a formal proof of this result requires first a characterization of Markov equivalence for UCGs, something that we postpone to a future article.
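The role of the equal-variance constraint can be illustrated in the simplest setting, a single directed edge. The sketch below (the coefficient and the variance are arbitrary numbers of ours, following the argument of Peters and Bühlmann [18] rather than any code from the paper) computes the residual variance of the anti-causal regression in closed form:

```python
# Toy illustration (our numbers) of identifiability under equal error
# variances for X -> Y with X = e_x, Y = b*X + e_y, var(e_x) = var(e_y) = lam2.
# Regressing X on Y (the anti-causal direction) leaves residual variance
# lam2 / (1 + b^2) < lam2, so the reversed model would need a smaller
# error variance and cannot satisfy the equal-variance constraint.

lam2 = 1.0   # common error variance lambda^2
b = 0.8      # causal coefficient (arbitrary non-zero value)

var_x = lam2
var_y = b * b * lam2 + lam2
cov_xy = b * lam2

# Residual variance of the regression of X on Y.
resid_var_anticausal = var_x - cov_xy ** 2 / var_y
```

Since λ²/(1 + b²) < λ² whenever b ≠ 0, the reversed model cannot reproduce p with the same error variance, which is what breaks Markov equivalence under the constraint.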

9 Discussion

LWF and AMP CGs are typically used to represent independence models. However, they can also be used to represent causal models. For instance, LWF CGs have been shown to be suitable for representing the equilibrium distributions of dynamic systems with feedback loops [8]. AMP CGs have been shown to be suitable for representing causal linear models with additive Gaussian noise [15]. LWF CGs have been extended into segregated graphs, which have been shown to be suitable for representing causal models with interference [21]. In this paper, we have shown how to combine LWF and AMP CGs to represent causal models of domains with both interference and non-interference relationships. Moreover, we have defined global, local and pairwise Markov properties for the new models, which we have coined unified chain graphs (UCGs), and shown that these properties are equivalent for Gaussian distributions. We have also proposed and evaluated an algorithm for computing MLEs of the parameters of an UCG. Finally, we have shown how to perform causal inference in UCGs.

It is worth mentioning that we are not the first to unify LWF and AMP CGs. Lauritzen and Sadeghi [9] recently proposed a new class of graphical models that unifies many existing classes, including LWF and AMP CGs. Specifically, they consider acyclic graphs with four types of edges: directed edges, bidirected edges, and solid and dotted undirected edges. Several edges between any pair of nodes are allowed. If their graphs only contain directed and solid undirected edges, then they coincide with LWF CGs. If they only contain directed and dotted undirected edges, then they coincide with AMP CGs. The authors develop global and pairwise Markov properties for the new models, and prove their equivalence. However, the pairwise Markov property only applies to graphs that have no dotted undirected edges. So, it does not apply to AMP CGs or superclasses thereof. Other differences with our work are that no local Markov property is proposed, no parameterization or parameter learning algorithm is proposed, and no causal interpretation is given. However, the main difference with our work is that their models cannot accommodate both interference and non-interference relationships, because they rely on a single type of directed edge. For instance, our mother-child example may be modeled with a graph that contains the edges V1 → D1 and G1 → D1 together with a solid undirected edge between D1 and D2, a dotted undirected edge between D1 and D2, or both. However, this graph cannot represent that intervening on V1 must have an effect on D2 while intervening on G1 must not, because the paths from V1 and G1 to D2 contain the same types of edges in the same order, namely a directed edge followed by a solid or dotted undirected edge. This leads us to conclude that the models proposed by Lauritzen and Sadeghi [9] do not subsume UCGs.

Finally, we would like to mention some questions that we have not studied in this paper but leave for future work. First, we plan to extend UCGs to categorical random variables. When dealing with continuous random variables, assuming that these are jointly Gaussian simplifies the problem by restricting the relations to be linear. However, in our opinion, the main simplification that the Gaussian assumption brings is that checking whether an independence holds reduces to checking whether a linear coefficient or an entry in a precision matrix is identically zero. Discrete UCGs will not enjoy this advantage. In any case, finding a suitable and amenable parameterization of discrete UCGs is of utmost importance. Moreover, Drton [3] has shown that discrete LWF CGs are smooth models but discrete AMP CGs are not. Non-smoothness implies that some standard asymptotic distribution results (e. g., normal distribution limits for MLEs and χ²-limits for likelihood ratios) may not hold for the model at hand. Therefore, we need to investigate whether non-smoothness hinders discrete UCGs from representing interference and non-interference relationships. Other questions that we plan to study are (i) characterizing Markov equivalent UCGs, (ii) developing structure learning algorithms based on the local and pairwise Markov properties, (iii) extending UCGs to model confounding via bidirected edges, and (iv) making use of the linearity of the relations for causal effect identification in UCGs along the lines in Pearl [16, Chapter 5].

Acknowledgment

We thank the Reviewers and the Associate Editor for their comments, which helped us to improve this work substantially.

Appendix A Proofs

Proof of Theorem 1

The result has been proven before for LWF CGs [13, Corollary 1]. We prove it here for AMP CGs. The if part is trivial. To see the only if part, note that there are Gaussian distributions p and q that are faithful respectively to G and H [10, Theorem 6.1]. Moreover, p satisfies the global Markov property with respect to H, because G and H are Markov equivalent. Likewise for q and G. Therefore, G and H must represent the same separations.  □

Proof of Lemma 2

As discussed before, missing undirected edges impose zero restrictions on the elements of ΛK⁻¹ due to Equation 8. Likewise, missing ⇢ edges impose zero restrictions on the elements of βK corresponding to the mothers due to Equation 6. Likewise, missing → edges impose zero restrictions on the elements of ΩK,Pa(K) corresponding to the fathers due to Equation 7, but not on the elements of βK. Therefore, if we ignore the requirement and do not make any distinction between mothers and fathers, then the missing edges from Pa(K) to K impose zero restrictions on βK and, thus, we can simply ignore the edges → since this will not imply any additional zero restriction.  □

Given an UCG G, let GU denote the subgraph of G resulting from the following process: Start with the empty graph over U and add to it the edge between A and B (with the same type it has in G) if and only if G has a route of the form A, B, …, C that only contains edges →, ⇢ and −, where (i) C ∈ U, and (ii) the route has no subroute R ⇢ S − T. The next lemma shows that if X ⊥G Y | Z, then X and Y can be extended to cover the whole V(GX∪Y∪Z) ∖ Z, where V(GX∪Y∪Z) denotes the nodes in GX∪Y∪Z.

Lemma 13

Given an UCG G and three disjoint subsets X, Y and Z of V such that X ⊥G Y | Z, then X′ ⊥GX∪Y∪Z Y′ | Z with Y′ = {B ∈ V(GX∪Y∪Z) ∖ (X ∪ Z) : X ⊥GX∪Y∪Z B | Z} and X′ = V(GX∪Y∪Z) ∖ (Y′ ∪ Z).

Proof

We show that A ⊥GX∪Y∪Z B | Z for all A ∈ X′ and B ∈ Y′. Note that this holds if A ∈ X by definition of Y′. Now, consider A ∈ X′ ∖ X. Assume to the contrary that there exists a Z-open route π between A and B ∈ Y′ in GX∪Y∪Z. Moreover, there is a Z-open route σ between A and some A′ ∈ X in GX∪Y∪Z, because A ∉ Y′. Let ρ = σ ∪ π, i. e. the route resulting from concatenating σ and π. Note that if A is neither a collider node in ρ nor in a collider section of ρ, then ρ is Z-open because A ∉ Z. This contradicts that B ∈ Y′ and, thus, A must be (i) in a collider section of ρ, or (ii) a collider node in ρ. Note that in case (i), the collider section is of the form V1 → V2 − ⋯ − A − ⋯ − Vn−1 ← Vn where V2, …, Vn−1 ∉ Z because, otherwise, π or σ are not Z-open. This together with the fact that A ∉ Z imply that ρ is not Z-open in either case (i) or (ii). However, it can be modified into a Z-open route between A′ and B as follows, which contradicts that B ∈ Y′.

  1. Assume that case (i) holds. Recall that A ∈ V(GX∪Y∪Z) and consider the following three cases. First, assume that A ∈ V(GZ). Then, GX∪Y∪Z has a route ϱ from A to some C ∈ Z that only contains edges →, ⇢ and − and no subroute R ⇢ S − T. Assume without loss of generality that C is the only node in ϱ that belongs to Z. Let ϱ′ denote the route resulting from traversing ϱ from C to A. Then, the route σ ∪ ϱ ∪ ϱ′ ∪ π is Z-open, which contradicts that B ∈ Y′. Second, assume that A ∈ V(GX) but A ∉ V(GZ). Then, GX∪Y∪Z has a route ϱ from some node in X to A that only contains edges ←, ⇠ and − and no subroute R − S ⇠ T. Note that no node in ϱ is in Z, because A ∉ V(GZ). Then, the route ϱ ∪ π is Z-open, which contradicts that B ∈ Y′. Third, assume that A ∈ V(GY) but A ∉ V(GZ). Then, we can reach a contradiction much in the same way as in the previous case.

  2. Assume that case (ii) holds. We can prove this case much in the same way as case (i). Simply note that we can now choose the route ϱ in case (i) so that it does not start with an edge −. To see this, note that A being a collider node in ρ implies that an edge A′′ ⇢ A is in ρ and, thus, in GX∪Y∪Z. However, since A ∉ X ∪ Y ∪ Z, this is possible only if GX∪Y∪Z has a route A′′ ⇢ A → C ⋯ D or A′′ ⇢ A ⇢ C ⋯ D where (i) D ∈ X ∪ Y ∪ Z, and (ii) the route has no subroute R ⇢ S − T.

 □

A pure collider route is a route all of whose intermediate nodes are collider nodes or in a collider section of the route, i. e. a route of the form A → C ← B, A → C ⇠ B, A ⇢ C ⇠ B, A → C1 − C2 ← B, A → C1 − ⋯ − Cn ← B, or a single edge between A and B.

Lemma 14

Given an UCG G and three disjoint subsets X, Y and Z of V such that X ⊥G Y | Z, there is no pure collider route in GX∪Y∪Z between any A ∈ X′ and B ∈ Y′, with X′ and Y′ as in Lemma 13.

Proof

Note that X′ ⊥GX∪Y∪Z Y′ | Z with V(GX∪Y∪Z) = X′ ∪ Y′ ∪ Z by Lemma 13. Assume to the contrary that there is a pure collider route ρ in GX∪Y∪Z between A ∈ X′ and B ∈ Y′. If ρ is a single edge between A and B, then it contradicts that X′ ⊥GX∪Y∪Z Y′ | Z. Likewise, if ρ is of the form A → C ← B, A → C ⇠ B or A ⇢ C ⇠ B, then it contradicts that X′ ⊥GX∪Y∪Z Y′ | Z regardless of whether C ∈ X′, C ∈ Y′ or C ∈ Z. Likewise, if ρ is of the form A → C1 − C2 ← B, then C1 ∈ X′ or C1 ∈ Z to avoid contradicting that X′ ⊥GX∪Y∪Z Y′ | Z. For the same reason, C2 ∈ Y′ or C2 ∈ Z. However, any of the four combinations contradicts that X′ ⊥GX∪Y∪Z Y′ | Z. Finally, if ρ is of the form A → C1 − ⋯ − Cn ← B, then C1, …, Cn ∈ Z to avoid contradicting that X′ ⊥GX∪Y∪Z Y′ | Z. However, this implies that some node in X′ is adjacent to some node in Y′, which contradicts that X′ ⊥GX∪Y∪Z Y′ | Z.  □

Lemma 15

Let U ∼ N(0, ΛU) and L ∼ N(βL U, ΛL), where βL is of dimension |L| × |U| and, thus, W = (U, L)ᵀ ∼ N(0, Σ). If we now set (βL)i,j = 0, then W ∼ N(0, Σ′) such that Σ′m,n = Σm,n for all m, n ≠ i.

Proof

Bishop [2, Section 2.3.3] proves that

Σ = ( ΛU       ΛU βLᵀ
      βL ΛU    ΛL + βL ΛU βLᵀ ).

Now, note that

(βL ΛU)m,n = ∑r (βL)m,r (ΛU)r,n

and

(ΛL + βL ΛU βLᵀ)m,n = (ΛL)m,n + ∑r ∑s (βL)m,r (ΛU)r,s (βL)n,s

which imply the result.  □

The following observation follows from the lemma above and will be used later. Let p(W) and p′(W) denote the distribution of W before and after setting (βL)i,j = 0 in the lemma, i. e. N(0, Σ) and N(0, Σ′). Then, the lemma implies that p(W ∖ {i}) = p′(W ∖ {i}).
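The lemma and the observation can be verified numerically for a small case. In the sketch below (U is one-dimensional, L is two-dimensional, and all matrices are hand-picked by us), Σ is built from Bishop's block formula and we check that zeroing (βL)1,1, i. e. the coefficient of L1 on U, only alters entries of Σ in the row and column of L1:

```python
# Numeric check of Lemma 15 with hand-picked numbers (ours), for
# W = (U, L1, L2)^T with U scalar and L two-dimensional.

def joint_cov(lam_u, beta, lam_l):
    """Sigma for U ~ N(0, lam_u) scalar and L ~ N(beta*U, lam_l), dim(L) = 2,
    following the block formula Sigma = [[LU, LU*b^T], [b*LU, LL + b*LU*b^T]]."""
    b1, b2 = beta
    return [
        [lam_u,       lam_u * b1,                     lam_u * b2],
        [b1 * lam_u,  lam_l[0][0] + b1 * lam_u * b1,  lam_l[0][1] + b1 * lam_u * b2],
        [b2 * lam_u,  lam_l[1][0] + b2 * lam_u * b1,  lam_l[1][1] + b2 * lam_u * b2],
    ]

lam_u = 2.0
lam_l = [[1.0, 0.3], [0.3, 1.0]]

sigma = joint_cov(lam_u, [0.7, -0.4], lam_l)
sigma0 = joint_cov(lam_u, [0.0, -0.4], lam_l)   # (beta_L)_{1,1} set to zero

# Indices: 0 = U, 1 = L1 (the variable whose coefficient was zeroed), 2 = L2.
# Per the lemma, all entries outside the row/column of L1 must be unchanged.
unchanged = [(m, n) for m in (0, 2) for n in (0, 2)]
```

The entries among {U, L2} coincide before and after the zeroing, while the entries in the row and column of L1 change, exactly as the lemma states.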

Lemma 16

Let K1, …, Kn denote the topologically sorted chain components of an UCG G. Then, (K1, …, Kn)ᵀ ∼ N(0, Σ) where (Σ⁻¹)i,j = 0 if there are no pure collider routes in G between i and j.

Proof

We prove the result by induction on the number of chain components n. The result clearly holds for n = 1, because the only possible pure collider route in G between i and j is of the form i − j, and Equation 8 implies that (Σ⁻¹)i,j = 0 if i − j is not in G. Assume as induction hypothesis that the result holds for all n < r. We now prove it for n = r. Let U = K1 ∪ ⋯ ∪ Kr−1 and L = Kr. By the induction hypothesis, U ∼ N(0, ΛU). Moreover, let L ∼ N(βL Pa(L), ΛL). Bishop [2, Section 2.3.3] shows that (U, L)ᵀ ∼ N(0, Σ) where

(11)  Σ⁻¹ = ( ΛU⁻¹ + βLᵀ ΛL⁻¹ βL    −βLᵀ ΛL⁻¹
              −ΛL⁻¹ βL               ΛL⁻¹ ).

Assume that i, j ∈ U. Then, Equation 11 implies that (Σ⁻¹)i,j = 0 if (ΛU⁻¹)i,j = 0 and (βLᵀ ΛL⁻¹ βL)i,j = 0. We show below that the first (respectively second) condition holds if there are no pure collider routes in G between i and j through nodes in U (respectively L).

  1. By the induction hypothesis, (ΛU⁻¹)i,j = 0 if there are no pure collider routes in G between i and j through nodes in U.

  2. Note that (βLᵀ ΛL⁻¹ βL)i,j = ∑r ∑s (βL)r,i (ΛL⁻¹)r,s (βL)s,j = 0 if (βL)r,i (ΛL⁻¹)r,s (βL)s,j = 0 for all r and s. For r = s, (βL)r,i (ΛL⁻¹)r,s (βL)s,j = 0 if (βL)r,i = 0 or (βL)s,j = 0. For r ≠ s, (βL)r,i (ΛL⁻¹)r,s (βL)s,j = 0 if (βL)r,i = 0 or (ΛL⁻¹)r,s = 0 or (βL)s,j = 0. Moreover, as shown in Section 3, (βL)r,i = 0 if neither i → r nor i ⇢ r is in G. Likewise, (βL)s,j = 0 if neither j → s nor j ⇢ s is in G. Finally, (ΛL⁻¹)r,s = 0 if r − s is not in G. These conditions rule out the existence of pure collider routes in G between i and j through nodes in L.

Now, assume that i ∈ U and j ∈ L. Then, Equation 11 implies that (Σ⁻¹)i,j = 0 if (βLᵀ ΛL⁻¹)i,j = 0, which holds if ∑r (βL)r,i (ΛL⁻¹)r,j = 0, which holds if (βL)r,i (ΛL⁻¹)r,j = 0 for all r. For j = r, (βL)r,i (ΛL⁻¹)r,j = 0 if (βL)r,i = 0, i. e. if neither i → r nor i ⇢ r is in G. For j ≠ r, (βL)r,i (ΛL⁻¹)r,j = 0 if (βL)r,i = 0 or (ΛL⁻¹)r,j = 0, i. e. if neither i → r nor i ⇢ r is in G, or r − j is not in G. Either case holds if there are no pure collider routes in G between i and j.

Finally, assume that i, j ∈ L. Then, Equation 11 implies that (Σ⁻¹)i,j = 0 if (ΛL⁻¹)i,j = 0, which holds if i − j is not in G, i. e. if there are no pure collider routes in G between i and j.  □
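Equation 11 can be sanity-checked in the simplest case where U and L are both one-dimensional (all numbers below are ours): building Σ from the block covariance formula and Σ⁻¹ from Equation 11, their product should be the identity matrix.

```python
# Numeric check of Equation 11 with U and L scalar (hand-picked numbers, ours).

lam_u, lam_l, b = 2.0, 1.5, 0.7

# Block covariance formula: Sigma = [[LU, LU*b], [b*LU, LL + b*LU*b]].
sigma = [[lam_u,      lam_u * b],
         [b * lam_u,  lam_l + b * lam_u * b]]

# Equation 11: Sigma^{-1} = [[LU^{-1} + b*LL^{-1}*b, -b*LL^{-1}],
#                            [-LL^{-1}*b,            LL^{-1}]].
prec = [[1.0 / lam_u + b * (1.0 / lam_l) * b, -b * (1.0 / lam_l)],
        [-(1.0 / lam_l) * b,                  1.0 / lam_l]]

# Their product should be the 2x2 identity matrix.
prod = [[sum(sigma[i][k] * prec[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
```

The off-diagonal entries of the product cancel exactly, confirming the signs of the off-diagonal blocks in Equation 11.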

Proof of Theorem 3

We prove first the if part. For any chain component K ∈ Cc(G), clearly K ⊥G Nd(K) ∖ K ∖ Pa(K) | Pa(K) and thus K ⊥p Nd(K) ∖ K ∖ Pa(K) | Pa(K), which implies Equation 1. For any vertex i ∈ K, clearly i ⊥G Mo(K) ∖ Mo(i) | Fa(K) ∪ Mo(i) and thus i ⊥p Mo(K) ∖ Mo(i) | Fa(K) ∪ Mo(i), which implies Equation 6. For any vertices i ∈ K and j ∈ Fa(K) ∖ Fa(i), clearly i ⊥G j | K ∖ {i} ∪ Pa(K) ∖ {j} and thus i ⊥p j | K ∖ {i} ∪ Pa(K) ∖ {j}, which implies Equation 7. Finally, for any non-adjacent vertices i, j ∈ K, clearly i ⊥G j | K ∖ {i, j} ∪ Pa(K) and thus i ⊥p j | K ∖ {i, j} ∪ Pa(K), which implies Equation 8.

We now prove the only if part. Consider three disjoint subsets X, Y and Z of V such that X ⊥G Y | Z. Let K1, …, Kn denote the topologically sorted chain components of GX∪Y∪Z. Note that GK1∪⋯∪Kn and GX∪Y∪Z only differ in that the latter may not have all the edges ⇢ in the former. Note also that K1, …, Kn are chain components of G, too. Consider a topological ordering of the chain components of G, and let Q1, …, Qm denote the components that precede Kn in the ordering, besides K1, …, Kn−1. Note that the edges in G from any Qi to any Kj must be of the type ⇢ because, otherwise, Qi would be a component of GX∪Y∪Z. Therefore, GQ1∪⋯∪Qm∪K1∪⋯∪Kn and GQ1∪⋯∪Qm ∪ GX∪Y∪Z only differ in that the latter may not have all the edges ⇢ in the former. In other words, the latter may impose additional zero restrictions on the elements of some βKj corresponding to the mothers. Consider adding such additional restrictions to the marginal distribution p(Q1, …, Qm, K1, …, Kn) obtained from p via Equations 1 and 2, i. e. consider setting the corresponding elements of βKj to zero (recall that βKj are such that the mean vector of p(Kj | Pa(Kj)) is a linear function of Pa(Kj) with coefficients βKj). Call the resulting distribution p′(Q1, …, Qm, K1, …, Kn).

Finally, recall again that GQ1∪⋯∪Qm∪K1∪⋯∪Kn and GQ1∪⋯∪Qm ∪ GX∪Y∪Z only differ in that the latter may not have all the edges ⇢ in the former. Note also that every node in X ∪ Y ∪ Z has the same mothers in GQ1∪⋯∪Qm∪K1∪⋯∪Kn and GQ1∪⋯∪Qm ∪ GX∪Y∪Z. Then, p′(X ∪ Y ∪ Z) = p(X ∪ Y ∪ Z) by Lemma 15 and, thus, X ⊥p′ Y | Z if and only if X ⊥p Y | Z. Now, recall from Lemma 14 that X ⊥G Y | Z implies that there is no pure collider route in GX∪Y∪Z between any vertices A ∈ X′ and B ∈ Y′. Then, there is no such route either in GQ1∪⋯∪Qm ∪ GX∪Y∪Z, because this UCG has no edges between the nodes in V(GQ1∪⋯∪Qm) and the nodes in V(GX∪Y∪Z). This implies that A ⊥p′ B | X′ ∖ {A} ∪ Y′ ∖ {B} ∪ Z ∪ Q1 ∪ ⋯ ∪ Qm by Lemma 16 and Lauritzen [7, Proposition 5.2], which implies X′ ⊥p′ Y′ | Z ∪ Q1 ∪ ⋯ ∪ Qm by repeated application of intersection. Moreover, X′ ⊥p′ Q1 ∪ ⋯ ∪ Qm | Z because GQ1∪⋯∪Qm ∪ GX∪Y∪Z has no edges between the nodes in V(GQ1∪⋯∪Qm) and the nodes in V(GX∪Y∪Z). This implies X ⊥p′ Y | Z by contraction and decomposition which, as shown, implies X ⊥p Y | Z.  □

Proof of Theorem 4

We prove the result by induction on the number of chain components n. The result holds for n = 1 by Equation 5 [6, Theorem 1]. Assume as induction hypothesis that the result holds for all n < m. We now prove it for n = m. Let U = K1 ∪ ⋯ ∪ Km−1 and L = Km. Let U ∼ N(0, ΛU) and L ∼ N(βL U, ΛL) and, thus, (U, L)ᵀ ∼ N(0, Σ). Bishop [2, Section 2.3.3] proves that

Σ = ( ΛU       ΛU βLᵀ
      βL ΛU    ΛL + βL ΛU βLᵀ ).

Now, note that

(βL ΛU)i,j = ∑k (βL)i,k (ΛU)k,j.

By the induction hypothesis, (ΛU)k,j can be written as a sum of products of weights over the edges of the open paths between k and j in G. Moreover, as discussed in Section 3 in relation to Equation 5, (βL)i,k can be written as a sum of products of weights over the edges of the paths from k to i through nodes in L. Since the latter paths all start with a directed edge out of k, the previous observations together imply the desired result.

Finally,

(ΛL + βL ΛU βLᵀ)i,j = (ΛL)i,j + ∑k ∑l (βL)i,k (ΛU)k,l (βL)j,l.

As discussed in Section 3, (ΛL)i,j can be written as a sum of products of weights over the edges of the paths between i and j through nodes in L. By the induction hypothesis, (ΛU)k,l can be written as a sum of products of weights over the edges of the open paths between k and l in G. Again, as discussed in Section 3, (βL)i,k (respectively, (βL)j,l) can be written as a sum of products of weights over the edges of the paths from k to i (respectively, from l to j) through nodes in L. Since the latter paths all start with a directed edge out of k (respectively, out of l), the previous observations together imply the desired result.  □

Proof of Theorem 5

We prove first the if part. The property B1 together with weak union and composition imply K ⊥p Nd(K) ∖ K ∖ Pa(K) | Pa(K) and thus Equation 1. Moreover, B1 implies i ⊥p Mo(K) ∖ Mo(i) | Fa(K) ∪ Mo(i) by decomposition and, thus, Equation 6. Moreover, B2 implies i ⊥p Fa(K) ∖ Fa(i) | K ∖ {i} ∪ Fa(i) ∪ Mo(K) by decomposition, which implies i ⊥p j | K ∖ {i} ∪ Pa(K) ∖ {j} for all j ∈ Fa(K) ∖ Fa(i) by weak union, which implies Equation 7. Finally, B3 implies Equation 8.

We now prove the only if part. Note that if p satisfies Equations 1 and 6–8 with respect to G, then it satisfies the global Markov property with respect to G by Theorem 3. Now, it is easy to verify that the block-recursive property holds. Specifically, the separations i ⊥G Nd(K) ∖ K ∖ Fa(K) ∖ Mo(i) | Fa(K) ∪ Mo(i) for all i ∈ K with K ∈ Cc(G) imply B1 by the global Markov property. Likewise, the separations i ⊥G Nd(K) ∖ K ∖ Fa(i) ∖ Mo(K) | K ∖ {i} ∪ Fa(i) ∪ Mo(K) for all i ∈ K with K ∈ Cc(G) imply B2 by the global Markov property. Finally, the separations i ⊥G j | K ∖ {i, j} ∪ Pa(K) for all i, j ∈ K with K ∈ Cc(G) and such that i − j is not in G imply B3 by the global Markov property.  □

Proof of Theorem 6

Note that Nd(i) = Nd(K). Then, the properties P1 and P2 imply respectively B1 and B2 by repeated application of intersection. Similarly, P2 implies i ⊥p Nd(K) ∖ {i} ∖ Pa(K) ∖ Ne(i) | Pa(K) ∪ Ne(i) by repeated application of intersection, which implies i ⊥p K ∖ {i} ∖ Ne(i) | Pa(K) ∪ Ne(i) by decomposition, which implies B3 by weak union. Finally, the property B1 implies P1 by weak union. Likewise, B2 implies P2 by weak union if j ∈ K. Assume now that j ∉ K and note that B2 implies i ⊥p Nd(K) ∖ K ∖ Pa(K) | K ∖ {i} ∪ Pa(K) by weak union. Note also that B3 implies that p(K | Pa(K)) satisfies the global Markov property with respect to GK by Lauritzen [7, Theorem 3.7] and, thus, i ⊥p K ∖ {i} ∖ Ne(i) | Pa(K) ∪ Ne(i). Then, i ⊥p Nd(K) ∖ {i} ∖ Pa(K) ∖ Ne(i) | Pa(K) ∪ Ne(i) by contraction, which implies P2 by weak union.  □

Proof of Theorem 7

The properties L1 and L2 imply respectively P1 and P2 by weak union. Note also that the pairwise Markov property implies the global one by Theorems 3, 5 and 6. Now, it is easy to verify that the local property holds. Specifically, the separations i ⊥G Nd(i) ∖ K ∖ Fa(K) ∖ Mo(i) | Fa(K) ∪ Mo(i) for all i ∈ K with K ∈ Cc(G) imply L1 by the global Markov property. Likewise, the separations i ⊥G Nd(i) ∖ {i} ∖ Pa(i) ∖ Ne(i) ∖ Mo(Ne(i)) | Pa(i) ∪ Ne(i) ∪ Mo(Ne(i)) for all i ∈ K with K ∈ Cc(G) imply L2 by the global Markov property.  □

Proof of Theorem 9

Recall from the main text that p(V ∖ X | do(X)) satisfies the global Markov property with respect to GV∖X. Then, it also satisfies the global Markov property with respect to (GV∖X)′, because this UCG is a supergraph of GV∖X. Then, rule 1 holds. Actually, GV∖X and (GV∖X)′ represent the same separations over V ∖ X, because all the variables FVi are observed (i. e., FVi = a or FVi = idle) and, thus, they do not open new routes.

For the proof of rule 2, assume that Z and Y are singletons. The generalization to sets of variables is trivial. First, assume that FZ ⇢ Z is in (GV∖X)′. The antecedent of rule 2 implies that all the W-open routes in (GV∖X)′ from Z to Y start with an edge Z → A or Z ⇢ A. Then, observing Z = z and setting Z = z produces the same effect on Y and, thus, rule 2 holds. Second, assume that FZ → Z is in (GV∖X)′. The antecedent of rule 2 implies that all the W-open routes in (GV∖X)′ from Z to Y start with an edge Z → B or Z ⇢ B or with a subroute Z − Y or Z − A → B or Z − A ⇢ B or Z − A − B. Then, observing Z = z and setting Z = z produces the same effect on Y and, thus, rule 2 holds. The result may not be immediate when the route from Z to Y starts with a subroute Z − A ⋯. The result follows from the fact that A ∉ W and the interventional causal mechanism is FZ → Z, i. e. the intervention interferes with A.

For the proof of rule 3, assume that Z and Y are singletons. The generalization to sets of variables is trivial. First, assume that FZ → Z is in (GV∖X)′. The antecedent of rule 3 implies that all the W-open routes in (GV∖X)′ from Z to Y start with an edge Z ← A or Z ⇠ A or with a subroute Z − B1 − ⋯ − Bn ← A such that Bi ∈ W for all i. Then, setting Z = z has no effect on Y and, thus, rule 3 holds. Second, assume that FZ ⇢ Z is in (GV∖X)′. The antecedent of rule 3 implies that all the W-open routes in (GV∖X)′ from Z to Y start with an edge Z ← A or Z ⇠ A or Z − A. Then, setting Z = z has no effect on Y and, thus, rule 3 holds. The result may not be immediate when the route from Z to Y starts with an edge Z − A. The result follows from the fact that the interventional causal mechanism is FZ ⇢ Z, i. e. the intervention does not interfere with A.  □

Proof of Corollary 10

Let K1, …, Kn denote the topologically sorted chain components of G. Note that Ki ∖ X ⊥GV∖X (K1 ∪ ⋯ ∪ Ki−1) ∖ Pa(Ki) ∖ X | Pa(Ki) ∖ X for all i. Moreover, recall from the main text that p(V ∖ X | do(X)) satisfies the global Markov property with respect to GV∖X. Then, we have that

p(V ∖ X | do(X)) = ∏K∈Cc(G) p(K ∖ X | Pa(K) ∖ X, do(X)).

Note also that K ∖ X ⊥(GV∖[X∖Pa(K)])′ FPa(K)∩X | Pa(K). Then, rule 2 implies that

p(V ∖ X | do(X)) = ∏K∈Cc(G) p(K ∖ X | Pa(K), do(X ∖ Pa(K))).

Note also that K ∖ X ⊥(GV∖[X∖Pa(K)∖K])′ FK∩X | Pa(K) ∪ (K ∩ X) by the assumption that the interventional causal mechanism of each variable A ∈ K ∩ X is modeled as FA → A, and the fact that FB is observed (i. e., FB = idle) for each variable B ∈ K ∖ X. Then, rule 2 implies that

p(V ∖ X | do(X)) = ∏_{K ∈ Cc(G)} p(K ∖ X | Pa(K) ∪ (K ∩ X), do(X ∖ Pa(K) ∖ K)).

Finally, note that K ∖ X ⊥_{(G_{V ∪ F_{X ∖ Pa(K) ∖ K}})} F_{X ∖ Pa(K) ∖ K} | Pa(K) ∪ (K ∩ X). Then, rule 3 implies that

p(V ∖ X | do(X)) = ∏_{K ∈ Cc(G)} p(K ∖ X | Pa(K) ∪ (K ∩ X)).

 □

Proof of Lemma 11

Let Λ and Λ′ denote the error covariance matrices before and after the rescaling, respectively. Note that we only need to consider independences between singletons. In particular, ϵ_i is independent of ϵ_j conditioned on ϵ_L if and only if ((Λ_{ijL,ijL})^{−1})_{i,j} = 0 [7, Proposition 5.2] and, thus, if and only if the (j,i) minor of Λ_{ijL,ijL} is zero, i. e. det(M) = 0 where M is the result of removing the row j and the column i from Λ_{ijL,ijL}. Note that det(M) = ∑_{π ∈ S} sign(π) ∏_k M_{k,π(k)} where S denotes all the permutations over the number of rows or columns of M. Let M′ denote the corresponding submatrix of Λ′_{ijL,ijL}. Then, det(M′) = det(M) ∏_k 1/λ_k² and, thus, det(M′) = 0 if and only if det(M) = 0.

A matrix is positive definite if and only if the determinants of all its upper-left submatrices are positive. Therefore, it follows from the previous paragraph that either Λ and Λ′ are both positive definite or neither is.  □
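The rescaling argument in Lemma 11 can be checked numerically: a diagonal rescaling Λ′ = DΛD preserves both the zero pattern of the inverse (and, hence, the conditional independences among the errors) and positive definiteness. The following sketch is an illustration in NumPy, not code from the paper; the concrete matrix Ω is an arbitrary example with one zero entry.

```python
import numpy as np

# Error precision matrix with a zero partial correlation: Omega[0, 2] == 0.
Omega = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
Lam = np.linalg.inv(Omega)  # error covariance matrix Lambda

# Rescale each error by 1/lambda_k, i.e. Lambda' = D Lambda D with D diagonal.
lam = np.sqrt(np.diag(Lam))
D = np.diag(1.0 / lam)
LamP = D @ Lam @ D

# The zero pattern of the inverse (the conditional independences) is preserved...
OmegaP = np.linalg.inv(LamP)
assert abs(OmegaP[0, 2]) < 1e-12
# ...and so is positive definiteness (congruence transform with invertible D).
assert np.all(np.linalg.eigvalsh(LamP) > 0)
```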

Proof of Theorem 12

Under the faithfulness assumption, we can identify the Markov equivalence class of G from p by, for instance, running the learning algorithms developed by Studený [24] for LWF CGs and by Peña [14] for AMP CGs. Assume to the contrary that there is a second CG G′ in the equivalence class that generates p via Equation 10. Let N and N′ denote the nodes without parents in G and G′, respectively. Assume that N ≠ N′. Then, there exists some node L ∈ V that is in N but not in N′, or vice versa. Assume the latter without loss of generality. Then, there is an edge Y → L that is in G but not in G′. Let Q denote all the parents of L in G except Y. Then, we have from G that

L = β_Q Q + β_Y Y + ϵ_L

by Equation 10. Note that β_Y ≠ 0 by the faithfulness assumption. Define L′ = L | Q = q and Y′ = Y | Q = q in distribution. Since ϵ_L ⊥_p Q ∪ {Y} by construction, we have from G that

L′ = β_Q q + β_Y Y′ + ϵ_L

in distribution [19, Lemma 2] and, thus, var(L′) > var(ϵ_L) = λ². However, we have from G′ that var(L′) ≤ λ² [18, Lemma A1]. This is a contradiction and, thus, N = N′. Note that the undirected edges between the nodes in N must be the same in G and G′, because Markov equivalent CGs have the same adjacencies as shown by Frydenberg [5, Theorem 5.6] for LWF CGs and by Andersson et al. [1, Theorem 5] for AMP CGs. For the same reason, the directed edges from N to V ∖ N must be the same in G and G′. Since G_{V ∖ N} and G′_{V ∖ N} generate p(V ∖ N | N = n) via Equation 10, we can repeat the reasoning above replacing G, G′ and p by G_{V ∖ N}, G′_{V ∖ N} and p(V ∖ N | N = n). This allows us to conclude that the nodes with no parents in G_{V ∖ N} and their corresponding edges coincide with those in G′_{V ∖ N}. By continuing with this process, we can conclude that G = G′.  □
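The variance argument at the heart of this proof can be illustrated numerically: under equal error variances λ², a parentless node has variance exactly λ², whereas a node with a parent has strictly larger variance. The following toy simulation is an illustration only; the coefficient 0.8 and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam2 = 1.0  # common error variance lambda^2

# Toy linear Gaussian model with X0 parentless and an edge X0 -> X1.
n = 200000
e0 = rng.normal(0.0, np.sqrt(lam2), n)
e1 = rng.normal(0.0, np.sqrt(lam2), n)
x0 = e0
x1 = 0.8 * x0 + e1

# A parentless node has variance lambda^2, while a node with a parent
# has variance beta^2 * var(parent) + lambda^2 > lambda^2 (here 1.64).
assert abs(x0.var() - lam2) < 0.02
assert x1.var() > lam2
```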

Appendix B Covariance decomposition

It is known from path analysis in Gaussian acyclic directed mixed graphs that the covariance Σ_{i,l} can be expressed as the sum for every open path between l and i of the product of path coefficients and error covariances for the edges in the path [16], [29]. The path coefficient α_{B,A} corresponding to an edge A → B represents the change in B induced by raising A one unit while keeping all the other variables constant. Note then that the path coefficients carry causal information. For directed and acyclic graphs, α_{B,A} is always identifiable and coincides with the partial regression coefficient β_{B,A|Z(B,A)}, where Z(B,A) is a set of nodes that blocks all paths from A to B except A → B. For directed and acyclic graphs, we have then that

Σ_{i,l} = ∑_{ρ ∈ π_{i,l}} Σ_{ρ_1,ρ_1} ∏_{n=2}^{|ρ|} α_{ρ_n,ρ_{n−1}} = ∑_{ρ ∈ π_{i,l}} Σ_{ρ_1,ρ_1} ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|Z(ρ_n,ρ_{n−1})}.
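The decomposition above can be verified on a small example. The sketch below is an illustration, not code from the paper: it builds Σ for the DAG X0 → X1 → X2 from its path coefficients and checks that Σ_{2,0} equals the product of the coefficients along the only open path, times the variance of the path's source.

```python
import numpy as np

# Linear Gaussian SEM over the DAG X0 -> X1 -> X2 with unit error variances.
a, b = 0.7, -0.5          # path coefficients alpha_{1,0} and alpha_{2,1}
B = np.array([[0.0, 0.0, 0.0],
              [a,   0.0, 0.0],
              [0.0, b,   0.0]])
I = np.eye(3)
Lam = np.eye(3)           # error covariance matrix
Sigma = np.linalg.inv(I - B) @ Lam @ np.linalg.inv(I - B).T

# The only open path between X0 and X2 is X0 -> X1 -> X2, so the
# decomposition reduces to Sigma_{0,0} * a * b.
assert abs(Sigma[2, 0] - Sigma[0, 0] * a * b) < 1e-12
```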

Although undirected edges are symmetric, they represent interference by contagion and, thus, they have some features of causal relationships. Then, one may wonder whether it is possible to derive an expression similar to the one above for undirected graphs. Jones and West [6, p. 782] hint at this possibility. The next lemma proves it formally: Σ_{i,l} can indeed be expressed as the sum for every open path between l and i of the product of regression coefficients, error covariances and inflation factors for the edges in the path.

Lemma 17

Consider a Gaussian distribution over a set of random variables K. Let Σ denote the covariance matrix of the distribution. Assume that the distribution satisfies the global Markov property with respect to an undirected graph G. Then,

Σ_{i,l} = ∑_{ρ ∈ π_{i,l}} Σ_{ρ_1,ρ_1} ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}} · Σ_{ρ_n,ρ_n|ρ_1,…,ρ_{n−1}} / Σ_{ρ_n,ρ_n|K∖{ρ_n}}.

Proof

Let ρ̄ = K ∖ ρ for any ρ ⊆ K. Then,

Σ_{i,l} = ∑_{ρ ∈ π_{i,l}} (−1)^{|ρ|+1} (|Ω_{ρ̄,ρ̄}| / |Ω|) ∏_{n=2}^{|ρ|} Ω_{ρ_{n−1},ρ_n} = ∑_{ρ ∈ π_{i,l}} (−1)^{|ρ|+1} (|Ω_{ρ̄,ρ̄}| / |Ω|) ∏_{n=2}^{|ρ|} (Ω_{ρ_{n−1},ρ_n} / Ω_{ρ_n,ρ_n}) Ω_{ρ_n,ρ_n} = ∑_{ρ ∈ π_{i,l}} (|Ω_{ρ̄,ρ̄}| / |Ω|) ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}} Ω_{ρ_n,ρ_n}

where the first equality is due to Jones and West [6, Theorem 1], and the third is due to Lauritzen [7, p. 130]. Then,

Σ_{i,l} = ∑_{ρ ∈ π_{i,l}} (|Ω_{ρ̄,ρ̄}| / (|Ω_{ρ̄,ρ̄}| |Ω_{ρ,ρ} − Ω_{ρ,ρ̄} Ω_{ρ̄,ρ̄}^{−1} Ω_{ρ̄,ρ}|)) ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}} Ω_{ρ_n,ρ_n} = ∑_{ρ ∈ π_{i,l}} |(Ω_{ρ,ρ} − Ω_{ρ,ρ̄} Ω_{ρ̄,ρ̄}^{−1} Ω_{ρ̄,ρ})^{−1}| ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}} Ω_{ρ_n,ρ_n} = ∑_{ρ ∈ π_{i,l}} |Σ_{ρ,ρ}| ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}} Ω_{ρ_n,ρ_n}

by Schur’s complement and inversion of a partitioned matrix. Then,

Σ_{i,l} = ∑_{ρ ∈ π_{i,l}} |Σ_{ρ,ρ}| ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}} / Σ_{ρ_n,ρ_n|K∖{ρ_n}} = ∑_{ρ ∈ π_{i,l}} Σ_{ρ_1,ρ_1} ∏_{n=2}^{|ρ|} β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}} · Σ_{ρ_n,ρ_n|ρ_1,…,ρ_{n−1}} / Σ_{ρ_n,ρ_n|K∖{ρ_n}}

by Lauritzen [7, pp. 129–130].  □

The variance ratio in the lemma is an inflation factor (≥ 1) that accounts for the overreduction of the variance of ρ_n when conditioning on the rest of the variables in K. In other words, conditioning on the rest of the variables not only blocks all the other paths from ρ_{n−1} to ρ_n but also overreduces the variance of ρ_n, which biases the causal effect of ρ_{n−1} on ρ_n represented by β_{ρ_n,ρ_{n−1}|K∖{ρ_n,ρ_{n−1}}}. The inflation factor remedies this. See Pearl [17, Section 3] for some related phenomena (e. g., selection bias) in acyclic directed mixed graphs.
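Lemma 17 can also be checked numerically. The sketch below is an illustration with an arbitrary 3-variable example, not code from the paper; the helper functions cond_cov and beta are hypothetical names introduced here. It evaluates the decomposition along the only open path 0 − 1 − 2 of an undirected chain and compares the result with the corresponding entry of Σ.

```python
import numpy as np

def cond_cov(S, A, C):
    """Conditional covariance Sigma_{A,A|C} via the Schur complement."""
    A, C = list(A), list(C)
    if not C:
        return S[np.ix_(A, A)]
    SAA = S[np.ix_(A, A)]
    SAC = S[np.ix_(A, C)]
    SCC = S[np.ix_(C, C)]
    return SAA - SAC @ np.linalg.inv(SCC) @ SAC.T

def beta(S, a, b, C):
    """Partial regression coefficient beta_{a,b|C} of b when regressing a on {b} u C."""
    V = cond_cov(S, [a, b], C)
    return V[0, 1] / V[1, 1]

# Undirected chain 0 - 1 - 2, so Omega[0, 2] = 0 and the only open path
# between 0 and 2 is rho = (0, 1, 2).
Omega = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
S = np.linalg.inv(Omega)
K = [0, 1, 2]

# Evaluate the decomposition of Lemma 17 term by term along the path:
# regression coefficient times inflation factor for each edge.
rho = [0, 1, 2]
val = S[rho[0], rho[0]]
for n in range(1, len(rho)):
    rest = [k for k in K if k not in (rho[n], rho[n - 1])]
    infl = (cond_cov(S, [rho[n]], rho[:n])[0, 0]
            / cond_cov(S, [rho[n]], [k for k in K if k != rho[n]])[0, 0])
    val *= beta(S, rho[n], rho[n - 1], rest) * infl
assert abs(val - S[2, 0]) < 1e-12
```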

References

1 Andersson SA, Madigan D, Perlman MD. Alternative Markov Properties for Chain Graphs. Scand J Stat. 2001;28:33–85. doi:10.1111/1467-9469.00224.

2 Bishop CM. Pattern Recognition and Machine Learning. Berlin: Springer; 2006.

3 Drton M. Discrete Chain Graph Models. Bernoulli. 2009;15(3):736–53. doi:10.3150/08-BEJ172.

4 Drton M, Eichler M. Maximum Likelihood Estimation in Gaussian Chain Graph Models under the Alternative Markov Property. Scand J Stat. 2006;33:247–57. doi:10.1111/j.1467-9469.2006.00482.x.

5 Frydenberg M. The Chain Graph Markov Property. Scand J Stat. 1990;17:333–53.

6 Jones B, West M. Covariance Decomposition in Undirected Gaussian Graphical Models. Biometrika. 2005;92:779–86. doi:10.1093/biomet/92.4.779.

7 Lauritzen SL. Graphical Models. London: Oxford University Press; 1996.

8 Lauritzen SL, Richardson TS. Chain Graph Models and Their Causal Interpretations. J R Stat Soc B. 2002;64:321–48. doi:10.1111/1467-9868.00340.

9 Lauritzen SL, Sadeghi K. Unifying Markov Properties for Graphical Models. Ann Stat. 2018;46:2251–78. doi:10.1214/17-AOS1618.

10 Levitz M, Perlman MD, Madigan D. Separation and Completeness Properties for AMP Chain Graph Markov Models. Ann Stat. 2001;29:1751–84. doi:10.1214/aos/1015345961.

11 Ogburn EL, VanderWeele TJ. Causal Diagrams for Interference. Stat Sci. 2014;29:559–78. doi:10.1214/14-STS501.

12 Ogburn EL, Shpitser I, Lee Y. Causal Inference, Social Networks, and Chain Graphs. 2018. arXiv:1812.04990 [stat.ME].

13 Peña JM. Faithfulness in Chain Graphs: The Gaussian Case. In: Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. 2011. p. 588–99.

14 Peña JM. Learning AMP Chain Graphs under Faithfulness. In: Proceedings of the 6th European Workshop on Probabilistic Graphical Models. 2012. p. 251–8.

15 Peña JM. Alternative Markov and Causal Properties for Acyclic Directed Mixed Graphs. In: Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence. 2016. p. 577–86.

16 Pearl J. Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press; 2009. doi:10.1017/CBO9780511803161.

17 Pearl J. Linear Models: A Useful “Microscope” for Causal Analysis. J Causal Inference. 2013;1:155–70.

18 Peters J, Bühlmann P. Identifiability of Gaussian Structural Equation Models with Equal Error Variances. Biometrika. 2014;101:219–28. doi:10.1093/biomet/ast043.

19 Peters J, Mooij JM, Janzing D, Schölkopf B. Identifiability of Causal Graphs using Functional Models. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. 2011. p. 589–98.

20 Peters J, Janzing D, Schölkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge: The MIT Press; 2017.

21 Shpitser I. Segregated Graphs and Marginals of Chain Graph Models. In: Advances in Neural Information Processing Systems. 2015. p. 1720–8.

22 Shpitser I, Tchetgen EJT, Andrews R. Modeling Interference Via Symmetric Treatment Decomposition. 2017. arXiv:1709.01050 [stat.ME].

23 Spirtes P, Glymour C, Scheines R. Causation, Prediction and Search. Cambridge: MIT Press; 2000. doi:10.7551/mitpress/1754.001.0001.

24 Studený M. On Recovery Algorithms for Chain Graphs. Int J Approx Reason. 1997;17:265–93. doi:10.1016/S0888-613X(97)00018-2.

25 Studený M. Bayesian Networks from the Point of View of Chain Graphs. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. 1998. p. 496–503.

26 Studený M. Probabilistic Conditional Independence Structures. Berlin: Springer; 2005.

27 Tchetgen EJT, Fulcher I, Shpitser I. Auto-G-Computation of Causal Effects on a Network. 2017. arXiv:1709.01577 [stat.ME].

28 VanderWeele TJ, Tchetgen EJT, Halloran ME. Components of the Indirect Effect in Vaccine Trials: Identification of Contagion and Infectiousness Effects. Epidemiology. 2012;23:751–61. doi:10.1097/EDE.0b013e31825fb7a0.

29 Wright S. Correlation and Causation. J Agric Res. 1921;20:557–85.

Received: 2018-11-23
Revised: 2019-07-05
Accepted: 2019-09-15
Published Online: 2019-11-05

© 2020 Jose M. Peña, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
