Determinantal Generalizations of Instrumental Variables

Luca Weihs 1, Bill Robinson 2, Emilie Dufresne 3, Jennifer Kenkel 4, Kaie Kubjas 5, Reginald McGee II 6, Nhan Nguyen 7, Elina Robeva 8 and Mathias Drton 9
  • 1 Statistics, University of Washington, Box 354322, Seattle, USA
  • 2 Mathematics, Denison University, Granville, USA
  • 3 University of Nottingham, Nottingham, UK
  • 4 Mathematics, University of Utah, Salt Lake City, USA
  • 5 Mathematics and Systems Analysis, Aalto University, Espoo, Finland
  • 6 Mathematical Biosciences Institute, Columbus, USA
  • 7 Department of Mathematics, University of Montana, Missoula, USA
  • 8 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, USA
  • 9 Statistics, University of Washington, Seattle, USA

Abstract

Linear structural equation models relate the components of a random vector using linear interdependencies and Gaussian noise. Each such model can be naturally associated with a mixed graph whose vertices correspond to the components of the random vector. The graph contains directed edges that represent the linear relationships between components, and bidirected edges that encode unobserved confounding. We study the problem of generic identifiability, that is, whether a generic choice of linear and confounding effects can be uniquely recovered from the joint covariance matrix of the observed random vector. An existing combinatorial criterion for establishing generic identifiability is the half-trek criterion (HTC), which uses the existence of trek systems in the mixed graph to iteratively discover generically invertible linear equation systems in polynomial time. By focusing on edges one at a time, we establish new sufficient and new necessary conditions for generic identifiability of edge effects, extending those of the HTC. In particular, we show how edge coefficients can be recovered as quotients of subdeterminants of the covariance matrix, which constitutes a determinantal generalization of formulas obtained when using instrumental variables for identification. While our results do not completely close the gap between existing sufficient and necessary conditions, we find, empirically, that our results allow us to prove the generic identifiability of many more mixed graphs than the prior state-of-the-art.

Introduction

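In a linear structural equation model (L-SEM) the joint distribution of a random vector $X=(X_1,\dots,X_n)^T$ obeys noisy linear interdependencies. These interdependencies can be expressed with a matrix equation of the form

$$X = \lambda_0 + \Lambda^T X + \epsilon, \qquad (1)$$

where $\Lambda = (\lambda_{vw}) \in \mathbb{R}^{n\times n}$ and $\lambda_0 = (\lambda_{01},\dots,\lambda_{0n})^T \in \mathbb{R}^n$ are unknown parameters, and $\epsilon = (\epsilon_1,\dots,\epsilon_n)^T$ is a random vector of error terms with positive definite covariance matrix $\Omega = (\omega_{vw})$. Then $X$ has mean vector $(I-\Lambda)^{-T}\lambda_0$ and covariance matrix

$$\phi(\Lambda,\Omega) := (I-\Lambda)^{-T}\,\Omega\,(I-\Lambda)^{-1} = \Sigma, \qquad (2)$$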

where I is the n×n identity matrix. L-SEMs have been widely applied in a variety of settings due to the clear causal interpretation of their parameters [1, 2, 3].

Following an approach that dates back to Wright [4, 5], we may view $\Lambda$ and $\Omega$ as (weighted) adjacency matrices corresponding to directed and bidirected graphs, respectively. This yields a natural correspondence between L-SEMs and mixed graphs, that is, graphs with both directed edges, $v\to w$, and bidirected edges, $v\leftrightarrow w$. More precisely, the mixed graph $G$ is associated to the L-SEM in which $\lambda_{vw}$ is assumed to be zero if $v\to w\notin G$ and, similarly, $\omega_{vw} = 0$ when $v\leftrightarrow w\notin G$. We write $\phi_G$ for the map obtained by restricting the map $\phi$ from (2) to pairs $(\Lambda,\Omega)$ that satisfy the conditions encoded by the graph $G$. We note that mixed graphs used to represent L-SEMs are often also called path diagrams.

Figure 1: The mixed graph for the instrumental variable model.

Citation: Journal of Causal Inference 6, 1; 10.1515/jci-2017-0009

Example 1.1.

The mixed graph in Figure 1 corresponds to the well-known instrumental variable model [6]. In equations, this model asserts that

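$$X_1 = \lambda_{01} + \epsilon_1, \quad X_2 = \lambda_{02} + \lambda_{12}X_1 + \epsilon_2, \quad\text{and}\quad X_3 = \lambda_{03} + \lambda_{23}X_2 + \epsilon_3,$$

where $\epsilon$ has mean zero and covariance matrix

$$\Omega = \begin{pmatrix} \omega_{11} & 0 & 0 \\ 0 & \omega_{22} & \omega_{23} \\ 0 & \omega_{23} & \omega_{33} \end{pmatrix}.$$

In this model, the random vector $X = (X_1, X_2, X_3)$ has covariance matrix

$$\Sigma = \begin{pmatrix} 1 & -\lambda_{12} & 0 \\ 0 & 1 & -\lambda_{23} \\ 0 & 0 & 1 \end{pmatrix}^{-T} \begin{pmatrix} \omega_{11} & 0 & 0 \\ 0 & \omega_{22} & \omega_{23} \\ 0 & \omega_{23} & \omega_{33} \end{pmatrix} \begin{pmatrix} 1 & -\lambda_{12} & 0 \\ 0 & 1 & -\lambda_{23} \\ 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} \omega_{11} & \lambda_{12}\omega_{11} & \lambda_{12}\lambda_{23}\omega_{11} \\ \lambda_{12}\omega_{11} & \lambda_{12}^2\omega_{11} + \omega_{22} & \lambda_{12}^2\lambda_{23}\omega_{11} + \lambda_{23}\omega_{22} + \omega_{23} \\ \lambda_{12}\lambda_{23}\omega_{11} & \lambda_{12}^2\lambda_{23}\omega_{11} + \lambda_{23}\omega_{22} + \omega_{23} & \omega_{33} + 2\lambda_{23}\omega_{23} + \lambda_{23}^2(\lambda_{12}^2\omega_{11} + \omega_{22}) \end{pmatrix}.$$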
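The closed form above can be checked mechanically. The following is a minimal sketch of our own (not code from the paper) that evaluates the parameterization (2) with exact rational arithmetic from Python's standard `fractions` module. The helper names `mat_mul`, `transpose`, `inverse`, and `covariance` and the chosen parameter values are arbitrary illustrative assumptions.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inverse(A):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def covariance(Lam, Om):
    """Sigma = (I - Lambda)^{-T} Omega (I - Lambda)^{-1}, as in equation (2)."""
    n = len(Lam)
    I_minus_Lam = [[Fraction(int(i == j)) - Lam[i][j] for j in range(n)]
                   for i in range(n)]
    A = inverse(I_minus_Lam)
    return mat_mul(mat_mul(transpose(A), Om), A)


# IV model of Example 1.1 at one arbitrary parameter point (vertices 1,2,3 -> indices 0,1,2).
l12, l23 = Fraction(2), Fraction(3)
w11, w22, w23, w33 = Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1)
Lam = [[0, l12, 0], [0, 0, l23], [0, 0, 0]]
Om = [[w11, 0, 0], [0, w22, w23], [0, w23, w33]]
Sigma = covariance(Lam, Om)

# Compare with the closed-form entries of Sigma in Example 1.1.
assert Sigma[0][2] == l12 * l23 * w11
assert Sigma[1][2] == l12**2 * l23 * w11 + l23 * w22 + w23
assert Sigma[2][2] == w33 + 2 * w23 * l23 + l23**2 * (l12**2 * w11 + w22)
print(Sigma[1][2])  # 31/2
```

Exact arithmetic is used so that the equality checks are not affected by floating-point rounding.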

A first question that arises when specifying an L-SEM via a mixed graph $G$ is whether the map $\phi_G$ is injective, that is, whether any $(\Lambda,\Omega)$ in the domain of $\phi_G$ can be uniquely recovered from the covariance matrix $\phi_G(\Lambda,\Omega)$. When this injectivity holds, we say that the model, or simply the graph $G$, is globally identifiable. Whether or not global identifiability holds can be decided in polynomial time [7, 8, 9]. However, in many cases global identifiability is too strong a condition. Indeed, the canonical instrumental variable model is not globally identifiable: if $\lambda_{12} = 0$, then $\sigma_{12} = \sigma_{13} = 0$ and $\lambda_{23}$ cannot be recovered from $\Sigma$.

We will instead be interested in generic identifiability, that is, whether $(\Lambda,\Omega)$ can be recovered from $\phi_G(\Lambda,\Omega)$ with probability 1 when choosing $(\Lambda,\Omega)$ from any continuous distribution on the domain of $\phi_G$. A state-of-the-art criterion for checking generic identifiability of a given mixed graph, which can be verified in polynomial time, is the half-trek criterion (HTC) of [10], with generalizations by [11, 12, 13]. The sufficient condition that is part of the HTC operates by iteratively discovering invertible linear equation systems in the $\Lambda$ parameters, which it uses to prove generic identifiability. A necessary condition given by the HTC detects cases in which the Jacobian matrix of $\phi_G$ fails to attain full column rank, which implies that the parameterization $\phi_G$ is generically infinite-to-one. However, there remain a considerable number of cases in which the HTC is inconclusive, that is, the graph satisfies the necessary but not the sufficient condition for generic identifiability.

We extend the applicability of the HTC in two ways. First, we show how the theorems on trek separation in [14] can be used to discover determinantal relations that in turn can be used to prove the generic identifiability of individual edge coefficients in L-SEMs. This method generalizes the use of conditional independence in known instrumental variable techniques; compare, e.g., [15]. Once individual edges have been shown to be generically identifiable with this new method, it would be ideal to integrate these identified edges into the equation systems discovered by the HTC to prove that even more edges are generically identifiable. Unfortunately, the HTC is not well suited to integrating single edge identifications, as it operates simultaneously on all edges incoming to a given node. Our second contribution resolves this issue by providing an edgewise half-trek criterion, which operates on subsets of a node's parents rather than on all parents at once. This edgewise criterion often identifies many more coefficients than the usual HTC. We note that, while preparing this manuscript, we discovered independent work of Chen [16]; some of our results can be seen as a generalization of results in his work.

The rest of this paper is organized as follows. In Section 2, we give a brief overview of the necessary background on mixed graphs, L-SEMs, and the half-trek criterion. In Section 3, we show how trek-separation allows the generic identification of edge coefficients as quotients of subdeterminants. We introduce the edgewise half-trek criterion in Section 4 and we discuss necessary conditions for the generic identifiability of edge coefficients in Section 5. Computational experiments showing the applicability of our sufficient conditions follow in Section 6, and we finish with a brief conclusion in Section 7. Some longer proofs are deferred to the appendix.

2 Preliminaries

We assume some familiarity with the graphical representation of structural equation models and only give a brief overview of our objects of study. A more in-depth introduction can be found, for example, in [2] or, with a focus on the linear case considered here, in [17].

2.1 Mixed graphs and covariance matrices

Nonzero covariances in an L-SEM may arise through direct or through confounding effects. Mixed graphs with two types of edges have been used to represent these two sources of dependence.

Definition 2.1 (Mixed Graph).

A mixed graph on $n$ vertices is a triple $G = (V,D,B)$ where $V = \{1,\dots,n\}$ is the vertex set, $D\subseteq V\times V$ are the directed edges, and $B\subseteq V\times V$ are the bidirected edges. We require that there be no self-loops, so $(v,v)\notin D\cup B$ for all $v\in V$. If $(v,w)\in D$, we will write $v\to w\in G$ and if $(v,w)\in B$, we will write $v\leftrightarrow w\in G$. As bidirected edges are symmetric, we will also require that $B$ is symmetric, so that $(v,w)\in B \iff (w,v)\in B$.

Let v and w be two vertices of a mixed graph G=(V,D,B). A path from v to w is any sequence of edges from D or B beginning at v and ending at w. Here, we allow that directed edges be traversed against their natural direction (i.e., from head to tail). We also allow repeated vertices on a path. Sometimes, such paths are referred to as walks or also semi-walks. A path from v to w is directed if all of its edges are directed and point in the same direction, away from v and towards w.

Definition 2.2 (Treks and half-treks).

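(a) A path $\pi$ from a source $v$ to a target $w$ is a trek if it has no colliding arrowheads, that is, $\pi$ is of the form

$$v^L_l \leftarrow v^L_{l-1} \leftarrow \cdots \leftarrow v^L_0 \leftrightarrow v^R_0 \rightarrow v^R_1 \rightarrow \cdots \rightarrow v^R_{r-1} \rightarrow v^R_r \quad\text{or}\quad v^L_l \leftarrow v^L_{l-1} \leftarrow \cdots \leftarrow v^L_1 \leftarrow v^T \rightarrow v^R_1 \rightarrow \cdots \rightarrow v^R_{r-1} \rightarrow v^R_r,$$

where $v^L_l = v$, $v^R_r = w$, and $v^T$ is the top node. Each trek $\pi$ has a left-hand side $\mathrm{Left}(\pi)$ and a right-hand side $\mathrm{Right}(\pi)$. In the former case, $\mathrm{Left}(\pi) = \{v^L_0,\dots,v^L_l\}$ and $\mathrm{Right}(\pi) = \{v^R_0,\dots,v^R_r\}$. In the latter case, $\mathrm{Left}(\pi) = \{v^T, v^L_1,\dots,v^L_l\}$ and $\mathrm{Right}(\pi) = \{v^T, v^R_1,\dots,v^R_r\}$, with $v^T$ a part of both sides.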

(b) A trek π is a half-trek if |Left(π)|=1. In this case π is of the form

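$$v^L_0 \leftrightarrow v^R_0 \rightarrow v^R_1 \rightarrow \cdots \rightarrow v^R_{r-1} \rightarrow v^R_r \quad\text{or}\quad v^T \rightarrow v^R_1 \rightarrow \cdots \rightarrow v^R_{r-1} \rightarrow v^R_r.$$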

In particular, a half-trek from v to w is a trek from v to w which is either empty, begins with a bidirected edge, or begins with a directed edge pointing away from v.

Some terminology is needed to reference the local neighborhood structure of a vertex $v$. For the directed part $(V,D)$, it is standard to define the set of parents and the set of descendants of $v$ as

$$\mathrm{pa}(v) = \{w\in V : w\to v\in G\}, \qquad \mathrm{des}(v) = \{w\in V : \text{there is a non-empty directed path from } v \text{ to } w \text{ in } G\},$$

respectively. The nodes incident to a bidirected edge can be thought of as having a common (latent) parent and thus we refer to the bidirected neighbors as siblings and define

$$\mathrm{sib}(v) = \{w\in V : w\leftrightarrow v\in G\}.$$

Finally, we denote the sets of nodes that are trek reachable or half-trek reachable from v by

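$$\mathrm{tr}(v) = \{w\in V : \text{there is a non-empty trek from } v \text{ to } w \text{ in } G\}, \qquad \mathrm{htr}(v) = \{w\in V : \text{there is a non-empty half-trek from } v \text{ to } w \text{ in } G\}.$$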
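For experimentation, here is a minimal sketch of these neighborhood sets. The class `MixedGraph` and its method names are our own illustrative choices, and $\mathrm{tr}(v)$ is omitted for brevity. The sketch uses the observation, immediate from Definition 2.2(b), that a non-empty half-trek from $v$ starts with $v\to c$ or $v\leftrightarrow s$ and then follows only directed edges, so that $\mathrm{htr}(v) = \mathrm{des}(v)\cup\mathrm{sib}(v)\cup\bigcup_{s\in\mathrm{sib}(v)}\mathrm{des}(s)$.

```python
from collections import deque


class MixedGraph:
    """Mixed graph G = (V, D, B) on vertices 1..n (illustrative helper)."""

    def __init__(self, n, directed, bidirected):
        self.V = set(range(1, n + 1))
        self.D = set(directed)  # pairs (v, w) encoding v -> w
        # B is stored symmetrically, as required by Definition 2.1.
        self.B = set(bidirected) | {(w, v) for v, w in bidirected}
        if any(v == w for v, w in self.D | self.B):
            raise ValueError("self-loops are not allowed")

    def pa(self, v):
        return {u for u, w in self.D if w == v}

    def children(self, v):
        return {w for u, w in self.D if u == v}

    def sib(self, v):
        return {w for u, w in self.B if u == v}

    def des(self, v):
        """Endpoints of non-empty directed paths from v (breadth-first search)."""
        seen, queue = set(), deque(self.children(v))
        while queue:
            w = queue.popleft()
            if w not in seen:
                seen.add(w)
                queue.extend(self.children(w))
        return seen

    def htr(self, v):
        """Endpoints of non-empty half-treks from v."""
        reach = self.des(v) | self.sib(v)
        for s in self.sib(v):
            reach |= self.des(s)
        return reach


# IV model of Example 1.1: 1 -> 2, 2 -> 3, 2 <-> 3.
G = MixedGraph(3, [(1, 2), (2, 3)], [(2, 3)])
print(sorted(G.htr(3)))  # [2, 3]; 3 is reached via the half-trek 3 <-> 2 -> 3
```

Because paths may repeat vertices, a vertex can lie in its own half-trek reachable set, as the IV example shows.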

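Two sets of matrices may be associated with a given mixed graph $G = (V,D,B)$. First, $\mathbb{R}^D_{\mathrm{reg}}$ is the set of real $n\times n$ matrices $\Lambda = (\lambda_{vw})$ with support $D$, i.e., those matrices $\Lambda$ with $\lambda_{vw}\neq 0$ implying $v\to w\in G$ and for which $I-\Lambda$ is invertible. Second, $PD(B)$ is the set of positive definite matrices with support $B$, i.e., if $v\neq w$, then $\omega_{vw}\neq 0$ implies $v\leftrightarrow w\in G$. Based on (2), the distributions in the L-SEM given by $G$ have a covariance matrix $\Sigma$ that is parameterized by the map

$$\phi_G : (\Lambda,\Omega) \mapsto (I-\Lambda)^{-T}\,\Omega\,(I-\Lambda)^{-1}$$

with domain $\Theta := \mathbb{R}^D_{\mathrm{reg}}\times PD(B)$.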

Remark 2.3.

Our focus is solely on covariance matrices. Indeed, in the traditional case where the errors $\epsilon$ in (1) follow a multivariate normal distribution, the covariance matrix contains all available information about the parameters $(\Lambda,\Omega)$.

Subsequently, the matrices $\Lambda$, $\Omega$, and $\Sigma$ will also be regarded as matrices of indeterminates. The entries of $(I-\Lambda)^{-1} = I + \sum_{k=1}^{\infty}\Lambda^k$ may then be interpreted as formal power series. Let $\Lambda$ and $\Omega$ be matrices of indeterminates with zero pattern corresponding to $G$. Then $\Sigma = \phi_G(\Lambda,\Omega)$ has entries that are formal power series whose form is described by the Trek Rule of [4]; see also Spirtes, Glymour, and Scheines [3]. The Trek Rule states that for every $v,w\in V$ the corresponding entry of $\phi_G(\Lambda,\Omega)$ is the sum of all trek monomials corresponding to treks from $v$ to $w$.

Definition 2.4 (Trek Monomial).

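Let $v,w\in V$ be two, not necessarily distinct, vertices, and let $\mathcal{T}(v,w)$ be the set of all treks from $v$ to $w$ in $G$. If $\pi\in\mathcal{T}(v,w)$ contains no bidirected edge and has top node $z$, its trek monomial is defined as

$$\pi(\Lambda,\Omega) = \omega_{zz}\prod_{x\to y\in\pi}\lambda_{xy}.$$

If $\pi$ contains a bidirected edge connecting $u,z\in V$, then its trek monomial is

$$\pi(\Lambda,\Omega) = \omega_{uz}\prod_{x\to y\in\pi}\lambda_{xy}.$$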

Proposition 2.5

[Trek Rule] The covariance matrix $\Sigma = \phi_G(\Lambda,\Omega)$ corresponding to a mixed graph $G$ satisfies

$$\Sigma_{vw} = \sum_{\pi\in\mathcal{T}(v,w)}\pi(\Lambda,\Omega), \qquad v,w\in V.$$
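As a worked instance of the Trek Rule (our own illustration, using the IV model of Example 1.1), consider the entry $\Sigma_{23}$. Because the graph is acyclic, 2 has the single parent 1, and vertex 1 has neither parents nor siblings, the only treks from 2 to 3 are $2\to 3$ (top node 2), $2\leftrightarrow 3$, and $2\leftarrow 1\to 2\to 3$ (top node 1). Summing their trek monomials reproduces the entry $\Sigma_{23}$ obtained by matrix inversion in Example 1.1:

$$\Sigma_{23} = \underbrace{\omega_{22}\lambda_{23}}_{2\to 3} + \underbrace{\omega_{23}}_{2\leftrightarrow 3} + \underbrace{\omega_{11}\lambda_{12}^2\lambda_{23}}_{2\leftarrow 1\to 2\to 3}.$$

The third trek visits vertex 2 twice; without allowing repeated vertices on paths, the term $\omega_{11}\lambda_{12}^2\lambda_{23}$ would be missed.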

2.2 Generic identifiability

We now formally introduce our problem of interest and review some of the prior work our results build on. We recall that an algebraic set is the zero-set of a collection of polynomials. An algebraic set that is a proper subset of Euclidean space has measure zero; see, e.g., the lemma in [18].

Definition 2.6 (Generic Identifiability).

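(a) The model given by a mixed graph $G$ is generically identifiable if there exists a proper algebraic subset $A\subsetneq\Theta$ such that the fiber $\mathcal{F}(\Lambda,\Omega) := \phi_G^{-1}(\{\phi_G(\Lambda,\Omega)\})$ is a singleton set, that is, it satisfies

$$\mathcal{F}(\Lambda,\Omega) = \{(\Lambda,\Omega)\}$$

for all $(\Lambda,\Omega)\in\Theta\setminus A$. In this case we will say, for simplicity, that $G$ is generically identifiable.

(b) Let $\mathrm{proj}_{v\to w}$ be the projection $(\Lambda,\Omega)\mapsto\lambda_{vw}$ for $v\to w\in G$. We say that the edge coefficient $\lambda_{vw}$ is generically identifiable if there exists a proper algebraic subset $A\subsetneq\Theta$ such that $\mathrm{proj}_{v\to w}(\mathcal{F}(\Lambda,\Omega)) = \{\lambda_{vw}\}$ for all $(\Lambda,\Omega)\in\Theta\setminus A$. In this case, we will say that the edge $v\to w$ is generically identifiable.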

In all examples we know of, if generic identifiability holds, then the parameters can in fact be recovered using rational formulas.

Definition 2.7 (Rational Identifiability).

(a) A mixed graph $G$, or rather the model it defines, is rationally identifiable if there exists a rational map $\psi$ and a proper algebraic subset $A\subsetneq\Theta$ such that $\psi\circ\phi_G$ is the identity on $\Theta\setminus A$.

(b) An edge $v\to w\in G$, or rather the coefficient $\lambda_{vw}$, is rationally identifiable if there exists a rational function $\psi$ and a proper algebraic subset $A\subsetneq\Theta$ such that $\psi\circ\phi_G(\Lambda,\Omega) = \lambda_{vw}$ for all $(\Lambda,\Omega)\in\Theta\setminus A$.

We now introduce the half-trek criterion (HTC) from [10]. We generalize this criterion in Section 4.

Definition 2.8 (Trek and Half-Trek Systems).

Let $\Pi = \{\pi_1,\dots,\pi_m\}$ be a collection of treks in $G$ and let $S$, $T$ be the sets of sources and targets of the $\pi_i$, respectively. Then we say that $\Pi$ is a system of treks from $S$ to $T$. If each $\pi_i$ is a half-trek, then $\Pi$ is a system of half-treks. A collection $\Pi = \{\pi_1,\dots,\pi_m\}$ of treks is said to have no sided intersection if

$$\mathrm{Left}(\pi_i)\cap\mathrm{Left}(\pi_j) = \emptyset = \mathrm{Right}(\pi_i)\cap\mathrm{Right}(\pi_j), \qquad i\neq j.$$

As our focus will be on the identification of individual edges in $G$, we do not state the identifiability result of [10] in its usual form; instead, we present a slightly modified version that is easily seen to be implied by the proof of Theorem 1 in [10].
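Once each trek is summarized by its two sides, checking this condition for a given collection is elementary. The following minimal sketch assumes each trek is supplied as a pair of vertex sets $(\mathrm{Left}(\pi), \mathrm{Right}(\pi))$; the function name is hypothetical. Finding such a system in the first place is a separate task, which, as with the half-trek criterion, can be reduced to a max-flow computation.

```python
from itertools import combinations


def no_sided_intersection(treks):
    """treks: list of (left, right) pairs of vertex sets, one pair per trek.

    Returns True if no two treks share a vertex on their left-hand sides
    and no two treks share a vertex on their right-hand sides.
    """
    return all(not (L1 & L2) and not (R1 & R2)
               for (L1, R1), (L2, R2) in combinations(treks, 2))


# Half-trek 1 -> 2 (Left {1}, Right {1, 2}) and the trivial half-trek at 3.
print(no_sided_intersection([({1}, {1, 2}), ({3}, {3})]))  # True
```

Replacing the second trek by $2\to 3$ (Left $\{2\}$, Right $\{2,3\}$) makes the right-hand sides meet at 2, and the check fails.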

Definition 2.9.

A set of nodes $Y\subseteq V$ satisfies the half-trek criterion with respect to a vertex $v\in V$ if

  • $|Y| = |\mathrm{pa}(v)|$,
  • $Y\cap(\{v\}\cup\mathrm{sib}(v)) = \emptyset$, and
  • there is a system of half-treks with no sided intersection from $Y$ to $\mathrm{pa}(v)$.

Theorem 2.10

[HTC-identifiability] Suppose that in the mixed graph $G = (V,D,B)$ the set $Y\subseteq V$ satisfies the half-trek criterion with respect to $v\in V$. If all directed edges $u\to y\in G$ with head $y\in\mathrm{htr}(v)\cap Y$ are generically (rationally) identifiable, then all directed edges with $v$ as a head are generically (rationally) identifiable.

The sufficient condition for rational identifiability of G in [10] is obtained through iterative application of Theorem 2.10.

3 Trek separation and identification by ratios of determinants

Let $\Lambda$ and $\Omega$ be matrices of indeterminates corresponding to a mixed graph $G = (V,D,B)$ as specified in Section 2.1. Let $S,T\subseteq V$, and let $\Sigma_{S,T}$ be the submatrix of $\Sigma = \phi_G(\Lambda,\Omega)\in\mathbb{R}^{n\times n}$ obtained by retaining only the rows and columns indexed by $S$ and $T$, respectively. The (generic) rank of such a submatrix $\Sigma_{S,T}$ can be completely characterized by considering the trek systems between the vertices in $S$ and $T$. The formal statement of this result follows.

Definition 3.1 (t-separation).

A pair of sets $(L,R)$ with $L,R\subseteq V$ t-separates the sets $S,T\subseteq V$ if every trek between a vertex $s\in S$ and a vertex $t\in T$ intersects $L$ on the left or $R$ on the right.

In this definition, the symbols L and R are chosen to suggest left and right. Similarly, S and T are chosen to indicate sources and targets, respectively.

Theorem 3.2 ([14], [19])

Let $r$ be a non-negative integer. The submatrix $\Sigma_{S,T}$ has generic rank $\le r$ if and only if there exist sets $L,R\subseteq V$ with $|L|+|R|\le r$ such that $(L,R)$ t-separates $S$ and $T$.

Theorem 2.7 from [14] established this result for acyclic mixed graphs, while [19] extended the result to all mixed graphs and even gave an explicit representation of the rational form of the subdeterminant $|\Sigma_{S,T}|$ for $|S| = |T|$. An immediate corollary to the above theorem, considering the proof of Theorem 2.17 in [14], rephrases its statement in terms of maximum flows in a special graph. For an introduction to maximum flow and the well-known Max-flow Min-cut Theorem, see the book by Cormen et al. [20]. Note that the standard max-flow min-cut framework allows neither vertex capacities nor multiple sources and targets. Introducing these modifications is, however, straightforward (split each vertex into an entry copy and an exit copy joined by an edge carrying the vertex capacity, and connect a new super-source to all sources and all targets to a new super-sink), and the resulting theorem is sometimes called the Generalized Max-flow Min-cut Theorem.

Corollary 3.3

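Let $G_{\mathrm{flow}} = (V_f, D_f)$ be the directed graph with $V_f = \{1,\dots,n\}\cup\{1',\dots,n'\}$ and $D_f$ containing the following edges:

$$i\to j \quad \text{if } j\to i\in G, \qquad (4)$$
$$i\to i' \quad \text{for all } i\in V, \qquad (5)$$
$$i\to j' \quad \text{if } i\leftrightarrow j\in G, \text{ and} \qquad (6)$$
$$i'\to j' \quad \text{if } i\to j\in G. \qquad (7)$$

Turn $G_{\mathrm{flow}}$ into a network by giving all vertices and edges capacity 1. Let $S = \{s_1,\dots,s_k\}$, $T = \{t_1,\dots,t_m\}\subseteq V$. Then $\Sigma_{S,T}$ has generic rank $r$ if and only if the max-flow from $s_1,\dots,s_k$ to $t'_1,\dots,t'_m$ in $G_{\mathrm{flow}}$ is $r$.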

proof.

Add vertices $u$ and $v$, with infinite capacity, to the graph $G_{\mathrm{flow}}$ along with edges, all with capacity 1, $u\to s_i$ for $1\le i\le k$, and $t'_j\to v$ for $1\le j\le m$. Let $L,R$ be such that they t-separate the sets $S,T$ and $|L|+|R|$ is minimal. By Theorem 3.2, $\Sigma_{S,T}$ has rank $|L|+|R|$ generically. Note that $L\cup\{r' : r\in R\}$ gives a minimal size $u$-$v$ cut (of size $|L|+|R|$). By the (generalized) Max-flow Min-cut Theorem, the max-flow from $u$ to $v$ is $|L|+|R|$, and this is also the max-flow from $s_1,\dots,s_k$ to $t'_1,\dots,t'_m$. Hence $\Sigma_{S,T}$ has generic rank equal to the max-flow found.

Note that the maximum flow between vertex sets in a graph can be computed in polynomial time. Indeed, in our case, the conditions of Corollary 3.3 can be checked in $O(|V|^2\max\{m,k\})$ time [20, page 725]. As the following example shows, Corollary 3.3 can be used to find determinantal constraints on $\Sigma$. These constraints can then be leveraged to identify edges in $G$.
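Corollary 3.3 translates directly into code. The sketch below is our own, not the authors' implementation: it builds $G_{\mathrm{flow}}$ with the edges (4)-(7), handles the unit vertex capacities by the node-splitting construction mentioned before the corollary, and computes the max-flow with breadth-first augmenting paths (the Edmonds-Karp rule), which is only one of several polynomial-time choices. The names `flow_network`, `max_flow_unit`, and `generic_rank` are hypothetical. The flow runs from the unprimed copies of $S$, encoded as `('L', s)`, to the primed copies of $T$, encoded as `('R', t)`.

```python
from collections import deque


def flow_network(n, directed, bidirected):
    """Edge set of G_flow from Corollary 3.3; ('L', i) stands for i and ('R', i) for i'."""
    E = set()
    for i, j in directed:                    # i -> j in G
        E.add((('L', j), ('L', i)))          # (4): j -> i on the unprimed copy
        E.add((('R', i), ('R', j)))          # (7): i' -> j'
    for i in range(1, n + 1):
        E.add((('L', i), ('R', i)))          # (5): i -> i'
    for i, j in bidirected:                  # i <-> j in G
        E.add((('L', i), ('R', j)))          # (6), in both directions
        E.add((('L', j), ('R', i)))
    return E


def max_flow_unit(E, sources, targets):
    """Max-flow with capacity 1 on every vertex and edge.

    Vertex capacities are handled by splitting x into (x, 'in') -> (x, 'out');
    augmenting paths are found by breadth-first search.
    """
    cap, adj = {}, {}

    def add(u, w):
        cap[(u, w)] = cap.get((u, w), 0) + 1
        cap.setdefault((w, u), 0)            # residual edge
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)

    nodes = {x for e in E for x in e} | set(sources) | set(targets)
    for x in nodes:
        add((x, 'in'), (x, 'out'))
    for x, y in E:
        add((x, 'out'), (y, 'in'))
    for s in sources:
        add('SRC', (s, 'in'))
    for t in targets:
        add((t, 'out'), 'SNK')

    flow = 0
    while True:
        parent, queue = {'SRC': None}, deque(['SRC'])
        while queue and 'SNK' not in parent:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if 'SNK' not in parent:
            return flow
        w = 'SNK'
        while parent[w] is not None:         # push one unit along the path
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1


def generic_rank(n, directed, bidirected, S, T):
    """Generic rank of Sigma_{S,T} according to Corollary 3.3."""
    E = flow_network(n, directed, bidirected)
    return max_flow_unit(E, [('L', s) for s in S], [('R', t) for t in T])


chain = [(1, 2), (2, 3)]
print(generic_rank(3, chain, [], [1, 2], [2, 3]))        # 1
print(generic_rank(3, chain, [(2, 3)], [1, 2], [2, 3]))  # 2
```

On the chain $1\to 2\to 3$ the value 1 matches the fact that $(\emptyset,\{2\})$ t-separates $\{1,2\}$ and $\{2,3\}$. Adding $2\leftrightarrow 3$ (the IV model) opens the disjoint path $2\to 3'$ and raises the generic rank to 2.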

Figure 2: (a) A graph $G$ that is generically identifiable but for which the HTC fails to identify any coefficients. (b) The corresponding flow graph $G_{\mathrm{flow}}$; black edges correspond to (5), red edges to (6), and blue edges to (4) and (7).


Figure 3: A graph for which Theorem 3.8 can be used to certify that the coefficient $\lambda_{12}$ is identifiable when Theorem 3.5 cannot.


Example 3.4

Consider the mixed graph $G = (V,D,B)$ in Figure 2a, which is taken from Figure 3c in [10]. The corresponding flow network $G_{\mathrm{flow}}$ is shown in Figure 2b. From Gröbner basis computations, $G$ is known to be rationally identifiable, but the half-trek criterion fails to certify that any edge of $G$ is generically identifiable. Let $S = \{1,2,4\}$ and $T = \{1,3,5\}$. Corollary 3.3 implies that $\Sigma_{S,T}$ has generically full rank, as there is a flow of size 3 from $S$ to $T = \{1,3,5\}$ in $G_{\mathrm{flow}}$, via paths from 1 to $3'$, from 2 to $1'$, and from 4 to $5'$. Now suppose that we remove the edge $4\to 5$ from $G$, call the resulting graph $\bar G$, and let $\bar\Sigma$ be the covariance matrix corresponding to $\bar G$. Then one may check that the max-flow from $S$ to $T$ in $\bar G_{\mathrm{flow}}$ is 2. Thus $|\bar\Sigma_{\{1,2,4\},\{1,3,5\}}| = 0$, where $|\cdot|$ denotes the determinant. Now note that $\lambda_{45}\sigma_{14}$ is the sum of all monomials given by treks from 1 to 5 that end in the edge $4\to 5$. Hence, $\sigma_{15} - \lambda_{45}\sigma_{14}$ is obtained by summing over all treks from 1 to 5 that do not end in the edge $4\to 5$. But in our graph this is just the sum over treks from 1 to 5 that do not use the edge $4\to 5$ at all. Therefore, $\bar\sigma_{15} = \sigma_{15} - \lambda_{45}\sigma_{14}$. Similarly, it is straightforward to check that

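$$\bar\Sigma_{\{1,2,4\},\{1,3,5\}} = \begin{pmatrix} \sigma_{11} & \sigma_{13} & \sigma_{15} - \lambda_{45}\sigma_{14} \\ \sigma_{21} & \sigma_{23} & \sigma_{25} - \lambda_{45}\sigma_{24} \\ \sigma_{41} & \sigma_{43} & \sigma_{45} - \lambda_{45}\sigma_{44} \end{pmatrix}. \qquad (8)$$

By the multilinearity of the determinant, we deduce that

$$0 = |\bar\Sigma_{\{1,2,4\},\{1,3,5\}}| = \begin{vmatrix} \sigma_{11} & \sigma_{13} & \sigma_{15} \\ \sigma_{21} & \sigma_{23} & \sigma_{25} \\ \sigma_{41} & \sigma_{43} & \sigma_{45} \end{vmatrix} - \lambda_{45}\begin{vmatrix} \sigma_{11} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_{23} & \sigma_{24} \\ \sigma_{41} & \sigma_{43} & \sigma_{44} \end{vmatrix} = |\Sigma_{\{1,2,4\},\{1,3,5\}}| - \lambda_{45}|\Sigma_{\{1,2,4\},\{1,3,4\}}|.$$

Applying Corollary 3.3 a final time, we recognize that $|\Sigma_{\{1,2,4\},\{1,3,4\}}|$ is generically non-zero and, thus, the equation

$$\lambda_{45} = \frac{|\Sigma_{\{1,2,4\},\{1,3,5\}}|}{|\Sigma_{\{1,2,4\},\{1,3,4\}}|}$$

generically and rationally identifies $\lambda_{45}$. In this case, the same strategy can be used to identify the edges $1\to 2$ and $1\to 3$ (but not $1\to 4$) in $G$.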

In the above example, there is a correspondence between trek systems in G and trek systems in Gˉ, the graph that has the edge to be identified removed. This allowed us to leverage Corollary 3.3 directly to show that (8) has determinant 0. Such a correspondence cannot always be obtained but exists in the following case.

Theorem 3.5

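Let $G = (V,D,B)$ be a mixed graph. Let $w_0\to v$ be an edge in $G$, and suppose that the edges $w_1\to v,\dots,w_\ell\to v\in G$ are known to be generically (rationally) identifiable. Let $\bar G$ be the subgraph of $G$ with the edges $w_0\to v,\dots,w_\ell\to v$ removed. Suppose there are sets $S\subseteq V\setminus\{v\}$ and $T\subseteq V\setminus\{v,w_0\}$ with $|S| = |T|+1 = k$ such that:

  • (a) $\mathrm{des}(v)\cap(S\cup T\cup\{v\}) = \emptyset$,
  • (b) the max-flow from $S$ to $T\cup\{w_0\}$ in $G_{\mathrm{flow}}$ equals $k$, and
  • (c) the max-flow from $S$ to $T\cup\{v\}$ in $\bar G_{\mathrm{flow}}$ is smaller than $k$.

Then $w_0\to v$ is generically (rationally) identifiable by the equation

$$\lambda_{w_0v} = \frac{|\Sigma_{S,T\cup\{v\}}| - \sum_{i=1}^{\ell}\lambda_{w_iv}\,|\Sigma_{S,T\cup\{w_i\}}|}{|\Sigma_{S,T\cup\{w_0\}}|}.$$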

proof.

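Let $\Sigma$ and $\bar\Sigma$ be the covariance matrices corresponding to $G$ and $\bar G$, respectively. Since $\mathrm{des}(v)\cap(S\cup T\cup\{v\}) = \emptyset$, we have that $\sigma_{st} = \bar\sigma_{st}$ for all $s\in S$ and $t\in T$. This holds because if a trek from $s$ to $t$ uses an edge $w_i\to v$, then either $s\in\{v\}\cup\mathrm{des}(v)$ or $t\in\{v\}\cup\mathrm{des}(v)$, violating our assumptions.

Now let $s\in S$ and $0\le i\le\ell$. Suppose that $\pi$ is a trek from $s$ to $v$ that uses the edge $w_i\to v$. Then, since $s\notin\{v\}\cup\mathrm{des}(v)$, the edge $w_i\to v$ is used only on the right-hand side of $\pi$. Since $v\notin\mathrm{des}(v)$, it follows that $w_i\to v$ is the last edge used in the trek, because $\pi$ may only use directed edges after using $w_i\to v$ and must end at $v$. Hence, all treks from $s$ to $v$ which use $w_i\to v$ must have this edge as their last edge on the right. But $\sigma_{sw_i}\lambda_{w_iv}$ is obtained by summing over all treks from $s$ to $v$ which end in the edge $w_i\to v$ and, thus, $\sigma_{sv} - \sigma_{sw_i}\lambda_{w_iv}$ is the sum of the monomials for all treks from $s$ to $v$ that do not use the edge $w_i\to v$ at all.

As the above argument holds for all $0\le i\le\ell$, it follows that $\bar\sigma_{sv} = \sigma_{sv} - \sum_{i=0}^{\ell}\sigma_{sw_i}\lambda_{w_iv}$. Since this is true for all $s\in S$, it follows, similarly as in Example 3.4, that

$$|\bar\Sigma_{S,T\cup\{v\}}| = |\Sigma_{S,T\cup\{v\}}| - \sum_{i=0}^{\ell}\lambda_{w_iv}\,|\Sigma_{S,T\cup\{w_i\}}|.$$

Using assumption (c) and applying Corollary 3.3 to $\bar G$, we have $|\bar\Sigma_{S,T\cup\{v\}}| = 0$. Similarly, by assumption (b), $|\Sigma_{S,T\cup\{w_0\}}|\neq 0$ generically. Solving for $\lambda_{w_0v}$, and using that $\lambda_{w_1v},\dots,\lambda_{w_\ell v}$ are generically (rationally) identifiable, yields the desired result.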
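To make the formula concrete, the following sketch (ours, with hypothetical helper names `det`, `sub`, and `thm35_rhs`) evaluates the right-hand side of the Theorem 3.5 equation from a numeric covariance matrix using exact determinants. It does not verify conditions (a)-(c); those must be checked separately, for example with max-flow computations. For the IV model of Example 1.1 with $v = 3$, $w_0 = 2$, $\ell = 0$, $S = \{1\}$, and $T = \emptyset$, the conditions hold ($\mathrm{des}(3) = \emptyset$, the path $1\to 1'\to 2'$ gives a flow of size 1, and after removing $2\to 3$ no path from 1 reaches $3'$), and the formula reduces to the classical instrumental variable ratio $\lambda_{23} = \sigma_{13}/\sigma_{12}$.

```python
from fractions import Fraction


def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d


def sub(Sigma, rows, cols):
    """Sigma_{rows, cols} for 1-based vertex labels, in the given order."""
    return [[Sigma[r - 1][c - 1] for c in cols] for r in rows]


def thm35_rhs(Sigma, S, T, v, w0, known):
    """Right-hand side of the Theorem 3.5 formula for lambda_{w0 v}.

    S, T: ordered lists with len(S) == len(T) + 1.
    known: dict {w_i: lambda_{w_i v}} for the already identified edges, i >= 1.
    """
    num = det(sub(Sigma, S, T + [v]))
    for wi, lam in known.items():
        num -= lam * det(sub(Sigma, S, T + [wi]))
    return num / det(sub(Sigma, S, T + [w0]))


# IV model at lambda12 = 2, lambda23 = 3, omega11 = omega22 = omega33 = 1, omega23 = 1/2.
Sigma = [[1, 2, 6],
         [2, 5, Fraction(31, 2)],
         [6, Fraction(31, 2), 49]]
print(thm35_rhs(Sigma, S=[1], T=[], v=3, w0=2, known={}))  # 3, the true lambda23
```

The same function handles $k > 1$ and $\ell > 0$; only the sizes of the determinants and the number of known terms change.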

Remark 3.6.

Theorem 3.5 generalizes the ideas underlying instrumental variable methods such as those discussed in [15]. This prior work uses d-separation, as opposed to t-separation. D-separation characterizes conditional independence, which in the present context corresponds to the vanishing of particular almost-principal determinants of the covariance matrix. In contrast, Theorem 3.5 allows us to leverage arbitrary determinantal relations; compare [14]. The graph in Figure 2a is an example in which d-separation and traditional instrumental variable techniques cannot explain the rational identifiability of the coefficient for the edge $4\to 5$.

While assumption (a) in the above theorem allows for the easy application of Corollary 3.3, this assumption can be relaxed by generalizing one direction of Corollary 3.3. We state this generalization as the following lemma, which is concerned with the asymmetric treatment of edges that appear on the left-hand versus the right-hand side of treks. The lemma's proof is deferred to Appendix A.

Lemma 3.7

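Let $G = (V,D,B)$ be a mixed graph, and let $\Lambda = (\lambda_{uv})$ and $\Omega$ be the matrices of indeterminates corresponding to the directed and the bidirected part of $G$, respectively. Let $D_L, D_R\subseteq D$ and define $n\times n$ matrices $\Lambda^L$ and $\Lambda^R$ with

$$\Lambda^L_{uv} = \begin{cases}\lambda_{uv} & \text{if } (u,v)\in D_L,\\ 0 & \text{otherwise,}\end{cases} \qquad\text{and}\qquad \Lambda^R_{uv} = \begin{cases}\lambda_{uv} & \text{if } (u,v)\in D_R,\\ 0 & \text{otherwise.}\end{cases}$$

Define a network $\widetilde G_{\mathrm{flow}} = (\widetilde V, \widetilde D)$ with vertex set $\widetilde V = \{1,\dots,n\}\cup\{1',\dots,n'\}$ and edge set $\widetilde D$ containing

$$i\to j \quad \text{if } (j,i)\in D_L,$$
$$i\to i' \quad \text{for all } i\in V,$$
$$i\to j' \quad \text{if } (i,j)\in B, \text{ and}$$
$$i'\to j' \quad \text{if } (i,j)\in D_R,$$

with all edges and vertices having capacity 1. Let $\Gamma = (I-\Lambda^L)^{-T}\,\Omega\,(I-\Lambda^R)^{-1}$. Then, for any $S,T\subseteq V$ with $|S| = |T| = k$, we have that $|\Gamma_{S,T}| = 0$ if the max-flow from $S$ to $T$ in $\widetilde G_{\mathrm{flow}}$ is $< k$.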

We may now state our more general result.

Theorem 3.8

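Let $G = (V,D,B)$ be a mixed graph, $w_0\to v\in G$, and suppose that the edges $w_1\to v,\dots,w_\ell\to v\in G$ are known to be generically (rationally) identifiable. Recalling Equation (13), let $\widetilde G_{\mathrm{flow}}$ be the network of Lemma 3.7 with $D_L = D$ and $D_R = D\setminus\{w_0\to v,\dots,w_\ell\to v\}$, that is, $G_{\mathrm{flow}}$ with the edges $w'_0\to v',\dots,w'_\ell\to v'$ removed. Suppose there are sets $S\subseteq V$ and $T\subseteq V\setminus\{v,w_0\}$ such that $|S| = |T|+1 = k$ and

  • (a) $\mathrm{des}(v)\cap(T\cup\{v\}) = \emptyset$,
  • (b) the max-flow from $S$ to $T\cup\{w_0\}$ in $G_{\mathrm{flow}}$ equals $k$, and
  • (c) the max-flow from $S$ to $T\cup\{v\}$ in $\widetilde G_{\mathrm{flow}}$ is $< k$.

Then $w_0\to v$ is rationally identifiable by the equation

$$\lambda_{w_0v} = \frac{|\Sigma_{S,T\cup\{v\}}| - \sum_{i=1}^{\ell}\lambda_{w_iv}\,|\Sigma_{S,T\cup\{w_i\}}|}{|\Sigma_{S,T\cup\{w_0\}}|}. \qquad (14)$$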

proof.

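By assumption (b) and Corollary 3.3, $|\Sigma_{S,T\cup\{w_0\}}|$ is generically non-zero. Therefore, equation (14) holds if

$$|\Sigma_{S,T\cup\{v\}}| - \sum_{i=0}^{\ell}\lambda_{w_iv}\,|\Sigma_{S,T\cup\{w_i\}}| = 0.$$

To show this we note that, by the multilinearity of the determinant, we have

$$|\Sigma_{S,T\cup\{v\}}| - \sum_{i=0}^{\ell}\lambda_{w_iv}\,|\Sigma_{S,T\cup\{w_i\}}| = \begin{vmatrix} \sigma_{s_1t_1} & \cdots & \sigma_{s_1t_{k-1}} & \sigma_{s_1v} - \sum_{i=0}^{\ell}\lambda_{w_iv}\sigma_{s_1w_i} \\ \sigma_{s_2t_1} & \cdots & \sigma_{s_2t_{k-1}} & \sigma_{s_2v} - \sum_{i=0}^{\ell}\lambda_{w_iv}\sigma_{s_2w_i} \\ \vdots & & \vdots & \vdots \\ \sigma_{s_kt_1} & \cdots & \sigma_{s_kt_{k-1}} & \sigma_{s_kv} - \sum_{i=0}^{\ell}\lambda_{w_iv}\sigma_{s_kw_i} \end{vmatrix}.$$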

Write Γ for the matrix that appears on the right-hand side of this equation.

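Consider any two indices $i$ and $j$ with $1\le i\le k$ and $1\le j\le k-1$. If a trek from $s_i$ to $t_j$ uses one of the edges $w_m\to v$, for $0\le m\le\ell$, on its right-hand side, then, since $t_j\neq v$, we have $t_j\in\mathrm{des}(v)$, a contradiction since $\mathrm{des}(v)\cap T = \emptyset$ by assumption (a). Similarly, since $v\notin\mathrm{des}(v)$, the difference $\sigma_{s_iv} - \sum_{j=0}^{\ell}\lambda_{w_jv}\sigma_{s_iw_j}$ is obtained by summing the monomials for treks between $s_i$ and $v$ which do not use any edge $w_j\to v$ on their right side. From this we may write

$$\Gamma = \big((I-\Lambda)^{-T}\,\Omega\,(I-\tilde\Lambda)^{-1}\big)_{S,T\cup\{v\}},$$

where $\tilde\Lambda$ equals $\Lambda$ but with its $(w_j,v)$ entries, $0\le j\le\ell$, set to 0. The fact that $|\Gamma| = 0$ under assumption (c) is the content of Lemma 3.7 (where we take $\Lambda^L = \Lambda$ and $\Lambda^R = \tilde\Lambda$). Given this lemma, our desired result follows.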

Clearly Theorem 3.8 can be applied whenever Theorem 3.5 can. Moreover, as the next example shows, there are cases in which Theorem 3.8 can be used while Theorem 3.5 cannot.

Example 3.9.

Let $G = (V,D,B)$ be the mixed graph from Figure 3. Take $S = \{3,5\}$ and $T = \{4\}$. Then Theorem 3.8 implies that $\lambda_{12}$ is rationally identifiable. Theorem 3.5 cannot be applied in this case, as $S\cap\mathrm{des}(2)\neq\emptyset$.

For a fixed choice of $S$ and $T$, the conditions (a)-(c) in Theorem 3.8 can be verified in polynomial time. Indeed, conditions (b) and (c) involve only max-flow computations that take $O(|V|^3)$ time in general. Condition (a) can be checked by computing the descendants of $v$, which can be done with any $O(|D|)$ graph traversal algorithm (e.g., depth-first search; see [20]), and then computing the intersection between the descendants and $T\cup\{v\}$, which can be done in $O(|V|\log|V|)$ time.

In order to apply Theorem 3.8 algorithmically, however, we have to consider all possible subsets $S\subseteq V$ and $T\subseteq V\setminus\{v,w_0\}$ and check our conditions for each pair. Done naively, this takes exponential time. It remains an interesting problem for further study to determine whether or not the problem of finding suitable sets $S$ and $T$ is NP-hard. We note that a similar problem arises for instrumental variables and d-separation, where [21] gave a polynomial-time algorithm for finding suitable sets in acyclic graphs. For now, we maintain polynomial-time guarantees simply by considering only subsets $S,T$ of bounded size $|S|,|T|\le m$.
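A minimal sketch of this bounded search space follows; it is our own illustration and `candidate_pairs` is a hypothetical name. It enumerates index sets $S\subseteq V$ and $T\subseteq V\setminus\{v,w_0\}$ with $|S| = |T|+1\le m$, as required by Theorem 3.8. For fixed $m$ the number of pairs is polynomial in $|V|$ (on the order of $|V|^{2m-1}$), and each pair would then be tested with the descendant check (a) and the max-flow conditions (b) and (c).

```python
from itertools import combinations


def candidate_pairs(V, v, w0, m):
    """Yield (S, T) as sorted tuples with S a subset of V,
    T a subset of V minus {v, w0}, and |S| = |T| + 1 <= m."""
    V = sorted(V)
    allowed_T = [x for x in V if x not in (v, w0)]
    for k in range(1, m + 1):
        for T in combinations(allowed_T, k - 1):
            for S in combinations(V, k):
                yield S, T


pairs = list(candidate_pairs(range(1, 6), v=5, w0=4, m=2))
print(len(pairs))  # 5 pairs with |S| = 1 plus 3 * 10 pairs with |S| = 2, i.e. 35
```

Returning sorted tuples fixes a column order for $T$, which is all the determinantal formula (14) needs, since the same order is used in all three determinants.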

4 Edgewise generic identifiability

While our results from Section 3 can be used together with the HTC, there is a notable lack of synergy between the two methods, as Theorem 2.10 requires that all directed edges incoming to a node be generically identifiable before that node can be used to prove the generic identifiability of other edges. Aiming to strengthen the HTC while allowing it to better use identifications produced by Theorem 3.8, the following theorem establishes a sufficient condition for the generic identifiability of any set of incoming edges to a fixed node. As noted in the introduction, while preparing this manuscript we discovered the work of Chen [16]; our following theorem can be seen as a generalization of his Theorem 1. See Remark 4.2 for a discussion of the primary difference between our theorem and that in [16].

Theorem 4.1

Let $G = (V,D,B)$ be a non-empty mixed graph and let $v\in V$. Let $W\subseteq\mathrm{pa}(v)$ and suppose there exists $Y\subseteq V\setminus(\{v\}\cup\mathrm{sib}(v))$ with $|Y| = |W|$ such that

  • (i) there exists a half-trek system from $Y$ to $W$ with no sided intersection, and
  • (ii) for every trek $\pi$ from $y\in Y$ to $v$ we have that either
    • (a) $\pi$ ends with an edge of the form $s\to v$ where either $s\in W$ or $s\to v$ is known to be generically (rationally) identifiable, or
    • (b) $\pi$ begins with an edge of the form $y\leftarrow s$ where $s\to y$ is known to be generically (rationally) identifiable.

Then for each $w\in W$ we have that $w\to v$ is generically (rationally) identifiable.

proof

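Let $(\Lambda,\Omega)$ be the matrices of indeterminates corresponding to $G$, and let $\Sigma = (I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}$ be the covariance matrix. Recall our notation $\mathcal{T}(x,z)$ for the set of treks from $x$ to $z$ in $G$. By the Trek Rule (Proposition 2.5), $\Sigma_{xz} = \sum_{\pi\in\mathcal{T}(x,z)}\pi(\Lambda,\Omega)$ is the sum of the monomials for treks from $x$ to $z$.

Recalling that $|Y| = |W|$, enumerate $W = \{w_1,\dots,w_k\}$ and $Y = \{y_1,\dots,y_k\}$. Now, for $1\le i\le k$, let $H_i\subseteq\mathrm{pa}(y_i)$ be the set of parents $h$ of $y_i$ for which the edge $h\to y_i$ is known to be generically (rationally) identifiable.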

Our approach is to build a linear system of $k$ equations in the $k$ unknowns $\lambda_{w_1v},\dots,\lambda_{w_kv}$ that has a unique solution. Consider the set $\mathcal{T}(y_1,v)$ of all treks between $y_1$ and $v$. Because of condition (ii), we have that $y_1\leftrightarrow v\notin G$, and all treks from $y_1$ to $v$ either end in a directed edge of the form $s\to v$, with $s\in W$ or $s\to v$ known to be generically identifiable, or start with a directed edge of the form $y_1\leftarrow h$ for some $h\in H_1$. Now note that for any $p\in\mathrm{pa}(v)$,

$$\sum_{\pi\in\mathcal{T}(y_1,p)}\pi(\Lambda,\Omega) - \sum_{h\in H_1}\sum_{\pi\in\mathcal{T}(h,p)}\pi(\Lambda,\Omega)\,\lambda_{hy_1} = \Sigma_{y_1p} - \sum_{h\in H_1}\Sigma_{hp}\lambda_{hy_1}$$

equals the sum of the monomials for all treks from $y_1$ to $p$ that do not start with a directed edge of the form $y_1\leftarrow h$ for $h\in H_1$. Hence, the sum of all monomials for treks from $y_1$ to $v$ that do not start with an edge of the form $y_1\leftarrow h$ for $h\in H_1$ equals

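$$\sum_{p\in\mathrm{pa}(v)}\Big(\Sigma_{y_1p} - \sum_{h\in H_1}\Sigma_{hp}\lambda_{hy_1}\Big)\lambda_{pv}.$$

Now the sum over all treks between $y_1$ and $v$ that start with an edge of the form $y_1\leftarrow h$ for $h\in H_1$ is easily seen to be the quantity $\sum_{h\in H_1}\Sigma_{vh}\lambda_{hy_1}$. Thus,

$$\Sigma_{y_1v} = \sum_{p\in\mathrm{pa}(v)}\Big(\Sigma_{y_1p} - \sum_{h\in H_1}\Sigma_{hp}\lambda_{hy_1}\Big)\lambda_{pv} + \sum_{h\in H_1}\Sigma_{vh}\lambda_{hy_1}.$$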

Rewriting this we have

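$$\sum_{w\in W}\Big(\Sigma_{y_1w} - \sum_{h\in H_1}\Sigma_{hw}\lambda_{hy_1}\Big)\lambda_{wv} = \Sigma_{y_1v} - \sum_{p\in\mathrm{pa}(v)\setminus W}\Big(\Sigma_{y_1p} - \sum_{h\in H_1}\Sigma_{hp}\lambda_{hy_1}\Big)\lambda_{pv} - \sum_{h\in H_1}\Sigma_{vh}\lambda_{hy_1}.$$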

Notice that, in the above equation, if $p\in\mathrm{pa}(v)\setminus W$ and $\Sigma_{y_1p} - \sum_{h\in H_1}\Sigma_{hp}\lambda_{hy_1}\neq 0$, then there must be a trek $\pi$ from $y_1$ to $v$ ending in the edge $p\to v$ which does not start with an edge of the form $y_1\leftarrow s$ where $s\to y_1$ is known to be generically identifiable. It then follows, by condition (ii)(a), that since $p\notin W$ the coefficient $\lambda_{pv}$ must be known to be generically identifiable. Consequently, the only unknown quantities (that is, those not assumed to be generically identifiable) in the above displayed equation are the $\lambda_{wv}$, which appear linearly on the left-hand side. Thus we have exhibited one linear equation in the $k$ unknown parameters $\lambda_{w_jv}$.

Repeating the above argument for each of the yi, we obtain k linear equations in k unknowns. It remains to show that the system of equations is generically non-singular. This amounts to showing generic invertibility for the k×k matrix A with entries

$$A_{ij} = \Sigma_{y_iw_j} - \sum_{h\in H_i}\Sigma_{hw_j}\lambda_{hy_i}.$$

The invertibility of $A$ follows from the existence of the half-trek system from $Y$ to $W$ with no sided intersection and Lemma 4.3 below. We conclude that each $w_i\to v$ is generically (rationally) identifiable, as claimed.

Remark 4.2

Our Theorem 4.1 generalizes Theorem 1 in [16] in two ways. Firstly, we make the trivial, but for our purposes important, modification to formulate our theorem in a fashion that is agnostic as to how prior generic identifications were obtained. For the presentation in [16] it was more natural to focus only on such identifications being obtained from prior applications of his theorem. Secondly, and more substantially, the results in [16] do not consider the possibility that, recalling the setting of Theorem 4.1, some of the edges incoming to v may be known to be generically identifiable; failing to use this information makes the conditions on the set Y more restrictive. Indeed, but for our first modification, our theorem reduces to the result in [16] if we replace condition (ii)(a) by the condition “π ends with an edge of the form sv where sW.”

As an example of how this difference can appear in practice, consider Figure 4 and suppose we restrict the edge sets $W$ under consideration to have size 1 (for larger graphs, this may be required for computational efficiency). Then, using $Y=\{1\}$ and $W=\{3\}$, one easily checks that $3\to4$ is generically identifiable. Showing that $2\to4$ is generically identifiable using $W=\{2\}$ is then impossible with Theorem 1 in [16] because of the trek $2\to3\to4$; this trek, however, poses no problem for Theorem 4.1, as we have already shown that $3\to4$ is generically identifiable.

Figure 4

A graph that serves to illustrate differences between Theorem 1 of [16] and our Theorem 4.1.

Citation: Journal of Causal Inference 6, 1; 10.1515/jci-2017-0009

The following lemma generalizes Lemma 2 from [10] and completes the proof of Theorem 4.1.

Lemma 4.3

Let $G=(V,D,B)$ be a mixed graph on $n$ nodes with associated covariance matrix $\Sigma$. Moreover, let $S=\{s_1,\dots,s_k\},T=\{t_1,\dots,t_k\}\subseteq V$. For every $1\le i\le k$ let $H_i=\{h^i_1,\dots,h^i_{\ell_i}\}\subseteq\mathrm{pa}(s_i)$. Suppose there exists a half-trek system from $S$ to $T$ with no sided intersection. Then the $k\times k$ matrix $A$ defined by

$$A_{ij}=\Sigma_{s_it_j}-\sum_{h\in H_i}\Sigma_{ht_j}\lambda_{hs_i}$$

is generically invertible.

The proof of this lemma is deferred to Appendix B. Note that if we let $W=\mathrm{pa}(v)$ and strengthen condition (ii)(b) to require that all edges incoming to $y$ be generically identifiable whenever there exists a half-trek from $v$ to $y$, then Theorem 4.1 reduces to Theorem 2.10 of [10], the usual half-trek identifiability theorem.

The conditions of Theorem 4.1 can be checked in polynomial time using max-flow computations, just as with the standard half-trek criterion. Unfortunately, in general we do not know for which subset $W\subseteq\mathrm{pa}(v)$ the conditions of Theorem 4.1 should be checked. In practice, this means checking all subsets $W\subseteq\mathrm{pa}(v)$, of which there are, in general, exponentially many. If all vertices may be assumed to have bounded in-degree, then checking all subsets requires only polynomial time. If in-degrees are not bounded, we may still maintain polynomial time complexity by only considering subsets $W$ that are sufficiently small or sufficiently large, that is, for which $|W|$ or $|\mathrm{pa}(v)\setminus W|$ is bounded by a fixed constant. Pseudocode for an algorithm that iteratively identifies the coefficients of a mixed graph by leveraging Theorem 4.1 is given in Algorithm 1.
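The bounded-size enumeration of candidate sets $W$ can be sketched in Python as follows, assuming that "sufficiently small or large" means $|W|\le c$ or $|\mathrm{pa}(v)\setminus W|\le c$ for a fixed constant $c$. The function name `candidate_subsets` is hypothetical. For $m$ parents this yields $O(m^c)$ subsets instead of all $2^m-1$ nonempty ones.

```python
from itertools import combinations


def candidate_subsets(parents, c):
    """Yield nonempty W with |W| <= c or |parents| - |W| <= c (hypothetical helper).

    Only O(m^c) subsets are produced for m = len(parents), so a fixed c keeps
    the search over W polynomial even when in-degrees are unbounded.
    """
    parents = list(parents)
    m = len(parents)
    small = range(1, min(c, m) + 1)        # |W| <= c
    large = range(max(m - c, 1), m + 1)    # |parents \ W| <= c
    for size in sorted(set(small) | set(large)):
        yield from combinations(parents, size)


# Five parents, c = 1: the five singletons, the five 4-subsets and the full set.
print(len(list(candidate_subsets(range(5), 1))))
```

Sizes are deduplicated so that, when $m\le 2c$, no subset is produced twice.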

Algorithm 1 Edgewise identification algorithm.
Input: A mixed graph G = (V,D,B) with V = {1,…,n} and a set of edges, solvedEdges, known to be generically identifiable.
repeat
    for v ∈ {1,…,n} do
        unsolved ← {w ∈ V : w→v ∈ G and w→v ∉ solvedEdges}
        maybeAllowed ← {y ∈ V∖({v} ∪ sib(v)) : z→y ∈ solvedEdges for every z ∈ htr(v) ∩ pa(y)}
        for ∅ ≠ W ⊆ unsolved do
            allowed ← {y ∈ maybeAllowed : (htr(y) ∪ tr({p ∈ pa(y) : p→y ∉ solvedEdges})) ∩ unsolved ⊆ W}
            exists ← using max-flow computations, decide whether there exists a half-trek system from allowed to W of size |W| with no sided intersection
            if exists is true then
                solvedEdges ← solvedEdges ∪ {w→v : w ∈ W}
                break out of the current loop
            end if
        end for
    end for
until no additional edges have been added to solvedEdges in the most recent iteration
Output: solvedEdges, the set of edges found to be generically (rationally) identifiable.
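The max-flow test inside Algorithm 1 can be sketched as follows. This is our reading of the flow-network construction for half-trek systems from [10], not the SEMID implementation: every vertex $u$ gets a left copy $L(u)$ and a right copy $R(u)$, each of capacity 1, with edges $L(u)\to R(u)$, $L(a)\to R(b)$ for $a\leftrightarrow b$, and $R(a)\to R(b)$ for $a\to b$. Unit capacities on the copies force distinct starting vertices and disjoint right sides, i.e., no sided intersection. The names `max_flow` and `has_half_trek_system` are hypothetical; the flow routine is plain Edmonds-Karp.

```python
from collections import defaultdict, deque

# Illustrative sketch; function names are hypothetical (not SEMID's API).


def max_flow(capacity, source, sink):
    """Edmonds-Karp on a dict-of-dicts residual graph (modified in place)."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            node = queue.popleft()
            for nxt, cap in capacity[node].items():
                if cap > 0 and nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        if sink not in parent:
            return flow
        # Augment by one unit, which is valid for integral capacities.
        node = sink
        while parent[node] is not None:
            prev = parent[node]
            capacity[prev][node] -= 1
            capacity[node][prev] = capacity[node].get(prev, 0) + 1
            node = prev
        flow += 1


def has_half_trek_system(n, directed, bidirected, sources, targets):
    """Is there a half-trek system from a subset of `sources` onto all of
    `targets` with no sided intersection?  Vertices are 0..n-1; directed holds
    pairs (a, b) for a -> b and bidirected holds pairs (a, b) for a <-> b."""
    cap = defaultdict(dict)

    def add(a, b):
        cap[a][b] = cap[a].get(b, 0) + 1
        cap[b].setdefault(a, 0)

    # Split each left/right copy into "in" and "out" nodes to get vertex capacity 1.
    for u in range(n):
        add(("L", u, "in"), ("L", u, "out"))
        add(("R", u, "in"), ("R", u, "out"))
        add(("L", u, "out"), ("R", u, "in"))       # half-trek whose top is u itself
    for a, b in bidirected:
        add(("L", a, "out"), ("R", b, "in"))       # half-trek starting a <-> b
        add(("L", b, "out"), ("R", a, "in"))
    for a, b in directed:
        add(("R", a, "out"), ("R", b, "in"))       # continue along a -> b
    for y in sources:
        add("source", ("L", y, "in"))
    for w in targets:
        add(("R", w, "out"), "sink")
    return max_flow(cap, "source", "sink") == len(targets)


# IV model 0 -> 1 -> 2 with 1 <-> 2: vertex 0 reaches 1 by the half-trek 0 -> 1.
print(has_half_trek_system(3, {(0, 1), (1, 2)}, {(1, 2)}, [0], [1]))
```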

5 Edgewise generic nonidentifiability

In prior sections we have focused solely on sufficient conditions for demonstrating the generic identifiability of edges in a mixed graph. This raises the question of whether there are complementary necessary conditions, that is, conditions which, when violated, show that a given edge is generically many-to-one. To our knowledge, the following is the only known necessary condition for generic identifiability; it considers all parameters of a mixed graph $G$ simultaneously.

Theorem 5.1

(Theorem 2 of [10]) Suppose $G=(V,D,B)$ is a mixed graph in which every family $(Y_v:v\in V)$ of subsets of the vertex set $V$ either contains a set $Y_v$ that fails to satisfy the half-trek criterion with respect to $v$ or contains a pair of sets $(Y_v,Y_w)$ with $v\in Y_w$ and $w\in Y_v$. Then the parameterization $\phi_G$ is generically infinite-to-one.

This theorem operates by showing that, under its conditions, the Jacobian of the map $\phi_G$ fails to have full column rank, so that $\phi_G$ must be generically infinite-to-one. Unfortunately, this theorem gives no indication of which edges, in particular, are generically infinite-to-one. The theorem below gives a simple condition which guarantees that a directed edge is generically infinite-to-one.

Theorem 5.2

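Let $G=(V,D,B)$ be a mixed graph and let $v\to w\in G$. Suppose that for every $z\in V\setminus\{w\}$ we have either $z\leftrightarrow w\in G$ or $v$ is not half-trek reachable from $z$. Let $\mathrm{proj}_{vw}$ be the projection $(\Lambda,\Omega)\mapsto\lambda_{vw}$. Then $\mathrm{proj}_{vw}(\mathcal{F}(\Lambda,\Omega))$ is infinite for all $(\Lambda,\Omega)\in\Theta=\mathbb{R}^D_{\mathrm{reg}}\times PD(B)$.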

proof

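Let $(\Lambda,\Omega)\in\Theta$ and $\Sigma=\phi_G(\Lambda,\Omega)=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}$. We will show that for each matrix $\Gamma=(\gamma_{xy})\in\mathbb{R}^D_{\mathrm{reg}}$ that agrees with $\Lambda$ in all but (possibly) the $(v,w)$ entry, we can find $\Psi\in PD(B)$ for which $\phi_G(\Gamma,\Psi)=\Sigma$. The claim then follows by noting that the choices for $\Gamma$ allow for infinitely many values of $\gamma_{vw}$.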

Let $\Gamma\in\mathbb{R}^D_{\mathrm{reg}}$ be as above, and let $x\neq y\in V$ be such that $x\leftrightarrow y\notin G$. Then

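$$((I-\Gamma)^T\Sigma(I-\Gamma))_{xy}=\sigma_{xy}-\sum_{z\in\mathrm{pa}(x)}\sigma_{yz}\gamma_{zx}-\sum_{z\in\mathrm{pa}(y)}\sigma_{xz}\gamma_{zy}+\sum_{z\in\mathrm{pa}(x)}\sum_{z'\in\mathrm{pa}(y)}\gamma_{zx}\gamma_{z'y}\sigma_{zz'}.$$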

Whenever $x,y\neq w$, we have $\gamma_{zx}=\lambda_{zx}$ and $\gamma_{zy}=\lambda_{zy}$ in the above equation. Thus

$$0=\Omega_{xy}=((I-\Lambda)^T\Sigma(I-\Lambda))_{xy}=((I-\Gamma)^T\Sigma(I-\Gamma))_{xy}.$$

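Next suppose, without loss of generality, that $x=w$ and $y\neq w$. Then, since $y$ is a non-sibling of $w$, $v$ is not half-trek reachable from $y$. As $\sigma_{vy}-\sum_{z\in\mathrm{pa}(y)}\sigma_{vz}\lambda_{zy}=(\Sigma(I-\Lambda))_{vy}=((I-\Lambda)^{-T}\Omega)_{vy}$ is a sum over half-treks from $y$ to $v$, it follows that $\sigma_{vy}=\sum_{z\in\mathrm{pa}(y)}\sigma_{vz}\lambda_{zy}$. But then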

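$$\begin{aligned}((I-\Gamma)^T\Sigma(I-\Gamma))_{wy}&=\sigma_{wy}-\sum_{z\in\mathrm{pa}(w)}\sigma_{yz}\gamma_{zw}-\sum_{z\in\mathrm{pa}(y)}\sigma_{wz}\gamma_{zy}+\sum_{z\in\mathrm{pa}(w)}\sum_{z'\in\mathrm{pa}(y)}\gamma_{zw}\gamma_{z'y}\sigma_{zz'}\\&=-\sigma_{vy}\gamma_{vw}+\sum_{z\in\mathrm{pa}(y)}\gamma_{vw}\lambda_{zy}\sigma_{vz}+\sigma_{wy}-\sum_{v\neq z\in\mathrm{pa}(w)}\sigma_{yz}\lambda_{zw}-\sum_{z\in\mathrm{pa}(y)}\sigma_{wz}\lambda_{zy}+\sum_{v\neq z\in\mathrm{pa}(w)}\sum_{z'\in\mathrm{pa}(y)}\lambda_{zw}\lambda_{z'y}\sigma_{zz'}\\&=-\gamma_{vw}\Big(\sigma_{vy}-\sum_{z\in\mathrm{pa}(y)}\lambda_{zy}\sigma_{vz}\Big)+\sigma_{wy}-\sum_{v\neq z\in\mathrm{pa}(w)}\sigma_{yz}\lambda_{zw}-\sum_{z\in\mathrm{pa}(y)}\sigma_{wz}\lambda_{zy}+\sum_{v\neq z\in\mathrm{pa}(w)}\sum_{z'\in\mathrm{pa}(y)}\lambda_{zw}\lambda_{z'y}\sigma_{zz'}.\end{aligned}$$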

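Now since $\sigma_{vy}-\sum_{z\in\mathrm{pa}(y)}\lambda_{zy}\sigma_{vz}=0$, we have $0=\gamma_{vw}\big(\sigma_{vy}-\sum_{z\in\mathrm{pa}(y)}\lambda_{zy}\sigma_{vz}\big)=\lambda_{vw}\big(\sigma_{vy}-\sum_{z\in\mathrm{pa}(y)}\lambda_{zy}\sigma_{vz}\big)$. Therefore,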

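$$\begin{aligned}((I-\Gamma)^T\Sigma(I-\Gamma))_{wy}&=-\lambda_{vw}\Big(\sigma_{vy}-\sum_{z\in\mathrm{pa}(y)}\lambda_{zy}\sigma_{vz}\Big)+\sigma_{wy}-\sum_{v\neq z\in\mathrm{pa}(w)}\sigma_{yz}\lambda_{zw}-\sum_{z\in\mathrm{pa}(y)}\sigma_{wz}\lambda_{zy}+\sum_{v\neq z\in\mathrm{pa}(w)}\sum_{z'\in\mathrm{pa}(y)}\lambda_{zw}\lambda_{z'y}\sigma_{zz'}\\&=((I-\Lambda)^T\Sigma(I-\Lambda))_{wy}=\Omega_{wy}=0.\end{aligned}$$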

Let $\Psi=(I-\Gamma)^T\Sigma(I-\Gamma)$. We have just shown that $\Psi_{xy}=0$ for all distinct $x,y\in V$ such that $x\leftrightarrow y\notin G$. To see that $\Psi\in PD(B)$, it remains to show that $\Psi$ is positive definite, which is immediate from its definition since $\Sigma$ is positive definite and $I-\Gamma$ is invertible. We conclude that $\phi_G(\Gamma,\Psi)=\Sigma$, which proves the claim.
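The construction in the proof can be checked numerically. The Python sketch below uses a small hypothetical graph (vertices 0, 1, 2 with edges $1\to2$, $2\to0$ and $1\leftrightarrow2$) that satisfies the hypothesis of Theorem 5.2 for the edge $1\to2$: vertex 1 is a sibling of 2, and vertex 0 cannot reach 1 by a half-trek. It replaces $\lambda_{12}$ by several values $\gamma_{12}$ and confirms that $\Psi=(I-\Gamma)^T\Sigma(I-\Gamma)$ vanishes on the non-sibling pairs and is positive definite (via leading principal minors). The parameter values and helper names are arbitrary illustrations.

```python
from fractions import Fraction

# Illustrative sketch; helper names and parameter values are hypothetical.


def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def identity_minus(M):
    n = len(M)
    return [[int(i == j) - M[i][j] for j in range(n)] for i in range(n)]


def inverse_nilpotent(lam):
    """(I - Lam)^{-1} = I + Lam + ... + Lam^(n-1), valid when the graph is acyclic."""
    n = len(lam)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    power = result
    for _ in range(n - 1):
        power = matmul(power, lam)
        result = [[result[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return result


def det(M):
    """Cofactor expansion along the first row (adequate for 3 x 3 matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


lam = [[0, 0, 0], [0, 0, 3], [2, 0, 0]]              # lam[a][b]: coefficient of a -> b
omega = [[1, 0, 0], [0, 1, Fraction(1, 2)], [0, Fraction(1, 2), 1]]
X = inverse_nilpotent(lam)
sigma = matmul(matmul(transpose(X), omega), X)       # (I-Lam)^{-T} Omega (I-Lam)^{-1}


def psi_for(gamma_vw):
    """Psi = (I - Gamma)^T Sigma (I - Gamma), Gamma = Lam with entry (1, 2) replaced."""
    gamma = [row[:] for row in lam]
    gamma[1][2] = gamma_vw
    A = identity_minus(gamma)
    return matmul(matmul(transpose(A), sigma), A)


for g in (Fraction(-5), Fraction(0), Fraction(7, 3), Fraction(100)):
    psi = psi_for(g)
    # Non-sibling pairs (0, 1) and (0, 2) vanish, and Psi is positive definite.
    assert psi[0][1] == 0 and psi[0][2] == 0
    assert all(det([row[:m] for row in psi[:m]]) > 0 for m in (1, 2, 3))
```

Every tested value of $\gamma_{12}$ yields a valid $\Psi\in PD(B)$ reproducing the same $\Sigma$, which is exactly the infinite fiber the theorem asserts.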

Figure 5

Two graphs serving to illustrate Theorem 5.2. (a) A graph in which all directed edges are identifiable except 2 → 3; the 2 → 3 edge can be shown to be infinite-to-one using Theorem 5.2. (b) A graph known to have generically infinite-to-one parameterization by Theorem 5.1 but for which Theorem 5.2 applies to no edge.


Example 5.3.

Let $G$ be the graph in Figure 5a. Using the necessary condition of the HTC, Theorem 5.1, we find that $\phi_G$ is generically infinite-to-one. To determine which edges of $G$ are themselves infinite-to-one we use Theorem 5.2, and one easily finds that the edge $2\to3$ of $G$ is generically infinite-to-one. Moreover, using the edgewise identification techniques of Section 4, we see that all other directed edges of $G$ are generically identifiable, so we have completely characterized which directed edges of $G$ are, and are not, generically identifiable.

We stress, however, that Theorem 5.2 does not imply Theorem 5.1; that is, there are graphs G for which Theorem 5.1 shows ϕG is infinite-to-one but Theorem 5.2 cannot verify that any edges of G are infinite-to-one. For example, see Figure 5b.
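The hypothesis of Theorem 5.2, as used in Example 5.3, can be checked mechanically. In the Python sketch below we count $z$ as half-trek reachable from itself via the trivial half-trek; we assume this convention because the proof's step $\sigma_{vy}=\sum_{z\in\mathrm{pa}(y)}\sigma_{vz}\lambda_{zy}$ fails for $y=v$, where the difference contains $\omega_{vv}$ and is generically nonzero. The graph encoding (sets of vertex pairs) and the function names are hypothetical.

```python
# Illustrative sketch; function names and graph encoding are hypothetical.


def half_trek_reachable(n, directed, bidirected, z):
    """Vertices reachable from z by a half-trek, including z itself (trivial half-trek).

    A half-trek from z is a directed path starting at z, or an edge z <-> u
    followed by a directed path starting at u.
    """
    children = {u: set() for u in range(n)}
    for a, b in directed:
        children[a].add(b)
    siblings = {b for a, b in bidirected if a == z} | {a for a, b in bidirected if b == z}
    seen = {z} | siblings
    stack = list(seen)
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def theorem_5_2_applies(n, directed, bidirected, v, w):
    """True if every z != w is a sibling of w or cannot reach v by a half-trek."""
    if (v, w) not in directed:
        raise ValueError("v -> w must be an edge of G")
    siblings_of_w = {b for a, b in bidirected if a == w} | {a for a, b in bidirected if b == w}
    return all(z in siblings_of_w
               or v not in half_trek_reachable(n, directed, bidirected, z)
               for z in range(n) if z != w)


# A confounded edge 0 -> 1 with 0 <-> 1 and no instrument: the condition holds.
print(theorem_5_2_applies(2, {(0, 1)}, {(0, 1)}, 0, 1))
```

For the instrumental variable model ($0\to1\to2$, $1\leftrightarrow2$) the check fails for $1\to2$, since the instrument 0 reaches 1; this is consistent with that edge being identifiable.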

6 Computational experiments

In this section we present computational experiments that demonstrate the usefulness of our theorems in extending the applicability of the half-trek criterion. All of the following experiments are carried out in the R programming language, and the algorithms below are implemented in our R package SEMID, which is available on CRAN, the Comprehensive R Archive Network [22, 23], as well as on GitHub.1 We consider four different identification algorithms for checking generic identifiability:

  • The standard half-trek criterion (HTC) algorithm.
  • The edgewise identification (EID) algorithm, displayed in Algorithm 1, where the input set of solvedEdges is empty.
  • The trek-separation identification (TSID) algorithm. As with Algorithm 1, this algorithm iteratively applies Theorem 3.8 until it fails to identify any additional edges. (Since we consider graphs with few nodes, there is no need to limit the size of the sets S and T searched over in our computation.)
  • The EID+TSID algorithm. This algorithm alternates between the EID and TSID algorithms until it fails to identify any additional edges.

We emphasize that when all of the directed edges, i.e., the matrix $\Lambda$, are generically (rationally) identifiable, then $\Omega=(I-\Lambda)^T\Sigma(I-\Lambda)$ is also generically (rationally) identifiable.
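As a small worked instance of this last step, the Python sketch below takes the exact covariance matrix of a hypothetical instrumental variable model $0\to1\to2$ with $1\leftrightarrow2$ ($\lambda_{01}=2$, $\lambda_{12}=3$, unit error variances, $\omega_{12}=1/2$), identifies $\lambda_{01}$ by regression and $\lambda_{12}$ by the instrumental variable ratio $\Sigma_{02}/\Sigma_{01}$, and then recovers $\Omega=(I-\Lambda)^T\Sigma(I-\Lambda)$, including the confounding parameter $\omega_{12}$. The numbers and the helper `recover_omega` are illustrative only.

```python
from fractions import Fraction

# Illustrative sketch; helper names and numbers are hypothetical.


def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def recover_omega(sigma, lam):
    """Omega = (I - Lam)^T Sigma (I - Lam) once every directed edge is identified."""
    n = len(lam)
    A = [[int(i == j) - lam[i][j] for j in range(n)] for i in range(n)]
    return matmul(matmul(transpose(A), sigma), A)


# Exact covariance of the model described above (vertices 0, 1, 2).
sigma = [[1, 2, 6], [2, 5, Fraction(31, 2)], [6, Fraction(31, 2), 49]]
lam = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
lam[0][1] = Fraction(sigma[0][1], sigma[0][0])   # regression: 0 has no parents or siblings
lam[1][2] = Fraction(sigma[0][2], sigma[0][1])   # instrumental variable ratio
omega = recover_omega(sigma, lam)
print(omega[1][2])
```

The off-diagonal entry $\omega_{12}$ comes out as $1/2$, and the entries for non-adjacent pairs come out as 0, as they must.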

In Table 1 of [24], the authors list all 112 acyclic non-isomorphic mixed graphs on 5 nodes which are generically identifiable but for which the half-trek criterion remains inconclusive even when using decomposition techniques. We run the EID, TSID, and EID+TSID algorithms on these 112 inconclusive graphs and find that 23 can be declared generically identifiable by the EID algorithm, 0 by the TSID algorithm, and 98 by the EID+TSID algorithm. Thus it is only by combining the determinantal equations discovered via trek separation with the edgewise identification techniques that one sees a substantial increase in the number of graphs that can be declared generically identifiable.

We observe a similar trend when allowing cyclic mixed graphs. In Table 2 of [24], the authors list 75 randomly chosen cyclic (i.e., containing a directed cycle) mixed graphs that are known to be rationally identifiable but cannot be certified as such by the half-trek criterion. Of these 75 graphs, 4 are certified to be generically identifiable by the EID algorithm, 0 by the TSID algorithm, and 34 by the EID+TSID algorithm.

The 14 acyclic and 41 cyclic mixed graphs that could not be certified by the EID+TSID algorithm are listed as integer pairs $(d,b)\in\mathbb{N}^2$ in Table 1. The algorithm to convert a pair $(d,b)$ in that table to a mixed graph $G$ on $n$ nodes is as follows:

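  1. For $v\leftarrow1,\dots,n$ and $w\leftarrow1,\dots,v-1,v+1,\dots,n$: add the edge $v\to w$ to $G$ if $d \bmod 2=1$, then replace $d$ with $\lfloor d/2\rfloor$.
  2. For $v\leftarrow1,\dots,n-1$ and $w\leftarrow v+1,\dots,n$: add the edge $v\leftrightarrow w$ to $G$ if $b \bmod 2=1$, then replace $b$ with $\lfloor b/2\rfloor$.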
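The decoding steps above can be written in Python as follows. We read "replace $d$ with $d/2$" as integer division, which is what the bit-by-bit decoding requires. The function names are hypothetical, and `is_acyclic` (Kahn's algorithm) is included only to check whether a decoded pair belongs to the acyclic or the cyclic part of Table 1.

```python
# Illustrative sketch; function names are hypothetical.


def decode_graph(d, b, n=5):
    """Decode an integer pair (d, b) into directed and bidirected edge sets on 1..n."""
    directed, bidirected = set(), set()
    for v in range(1, n + 1):
        for w in range(1, n + 1):
            if w == v:
                continue
            if d % 2 == 1:
                directed.add((v, w))       # edge v -> w
            d //= 2
    for v in range(1, n):
        for w in range(v + 1, n + 1):
            if b % 2 == 1:
                bidirected.add((v, w))     # edge v <-> w
            b //= 2
    return directed, bidirected


def is_acyclic(n, directed):
    """Kahn's algorithm: True iff the directed part on 1..n has no cycle."""
    indegree = {v: 0 for v in range(1, n + 1)}
    for _, w in directed:
        indegree[w] += 1
    ready = [v for v, k in indegree.items() if k == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for a, w in directed:
            if a == v:
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)
    return removed == n


D, B = decode_graph(4456, 113)
print(sorted(D), sorted(B), is_acyclic(5, D))
```

By this decoding, the first acyclic entry $(4456,113)$ has directed edges $1\to5$, $2\to3$, $2\to4$, $3\to1$, $4\to1$ and bidirected edges $1\leftrightarrow2$, $2\leftrightarrow3$, $2\leftrightarrow4$, $2\leftrightarrow5$, while the first cyclic entry $(345,440)$ contains both $1\to2$ and $2\to1$.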

See Figure 6 for an example of an acyclic and a cyclic graph that the EID+TSID algorithm fails to certify as generically identifiable.

Table 1

Of the 112 acyclic and 75 cyclic mixed graphs on 5 nodes described in Tables 1 and 2 of [24], we display the 14 acyclic and 41 cyclic graphs which are known to be generically identifiable but for which the EID+TSID algorithm could not certify that all edges were generically identifiable. Each graph is encoded as a pair (d,b); see text for details.

Acyclic (column 1); Cyclic (columns 2 to 4)
(4456, 113)(345, 440)(6629, 512)(75321, 516)
(360, 117)(71329, 18)(74536, 788)(75398, 20)
(6275, 172)(81089, 0)(5545, 96)(70803, 896)
(6307, 172)(4714, 41)(75112, 72)(4457, 592)
(6275, 188)(70881, 80)(74970, 4)(74883, 522)
(360, 369)(74963, 512)(4579, 384)(350, 112)
(4696, 401)(74886, 268)(70594, 65)(74883, 2)
(4936, 401)(5058, 304)(74921, 66)(74950, 260)
(4936, 402)(70821, 513)(70474, 640)(74890, 38)
(4680, 403)(74915, 6)(74922, 66)(81076, 0)
(840, 466)(5267, 82)(13160, 65)(70851, 32)
(5257, 658)(76852, 128)(4938, 448)(1430, 120)
(5257, 659)(71075, 516)(4730, 640)(5251, 418)
(4680, 914)(4397, 897)(70358, 1)
Figure 6

Two graphs for which the EID+TSID algorithm is inconclusive. (a) is acyclic while (b) contains a cycle.


7 Conclusion

By exploiting the trek-separation characterization of the vanishing of subdeterminants of the covariance matrix Σ corresponding to a mixed graph G, we have shown that individual edge coefficients can be generically identified by quotients of subdeterminants. This constitutes a generalization of instrumental variable techniques that are derived from conditional independence. We have also shown how this information, in concert with a generalized half-trek criterion, allows us to prove that substantially more graphs have all or some subset of their parameters generically identifiable.

Our work on identification by ratios of determinants focuses on a single edge coefficient. However, it seems possible to give a generalization that is in the spirit of the generalized instrumental sets from [15]; see also [25]. These leverage several conditional independencies to find a linear equation system that can be used to identify several edge coefficients simultaneously, under specific assumptions on the interplay of the conditional independencies and the edges to be identified. We illustrate the idea of how to do this using general determinants in the following example. However, a full exploration of this idea is beyond the scope of this paper. In particular, we are still lacking mathematical tools that, in suitable generality, could be used to certify that constructed linear equation systems have a unique solution.

Figure 7

A graph where the edges 4 → 6 and 5 → 6 can be simultaneously proven to be generically identifiable by solving a 2×2 linear system of determinantal equations.


example

Let $G$ be the graph in Figure 7 with corresponding covariance matrix $\Sigma=(I-\Lambda)^{-T}\Omega(I-\Lambda)^{-1}$. Then, by considerations similar to those in Example 3.4, one may show that

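$$\begin{pmatrix}|\Sigma_{\{3,5\},\{1,4\}}| & |\Sigma_{\{3,5\},\{1,5\}}|\\ |\Sigma_{\{2,4\},\{1,4\}}| & |\Sigma_{\{2,4\},\{1,5\}}|\end{pmatrix}\begin{pmatrix}\lambda_{46}\\ \lambda_{56}\end{pmatrix}=\begin{pmatrix}|\Sigma_{\{3,5\},\{1,6\}}|\\ |\Sigma_{\{2,4\},\{1,6\}}|\end{pmatrix}.$$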

Using computer algebra, we find that every entry of the 2×2 matrix on the left-hand side of the above equation is a nonzero polynomial (so this is not equivalent to simply applying Theorem 3.8 to $4\to6$ and $5\to6$ separately) and that the matrix has nonzero determinant. It follows that the above system is generically invertible, and thus $\lambda_{46}$ and $\lambda_{56}$ are generically identifiable.

Acknowledgements:

This material is based on work started in June 2016 at the Mathematics Research Communities (Week on Algebraic Statistics). The work was supported by the National Science Foundation under Grant Numbers DMS 1321794 and 1712535.

A Proof of Lemma 3.7

We will require a known generalization of the Gessel-Viennot-Lindström lemma which we now state.

Definition A.1.

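Let $G=(V,D)$ be a directed graph with vertices $V=\{1,\dots,n\}$ and corresponding matrix of indeterminates $\Lambda$. Let $\pi=v_1\to v_2\to\cdots\to v_\ell$ be a directed path in $G$ (vertices may repeat). Define the loop-erased path $\mathrm{LE}(\pi)$ corresponding to $\pi$ recursively as follows. If $\pi$ contains no loops, then $\mathrm{LE}(\pi)=\pi$. Otherwise, let $j$ be the smallest index for which there is an $i<j$ with $v_i=v_j$ (this $i$ is unique, since $v_1,\dots,v_{j-1}$ are distinct), and set $\mathrm{LE}(\pi)=\mathrm{LE}(\pi')$ where $\pi'=v_1\to\cdots\to v_i\to v_{j+1}\to\cdots\to v_\ell$. Erasing loops in this chronological order makes $\mathrm{LE}(\pi)$ well defined; erasing them in an arbitrary order can give different results (for example, the path $1\to2\to3\to1\to3\to4$ yields either $1\to3\to4$ or $1\to2\to3\to4$).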
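Chronological loop erasure can be computed in a single left-to-right scan. The Python sketch below (the function name is hypothetical) cuts out the loop back to the earlier occurrence whenever a vertex reappears, which is the chronological variant used for loop-erased walks in [26] as we understand it. For the walk 1, 2, 3, 1, 3, 4 it returns 1, 3, 4.

```python
# Illustrative sketch; the function name is hypothetical.


def loop_erase(path):
    """Chronological loop erasure of a vertex sequence v_1, ..., v_l."""
    erased = []
    position = {}                  # vertex -> its index in `erased`
    for v in path:
        if v in position:
            # v closes a loop: drop everything after its earlier occurrence.
            cut = position[v] + 1
            for u in erased[cut:]:
                del position[u]
            del erased[cut:]
        else:
            position[v] = len(erased)
            erased.append(v)
    return erased


print(loop_erase([1, 2, 3, 1, 3, 4]))
```

The dictionary of positions makes each membership test constant time, so the scan is linear in the length of the walk apart from the truncations.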

Lemma A.2

[Gessel-Viennot-Lindström Generalization, Theorem 6.1 from [26]] Let $G=(V,D)$ be a directed graph with vertices $V=\{1,\dots,n\}$ and corresponding matrix of indeterminates $\Lambda$. Define $\Psi=(I-\Lambda)^{-1}$ and, for any directed path $\pi$ in $G$, define the path polynomial $\pi(\Lambda)=\prod_{w\to v\in\pi}\lambda_{wv}$. Then for any $S=\{s_1,\dots,s_k\},T=\{t_1,\dots,t_k\}\subseteq V$ we have that

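$$|\Psi_{S,T}|=\sum_{\tau\in\mathcal{P}_k}\mathrm{sign}(\tau)\sum_{\substack{s_1\xrightarrow{\pi_1}t_{\tau(1)},\,\dots,\,s_k\xrightarrow{\pi_k}t_{\tau(k)}\\ \pi_j\cap\mathrm{LE}(\pi_i)=\emptyset\text{ for }i<j}}\pi_1(\Lambda)\cdots\pi_k(\Lambda);$$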

here $\mathcal{P}_k$ is the set of permutations of $\{1,\dots,k\}$, and the inner sum is over all directed path systems $\Pi=\{\pi_1,\dots,\pi_k\}$ with $\pi_i$ going from $s_i$ to $t_{\tau(i)}$ for all $i$, such that $\pi_j$ and $\mathrm{LE}(\pi_i)$ share no vertices for $i<j$. Hence $|\Psi_{S,T}|=0$ if and only if every system of directed paths from $S$ to $T$ has two paths which share a vertex.
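When $G$ is acyclic every directed path equals its loop erasure, and Lemma A.2 reduces to the classical Gessel-Viennot-Lindström lemma for vertex-disjoint path systems. The Python sketch below checks this special case by brute force on small hypothetical weighted DAGs: it compares the minor $|\Psi_{S,T}|$ of $\Psi=(I-\Lambda)^{-1}$, computed from the Neumann series, with the signed sum over vertex-disjoint path systems. It is exponential and only meant for tiny examples; all names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

# Illustrative sketch; function names and edge weights are hypothetical.


def permutation_sign(perm):
    """Sign of a permutation of 0..k-1, via its number of inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def leibniz_det(M):
    total = Fraction(0)
    for perm in permutations(range(len(M))):
        term = Fraction(permutation_sign(perm))
        for i, j in enumerate(perm):
            term *= M[i][j]
        total += term
    return total


def minor_of_inverse(n, lam, S, T):
    """|Psi_{S,T}| for Psi = (I - Lam)^{-1}; lam maps edges (a, b) to weights (acyclic)."""
    L = [[Fraction(lam.get((a, b), 0)) for b in range(n)] for a in range(n)]
    psi = [[Fraction(int(a == b)) for b in range(n)] for a in range(n)]
    power = [row[:] for row in psi]
    for _ in range(n - 1):
        power = [[sum(power[a][m] * L[m][b] for m in range(n)) for b in range(n)]
                 for a in range(n)]
        psi = [[psi[a][b] + power[a][b] for b in range(n)] for a in range(n)]
    return leibniz_det([[psi[s][t] for t in T] for s in S])


def paths(lam, s, t):
    """All directed paths from s to t in an acyclic graph, as vertex lists."""
    if s == t:
        return [[t]]
    return [[s] + rest for (a, b) in lam if a == s for rest in paths(lam, b, t)]


def disjoint_path_sum(lam, S, T):
    """Signed sum over vertex-disjoint path systems from S to T (acyclic case)."""
    total = Fraction(0)
    for perm in permutations(range(len(T))):
        options = [paths(lam, S[i], T[perm[i]]) for i in range(len(S))]
        for system in product(*options):
            used = [v for path in system for v in path]
            if len(used) != len(set(used)):
                continue  # two paths share a vertex
            weight = Fraction(permutation_sign(perm))
            for path in system:
                for a, b in zip(path, path[1:]):
                    weight *= lam[(a, b)]
            total += weight
    return total


lam = {(0, 2): 2, (0, 3): 3, (1, 3): 5, (2, 4): 7, (3, 4): 11, (3, 5): 13, (1, 5): 17}
print(minor_of_inverse(6, lam, (0, 1), (4, 5)) == disjoint_path_sum(lam, (0, 1), (4, 5)))
```

If every path from $\{0,1\}$ to $\{4,5\}$ passes through a single vertex, both sides vanish, matching the final statement of Lemma A.2.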

The remaining proof of Lemma 3.7 proceeds in several parts and closely follows similar results in [14] and [19]. As such we will state several lemmas whose proofs require only small modifications of existing results (such as replacing the standard Gessel-Viennot-Lindström Lemma with its generalization above). In such cases we will simply direct the reader to the corresponding proof and sketch the necessary modifications.

Definition A.3.

Let $G=(V,D,B)$ be a mixed graph and let $U\subseteq D$. We say a trek $\pi$ in $G$ avoids $U$ on the left (right) if the left (right) side of $\pi$ uses no edges from $U$. Similarly, we say a system of treks $\Pi$ in $G$ avoids $U$ on the left (right) if every trek $\pi\in\Pi$ avoids $U$ on the left (right). If $U_L,U_R\subseteq D$, we say that a trek (or trek system) avoids $(U_L,U_R)$ if it avoids $U_L$ on the left and $U_R$ on the right.

Lemma A.4

Let $G=(V,D,B)$ be a mixed graph and let $\Lambda,\Omega$ be $n\times n$ matrices of indeterminates corresponding to the directed and bidirected parts of $G$, respectively. Suppose that $B=\emptyset$, so that $\Omega$ is diagonal. Letting $D_L,D_R,\Lambda_L,\Lambda_R,\Gamma$, and $G_{\mathrm{flow}}$ be as in Lemma 3.7, we have that for any $S,T\subseteq V$ with $|S|=|T|=k$, $|\Gamma_{S,T}|=0$ if and only if the max-flow from $S$ to $T$ in $G_{\mathrm{flow}}$ is less than $k$.

proof

In the following, whenever we say “As in x,” we mean “As in the proof of x in [14].”

As in Lemma 3.2, we have $|\Gamma_{S,T}|=0$ if and only if for every set $A\subseteq V$ with $|A|=k$ we have $|((I-\Lambda_L)^{-T})_{S,A}|=0$ or $|((I-\Lambda_R)^{-1})_{A,T}|=0$. As in Prop. 3.5, using the above result and applying our version of the Gessel-Viennot-Lindström Lemma, we have that $|\Gamma_{S,T}|=0$ if and only if every system of (simple) treks avoiding $(D\setminus D_L,D\setminus D_R)$ has sided intersection.

Now, noticing that $B=\emptyset$ simplifies the definition of $G_{\mathrm{flow}}$, we have as in Prop. 3.5 that the (simple) treks from $u$ to $v$ avoiding $(D\setminus D_L,D\setminus D_R)$ in $G$ are in bijective correspondence with directed paths from $u$ to $v$ in $G_{\mathrm{flow}}$. Consequently, max-flow systems from $S$ to $T$ in $G_{\mathrm{flow}}$ of size $k$ correspond to systems of treks from $S$ to $T$ avoiding $(D\setminus D_L,D\setminus D_R)$ with no sided intersection (that is, if one exists, so does the other). Combining the above equivalences proves the result.

We have now proven our desired result in the case $B=\emptyset$; it remains to show that this implies the case $B\neq\emptyset$. To this end, we say that $\tilde G=(\tilde V,\tilde D,\tilde B)$ is the bidirected subdivision of $G=(V,D,B)$ if it equals $G$ except that every bidirected edge $i\leftrightarrow j\in G$ is replaced by a vertex $v_{(i,j)}$ and two edges $v_{(i,j)}\to i$ and $v_{(i,j)}\to j$ (with associated parameters $\tilde\omega_{(i,j),(i,j)}$, $\tilde\lambda_{(i,j)i}$, $\tilde\lambda_{(i,j)j}$). Note that we have subdivided every bidirected edge into two directed edges, which motivates the name. Let $\tilde D_L$ and $\tilde D_R$ be equal to $D_L$ and $D_R$, respectively, but with the new edges $v_{(i,j)}\to i$ and $v_{(i,j)}\to j$ added for every $i\leftrightarrow j\in G$. Let $\tilde\Lambda,\tilde\Omega$ be matrices of indeterminates corresponding to $\tilde G$ and let $\tilde\Lambda_L,\tilde\Lambda_R$ correspond to $\tilde D_L,\tilde D_R$ just as for $G$. We now have the following result relating $G$ and $\tilde G$.

Lemma A.5

Let $G$, $\tilde G$ be as in the prior paragraph and let $\tilde\Gamma=(I-\tilde\Lambda_L)^{-T}\tilde\Omega(I-\tilde\Lambda_R)^{-1}$. Then, for any polynomial $f$ taking as input an $n\times n$ matrix of variables, $f(\Gamma)=0$ if and only if $f(\tilde\Gamma)=0$. In particular, since a subdeterminant of a matrix is a polynomial in its entries, for any $S,T\subseteq V$ with $|S|=|T|=k$ we have $|\Gamma_{S,T}|=0$ if and only if $|\tilde\Gamma_{S,T}|=0$.

proof

The proof follows essentially exactly the first part of the proof of Prop. 2.5 in [19].

Now we show that the above subdivision trick produces a graph G˜flow for which the max-flow between vertex sets is the same as for Gflow.

Lemma A.6

Consider the graph $G_{\mathrm{flow}}=(V,D)$ from the statement of Lemma 3.7 and let $\tilde G_{\mathrm{flow}}=(\tilde V,\tilde D)$ be the corresponding flow graph for the bidirected subdivision $\tilde G$ of $G$. Let $S=\{s_1,\dots,s_k\},T=\{t_1,\dots,t_k\}\subseteq V$. Then the maximum flow from $S$ to $T$ in $G_{\mathrm{flow}}$ equals the maximum flow from $S$ to $T$ in $\tilde G_{\mathrm{flow}}$.

proof

Recall that a flow system on a graph is an assignment of flow to the edges and vertices of the graph satisfying the usual flow constraints. Also recall that, for graphs with integral capacities, there always exists a max-flow system between subsets of nodes for which all flow assignments on edges and vertices take values in $\mathbb{N}$. We will show that any (integral valued) max-flow system from $S$ to $T$ in $\tilde G_{\mathrm{flow}}$ corresponds to a unique flow system in $G_{\mathrm{flow}}$ with the same total flow and vice versa. Our result then follows.

Let $\tilde F$ be a max-flow system from $S$ to $T$ on $\tilde G_{\mathrm{flow}}$ with integral flow assignments. Since $\tilde G_{\mathrm{flow}}$ and $G_{\mathrm{flow}}$ have all capacities equal to 1, it follows that $\tilde F$ assigns either 0 or 1 unit of flow to every edge and vertex in the graph.

We now construct a flow system $F$ on $G_{\mathrm{flow}}$ with the same total flow. First, let $F$ assign the same flow as $\tilde F$ to all edges and vertices common to $G_{\mathrm{flow}}$ and $\tilde G_{\mathrm{flow}}$. Note that if $\tilde F$ does not assign any flow to any of the edges incoming to the vertices $v_{(i,j)}$, then $F$ already is a flow system on $G_{\mathrm{flow}}$ with the same total flow. Suppose otherwise that $\tilde F$ assigns 1 unit of flow to the edges $\{a_1\to v_{a_1b_1},\dots,a_k\to v_{a_kb_k}\}$. Since the vertices $v_{(i,j)}$ and the $a_i$ have capacity 1, it follows that $a_i\neq a_j$ and $v_{a_ib_i}\neq v_{a_jb_j}$ for all $i\neq j$. For each edge $a_i\to v_{a_ib_i}$, since $v_{a_ib_i}$ has two outgoing edges $v_{a_ib_i}\to a_i$ and $v_{a_ib_i}\to b_i$, there are two possible cases:

  1. Case 1: $\tilde F$ assigns 1 unit of flow to $v_{a_ib_i}\to a_i$. In this case assign a flow of 1 to the edge $a_i\to a_i$ in $F$.
  2. Case 2: $\tilde F$ assigns 1 unit of flow to $v_{a_ib_i}\to b_i$. In this case assign a flow of 1 to the edge $a_i\to b_i$ in $F$.

It is easy to check that F is indeed a valid flow system on Gflow with the same flow as F˜.

To see the opposite direction, let $F$ be a max-flow system from $S$ to $T$ on $G_{\mathrm{flow}}$ with integral flow assignments. We now construct a flow system $\tilde F$ on $\tilde G_{\mathrm{flow}}$ with the same total flow. As before, first let $\tilde F$ assign the same flow as $F$ to all edges and vertices common to $\tilde G_{\mathrm{flow}}$ and $G_{\mathrm{flow}}$. Note that if $F$ does not assign any flow to any of the edges $a\to b$ for $(a,b)\in B$, then $\tilde F$ already is a flow system on $\tilde G_{\mathrm{flow}}$ with the same total flow. Suppose otherwise that $F$ assigns 1 unit of flow to the edges $E=\{a_1\to b_1,\dots,a_k\to b_k\}$ with $(a_i,b_i)\in B$ for all $i$. Since all vertices in $F$ have capacity 1, we must have $a_i\neq a_j$ and $b_i\neq b_j$ for all $i\neq j$. There are two possible cases:

  1. Case 1: $a_i\to b_i\in E$ and $b_i\to a_i\notin E$. In this case assign a flow of 1 along the path $a_i\to v_{a_ib_i}\to b_i$ in $\tilde F$.
  2. Case 2: $a_i\to b_i\in E$ and $b_i\to a_i\in E$. In this case assign a flow of 1 to the edges $a_i\to a_i$ and $b_i\to b_i$ in $\tilde F$.

One may now check that F˜ is a valid flow system on G˜flow with the same flow as F.

Finally, we are in a position to prove Lemma 3.7. By Lemma A.5, $|\Gamma_{S,T}|=0$ if and only if $|\tilde\Gamma_{S,T}|=0$. Since $\tilde G$ has no bidirected edges, Lemma A.4 gives that $|\tilde\Gamma_{S,T}|=0$ if and only if the max-flow from $S$ to $T$ in $\tilde G_{\mathrm{flow}}$ is less than $|S|=k$. Finally, Lemma A.6 gives that the max-flow from $S$ to $T$ in $\tilde G_{\mathrm{flow}}$ equals the max-flow from $S$ to $T$ in $G_{\mathrm{flow}}$. Hence $|\Gamma_{S,T}|=0$ if and only if the max-flow from $S$ to $T$ in $G_{\mathrm{flow}}$ is less than $k$, which was our desired statement.

B Proof of Lemma 4.3

The proof of this lemma follows almost exactly the proof of Lemma 2 in [10]; we simply restate the arguments there in our setting. For any $v,w\in V$ let $\mathcal{H}(v,w)$ be the set of half-treks from $v$ to $w$ in $G$. Also let $\mathcal{T}_{ij}$ be the set of all treks from $s_i$ to $t_j$ in $G$ which do not begin with an edge of the form $s_i\leftarrow h$ for any $h\in H_i$. Then $\mathcal{H}(s_i,t_j)\subseteq\mathcal{T}_{ij}$, since the left side of a half-trek from $s_i$ consists of $s_i$ alone. Now, by the Trek Rule (Proposition 2.5), we have that

$$A_{ij}=\sum_{\pi\in\mathcal{T}_{ij}}\pi(\Lambda,\Omega).$$

Now for any system of treks Π define the monomial

$$\Pi(\Lambda,\Omega)=\prod_{\pi\in\Pi}\pi(\Lambda,\Omega).$$

Then, by Leibniz’s formula for the determinant, we have that

$$|A|=\sum_{\Pi}(-1)^{\mathrm{sign}(\Pi)}\,\Pi(\Lambda,\Omega)$$

where the above sum is over all trek systems $\Pi$ from $S$ to $T$ using treks only in the set $\bigcup_{1\le i,j\le k}\mathcal{T}_{ij}$; here $\mathrm{sign}(\Pi)$ is the parity (0 for even, 1 for odd) of the permutation that writes $t_1,\dots,t_k$ in the order of their appearance as targets of the treks in $\Pi$.

By assumption, there exists a half-trek system from $S$ to $T$ with no sided intersection; let $\Pi$ be such a system of minimum total length. Since $\mathcal{H}(s_i,t_j)\subseteq\mathcal{T}_{ij}$ for all $i,j$, $\Pi$ is one of the trek systems in the above summation for $|A|$. Let $\Psi$ be any system of treks from $S$ to $T$ such that $\Psi(\Lambda,\Omega)=\Pi(\Lambda,\Omega)$. Lemma 1 of [10] shows that we must have $\Psi=\Pi$, so $\Pi$ is the unique system of treks from $S$ to $T$ with trek monomial $\Pi(\Lambda,\Omega)$. It follows that the coefficient of the monomial $\Pi(\Lambda,\Omega)$ in $|A|$ is $(-1)^{\mathrm{sign}(\Pi)}$, and thus $|A|$ is not the zero polynomial (or power series, if the sum is infinite). Hence, for generic choices of $(\Lambda,\Omega)$, we have $|A|\neq0$, so that $A$ is generically invertible.

References

  • 1

    Bollen KA. Structural equations with latent variables. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons, 1989.

  • 2

    Pearl J. Causality: models, reasoning, and inference, 2nd edn. Cambridge: Cambridge University Press, 2009.

  • 3

    Spirtes P, Glymour C, Scheines R. Causation, prediction, and search, 2nd edn. Cambridge, MA: MIT Press, 2000.

  • 4

    Wright S. Correlation and causation. J. Agricultural Res. 1921;20:557–585.

  • 5

    Wright S. The method of path coefficients. Ann. Math. Statist. 1934;5:161–215.

  • 6

    Didelez V, Meng S, Sheehan NA. Assumptions of IV methods for observational epidemiology. Statist. Sci. 2010;25:22–40.

  • 7

    Drton M, Foygel R, Sullivant S. Global identifiability of linear structural equation models. Ann. Statist. 2011;39:865–886.

  • 8

    Shpitser I, Pearl J. Identification of joint interventional distributions in recursive semi-Markovian causal models. In: Proceedings of the 21st National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2006:1219–1226.

  • 9

    Tian J, Pearl J. A general identification condition for causal effects. In: Proceedings of the 18th national conference on artificial intelligence. Menlo Park, CA: AAAI Press, 2002:567–573.

  • 10

    Foygel R, Draisma J, Drton M. Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 2012a;40:1682–1713.

  • 11

    Chen B. Decomposition and identification of linear structural equation models. ArXiv e-prints, 1508.01834, 2015.

  • 12

    Chen B, Tian J, Pearl J. Testable implications of linear structural equation models. In: Brodley CE, Stone P, editors. Proceedings of the twenty-eighth AAAI conference on artificial intelligence. AAAI Press, 2014:2424–2430.

  • 13

    Drton M, Weihs L. Generic identifiability of linear structural equation models by ancestor decomposition. Scandinavian J Stat. 2016;43:1035–1045.

  • 14

    Sullivant S, Talaska K, Draisma J. Trek separation for Gaussian graphical models. Ann. Statist. 2010;38:1665–1685.

  • 15

    Brito C, Pearl J. Generalized instrumental variables. In: Proceedings of the eighteenth annual conference on uncertainty in artificial intelligence (UAI-02). San Francisco, CA: Morgan Kaufmann, 2002:85–93.

  • 16

    Chen B. Identification and overidentification of linear structural equation models. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, editors. Advances in neural information processing systems 29. Curran Associates, Inc., 2016:1579–1587.

  • 17

    Drton M. Algebraic problems in structural equation modeling. ArXiv:1612.05994, 2016.

  • 18

    Okamoto M. Distinctness of the eigenvalues of a quadratic form in a multivariate sample. Ann. Statist. 1973;1:763–765.

  • 19

    Draisma J, Sullivant S, Talaska K. Positivity for Gaussian graphical models. Adv. in Appl. Math. 2013;50:661–674.

  • 20

    Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to algorithms, 3rd edn. Cambridge, MA: MIT Press, 2009.

  • 21

    van der Zander B, Textor J, Liśkiewicz M. Efficiently finding conditional instruments for causal inference. In: Proceedings of the 24th international joint conference on artificial intelligence (IJCAI 2015). AAAI Press, 2015:3243–3249.

  • 22

    Foygel R, Drton M. SEMID: Identifiability of linear structural equation models, R package version 0.1, 2013.

  • 23

    R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2014.

  • 24

    Foygel R, Draisma J, Drton M. Supplement to half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 2012b;40.

  • 25

    van der Zander B, Liśkiewicz M. On searching for generalized instrumental variables. In: Proceedings of the 19th international conference on artificial intelligence and statistics (AISTATS 2016), JMLR Proceedings, 2016:1214–1222.

  • 26

    Fomin S. Loop-erased walks and total positivity. Trans. Amer. Math. Soc. 2001;353:3563–3583.

Footnotes

1

See https://github.com/Lucaweihs/SEMID.
