In a linear structural equation model (L-SEM) the joint distribution of a random vector
Following an approach that dates back to Wright [4, 5], we may view
In this model, the random vector
A first question that arises when specifying an L-SEM via a mixed graph
We will instead be interested in generic identifiability, that is, whether
We extend the applicability of the HTC in two ways. First, we show how the theorems on trek separation in  can be used to discover determinantal relations that, in turn, can be used to prove the generic identifiability of individual edge coefficients in L-SEMs. This method generalizes the use of conditional independence in known instrumental variable techniques; compare, e.g., . Once individual edges have been shown to be generically identifiable with this new method, it would be ideal to integrate them into the equation systems discovered by the HTC in order to prove that even more edges are generically identifiable. Unfortunately, the HTC is not well suited to integrating single-edge identifications, as it operates simultaneously on all edges incoming to a given node. Our second contribution resolves this issue by providing an edgewise half-trek criterion, which operates on subsets of a node’s parents rather than on all parents at once. This edgewise criterion often identifies many more coefficients than the usual HTC. We note that, while preparing this manuscript, we discovered independent work of Chen ; some of our results can be seen as generalizations of results in his work.
The rest of this paper is organized as follows. In Section 2, we give a brief overview of the necessary background on mixed graphs, L-SEMs, and the half-trek criterion. In Section 3, we show how trek-separation allows the generic identification of edge coefficients as quotients of subdeterminants. We introduce the edgewise half-trek criterion in Section 4 and we discuss necessary conditions for the generic identifiability of edge coefficients in Section 5. Computational experiments showing the applicability of our sufficient conditions follow in Section 6, and we finish with a brief conclusion in Section 7. Some longer proofs are deferred to the appendix.
We assume some familiarity with the graphical representation of structural equation models and only give a brief overview of our objects of study. A more in-depth introduction can be found, for example, in  or, with a focus on the linear case considered here, in .
2.1 Mixed graphs and covariance matrices
Nonzero covariances in an L-SEM may arise through direct or through confounding effects. Mixed graphs with two types of edges have been used to represent these two sources of dependences.
A mixed graph on
(a) A path
(b) A trek
In particular, a half-trek from
Some terminology is needed to reference the local neighborhood structure of a vertex
respectively. The nodes incident to a bidirected edge can be thought of as having a common (latent) parent and thus we refer to the bidirected neighbors as siblings and define
Finally, we denote the sets of nodes that are trek reachable or half-trek reachable from
Two sets of matrices may be associated with a given mixed graph
Our focus is solely on covariance matrices. Indeed, in the traditional case where the errors
Subsequently, the matrices
[Trek Rule] The covariance matrix
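As a sanity check of the trek rule, the sketch below computes the covariance matrix from edge coefficients and error covariances for a three-node graph and compares a few entries with trek sums enumerated by hand. It assumes the common parametrization X = Λ^T X + ε, so that Σ = (I − Λ)^{−T} Ω (I − Λ)^{−1}, with Λ_{ij} the coefficient of the edge i → j and Ω the error covariance matrix; the graph 1 → 2 → 3 with 2 ↔ 3, the numeric values, and the helper names are illustrative choices, not taken from the paper. Because the graph is acyclic, Λ is nilpotent and (I − Λ)^{−1} is a finite geometric series; exact rational arithmetic avoids floating-point noise in the comparison.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def inverse_i_minus(lam):
    """(I - Lam)^{-1} = I + Lam + ... + Lam^(n-1), valid when Lam is nilpotent (acyclic graph)."""
    n = len(lam)
    total = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]
    for _ in range(n - 1):
        power = matmul(power, lam)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


def covariance(lam, omega):
    """Sigma = (I - Lam)^{-T} Omega (I - Lam)^{-1}."""
    a = inverse_i_minus(lam)
    return matmul(matmul(transpose(a), omega), a)


# Graph 1 -> 2 -> 3 with bidirected edge 2 <-> 3 (nodes 1, 2, 3 are indices 0, 1, 2).
l12, l23 = Fraction(2), Fraction(3)
w11, w22, w33, w23 = Fraction(1), Fraction(2), Fraction(5), Fraction(1)
lam = [[0, l12, 0], [0, 0, l23], [0, 0, 0]]
omega = [[w11, 0, 0], [0, w22, w23], [0, w23, w33]]
sigma = covariance(lam, omega)

# Trek sums enumerated by hand:
#   sigma_12: top 1, right side 1 -> 2                   -> w11*l12
#   sigma_13: top 1, right side 1 -> 2 -> 3              -> w11*l12*l23
#   sigma_23: top 2 (right side 2 -> 3)                  -> w22*l23
#             top 1 (left 1 -> 2, right 1 -> 2 -> 3)     -> w11*l12*l12*l23
#             bidirected top 2 <-> 3                     -> w23
assert sigma[0][1] == w11 * l12
assert sigma[0][2] == w11 * l12 * l23
assert sigma[1][2] == w22 * l23 + w11 * l12 ** 2 * l23 + w23
```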
2.2 Generic identifiability
We now formally introduce our problem of interest and review some of the prior work our results build on. We recall that an algebraic set is the zero-set of a collection of polynomials. An algebraic set that is a proper subset of Euclidean space has measure zero; see, e.g., the lemma in .
(a) The model given by a mixed graph
In all examples we know of, if generic identifiability holds, then the parameters can in fact be recovered using rational formulas.
(a) A mixed graph
(b) An edge
As our focus will be on the identification of individual edges in
A set of nodes
, , and
- there is a system of half-treks with no sided intersection from
[HTC-identifiability] Suppose that in the mixed graph
The sufficient condition for rational identifiability of
3 Trek separation and identification by ratios of determinants
A pair of sets
In this definition, the symbols
Theorem 2.7 from  established this result for acyclic mixed graphs while  extended the result to all mixed graphs and even gave an explicit representation of the rational form of the subdeterminant
Note that the maximum flow between vertex sets in a graph can be computed in polynomial time. Indeed, in our case, the conditions of Corollary 3.3 can be checked in
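To make such max-flow checks concrete, here is a small sketch. It assumes the standard network behind trek separation: every node i gets a left copy L(i) and a right copy R(i); left copies are joined against the direction of each directed edge (a trek's left side is traversed from its endpoint up to its top), right copies along it, L(i) → R(i) marks i as a trek top, and L(i) → R(j), L(j) → R(i) encode a bidirected edge i ↔ j. Unit vertex capacities, enforced by splitting each copy into an "in" and an "out" half, rule out sided intersections, so the max-flow value is the largest size of a trek system from A to B with no sided intersection, which by the trek-separation results cited above is the generic rank of Σ_{A,B}. The function names and graph encoding are hypothetical, Edmonds-Karp (polynomial, O(VE²)) is just one choice of max-flow algorithm, and this is not the SEMID implementation.

```python
from collections import deque


def add_edge(cap, u, v, c=1):
    cap.setdefault(u, {})
    cap[u][v] = cap[u].get(v, 0) + c


def max_flow(cap, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest paths in the residual graph."""
    res = {}
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res.setdefault(u, {}).setdefault(v, 0)
            res[u][v] += c
            res.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push


def trek_system_size(nodes, directed, bidirected, A, B):
    """Max size of a trek system from A to B with no sided intersection."""
    cap = {}

    def link(x, y):  # out-half of copy x -> in-half of copy y
        add_edge(cap, ("out", x), ("in", y))

    for i in nodes:
        for side in ("L", "R"):
            add_edge(cap, ("in", (side, i)), ("out", (side, i)))  # vertex capacity 1
        link(("L", i), ("R", i))  # i is the top of the trek
    for i, j in directed:
        link(("L", j), ("L", i))  # left side walks against the edge direction
        link(("R", i), ("R", j))  # right side walks along it
    for i, j in bidirected:
        link(("L", i), ("R", j))
        link(("L", j), ("R", i))
    for a in A:
        add_edge(cap, "source", ("in", ("L", a)))
    for b in B:
        add_edge(cap, ("out", ("R", b)), "sink")
    return max_flow(cap, "source", "sink")


# Instrumental-variable graph 1 -> 2 -> 3 with 2 <-> 3.
iv = dict(nodes=[1, 2, 3], directed=[(1, 2), (2, 3)], bidirected=[(2, 3)])
print(trek_system_size(A=[1], B=[3], **iv))        # 1
print(trek_system_size(A=[1, 2], B=[2, 3], **iv))  # 2
# Every trek from {1, 2} to {3, 4} passes through 3 on its right side.
chain = dict(nodes=[1, 2, 3, 4], directed=[(1, 3), (2, 3), (3, 4)], bidirected=[])
print(trek_system_size(A=[1, 2], B=[3, 4], **chain))  # 1
```

In the last example node 3 t-separates {1, 2} from {3, 4}, so the 2 × 2 subdeterminant vanishes identically and the flow value, 1, matches the generic rank.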
Consider the mixed graph
By the multilinearity of the determinant, we deduce that
Applying Corollary 3.3 a final time, we recognize that
generically and rationally identifies
In the above example, there is a correspondence between trek systems in
- the max-flow from
to in equals , and
- the max-flow from
to in is smaller than .
As the above argument holds for all
Using assumption (c) and applying Corollary 3.3, we have
Theorem 3.5 generalizes the ideas underlying instrumental variable methods such as those discussed in . Indeed, this prior work uses d-separation as opposed to t-separation. D-separation characterizes conditional independence, which in the present context corresponds to the vanishing of particular almost-principal determinants of the covariance matrix. In contrast, Theorem 3.5 allows us to leverage arbitrary determinantal relations; compare . The graph in Figure 2a is an example in which d-separation and traditional instrumental variable techniques cannot explain the rational identifiability of the coefficient for edge
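For intuition, here is the classical instrumental-variable calculation written as a ratio of (1 × 1) subdeterminants. It is worked out via the trek rule for the illustrative graph 1 → 2 → 3 with 2 ↔ 3, assuming the notation λ_{ij} for edge coefficients, ω_{ij} for error (co)variances and σ_{ij} for covariances. The only trek from 1 to 2 is the edge 1 → 2, and the only trek from 1 to 3 has top 1 and right side 1 → 2 → 3, since no directed path leads from 2 or 3 back to 1.

```latex
\sigma_{12} = \omega_{11}\lambda_{12}, \qquad
\sigma_{13} = \omega_{11}\lambda_{12}\lambda_{23}
\quad\Longrightarrow\quad
\lambda_{23} = \frac{\sigma_{13}}{\sigma_{12}}
  = \frac{\det \Sigma_{\{1\},\{3\}}}{\det \Sigma_{\{1\},\{2\}}},
\qquad \text{whenever } \omega_{11}\lambda_{12} \neq 0 .
```

Here the ratio is the familiar instrumental-variable estimand; the determinantal approach of this section lets larger subdeterminants play the same role.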
While assumption (a) in the above theorem allows for the easy application of Corollary 3.3, this assumption can be relaxed by generalizing one direction of Corollary 3.3. We state this generalization as the following lemma, which concerns the asymmetric treatment of edges that appear on the left- versus right-hand sides of treks. The lemma’s proof is deferred to Appendix A.
Define a network
with all edges and vertices having capacity 1. Let
We may now state our more general result.
- the max-flow from
to in equals , and
- the max-flow from
to in is .
By assumption (b) and Corollary 3.3,
To show this we note that, by the multilinearity of the determinant, we have
Consider any two indices
Clearly, Theorem 3.8 can be applied whenever Theorem 3.5 can. Moreover, as the next example shows, there are cases in which Theorem 3.8 can be used while Theorem 3.5 cannot.
For a fixed choice of
In order to apply Theorem 3.8 algorithmically, however, we have to consider all possible subsets
4 Edgewise generic identifiability
While our results from Section 3 can be used together with the HTC, there is a notable lack of synergy between the two methods, as the HTC requires that all directed edges incoming to a node be generically identifiable before that node can be used to prove the generic identifiability of other edges. Aiming to strengthen the HTC while allowing it to make better use of identifications produced by Theorem 3.8, the following theorem establishes a sufficient condition for the generic identifiability of any set of edges incoming to a fixed node. While preparing this manuscript, we discovered the work of Chen ; our following theorem can be seen as a generalization of his Theorem 1. See Remark 4.2 for a discussion of the primary difference between our theorem and that in .
- [(i)] there exists a half-trek system from
to with no sided intersection,
- [(ii)] for every trek
from to we have that either ends with an edge of the form where either or is known to be generically (rationally) identifiable, or begins with an edge of the form where is known to be generically (rationally) identifiable.
Then for each
Our approach is to build a linear system of
equals the sum of the monomials for all treks from
Now the sum over all treks between
Rewriting this we have
Notice that, in the above equation, if
Repeating the above argument for each of the
The invertibility of
Our Theorem 4.1 generalizes Theorem 1 in  in two ways. First, we make the trivial, but for our purposes important, modification of formulating our theorem in a fashion that is agnostic as to how prior generic identifications were obtained. For the presentation in , it was more natural to focus only on identifications obtained from prior applications of his theorem. Second, and more substantially, the results in  do not consider the possibility that, recalling the setting of Theorem 4.1, some of the edges incoming to
As an example of how the above difference can appear in practice consider Figure 4 and suppose we have restricted the size of edge sets
The following lemma generalizes Lemma 2 from  and completes the proof of Theorem 4.1.
is generically invertible.
The proof of this lemma is deferred to Appendix B. Note that if let
The conditions of Theorem 4.1 can be easily checked in polynomial time using max-flow computations, just as with the standard half-trek criterion. Unfortunately, in general, we do not know for which subset
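To illustrate the max-flow check, here is a sketch of the test for a half-trek system with no sided intersection. It assumes the usual definition: a half-trek from y either has y as its top and continues along directed edges, or starts with a bidirected edge y ↔ k followed by directed edges, so its left side is just {y}. Accordingly, each source node y gets a left copy L(y) with edges L(y) → R(y) and L(y) → R(k) for y ↔ k, right copies are joined along directed edges, and every copy has capacity 1 via node splitting; a system from Y onto the target set exists exactly when the max flow equals |Y|. This is our own illustration with hypothetical function names (Edmonds-Karp again), not the SEMID code; choosing Y and the target set is the part handled by Theorem 4.1 and Algorithm 1.

```python
from collections import deque


def add_edge(cap, u, v, c=1):
    cap.setdefault(u, {})
    cap[u][v] = cap[u].get(v, 0) + c


def max_flow(cap, source, sink):
    """Edmonds-Karp on a capacity dict cap[u][v]."""
    res = {}
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res.setdefault(u, {}).setdefault(v, 0)
            res[u][v] += c
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push


def half_trek_system_exists(nodes, directed, bidirected, Y, targets):
    """True if there is a half-trek system from Y onto targets with no sided intersection."""
    if len(Y) != len(targets):
        return False
    cap = {}

    def link(x, y):  # out-half of copy x -> in-half of copy y
        add_edge(cap, ("out", x), ("in", y))

    for i in nodes:
        for side in ("L", "R"):
            add_edge(cap, ("in", (side, i)), ("out", (side, i)))  # vertex capacity 1
    for y in Y:
        add_edge(cap, "source", ("in", ("L", y)))
        link(("L", y), ("R", y))  # y is the top of a directed half-trek
    for i, j in bidirected:
        if i in Y:
            link(("L", i), ("R", j))  # half-trek starting with i <-> j
        if j in Y:
            link(("L", j), ("R", i))
    for i, j in directed:
        link(("R", i), ("R", j))
    for p in targets:
        add_edge(cap, ("out", ("R", p)), "sink")
    return max_flow(cap, "source", "sink") == len(Y)


# Node 1 reaches parent 2 of node 3 in the graph 1 -> 2 -> 3, 2 <-> 3.
print(half_trek_system_exists([1, 2, 3], [(1, 2), (2, 3)], [(2, 3)], [1], [2]))  # True
# Both half-treks from {1, 2} to {3, 4} would need node 3 on their right sides.
print(half_trek_system_exists([1, 2, 3, 4], [(1, 3), (2, 3), (3, 4)], [], [1, 2], [3, 4]))  # False
```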
|Algorithm 1 Edgewise identification algorithm.|
|1:||Input: A mixed graph |
|11:||Break out of the current loop|
|15:||until No additional edges have been added to |
5 Edgewise generic nonidentifiability
In prior sections we have focused solely on sufficient conditions for demonstrating the generic identifiability of edges in a mixed graph. This naturally raises the question of whether there are any complementary necessary conditions, that is, conditions which, when violated, show that a given edge is generically many-to-one. To our knowledge, the following is the only known necessary condition for generic identifiability and considers all parameters of a mixed graph
(Theorem 2 of ) Suppose
This theorem operates by showing that, given its conditions, the Jacobian of the map
Next suppose, without loss of generality, that
We stress, however, that Theorem 5.2 does not imply Theorem 5.1; that is, there are graphs
6 Computational experiments
In this section we present computational experiments that demonstrate the usefulness of our theorems in extending the applicability of the half-trek criterion. All experiments are carried out in the R programming language [23], and the algorithms below are implemented in our R package SEMID [22], which is available on CRAN, the Comprehensive R Archive Network, as well as on GitHub.¹ We consider four identification algorithms for checking generic identifiability:
- The standard half-trek criterion (HTC) algorithm.
- The edgewise identification (EID) algorithm, displayed in Algorithm 1, where the input set of
- The trek-separation identification (TSID) algorithm. As with Algorithm 1, this algorithm iteratively applies Theorem 3.8 until it fails to identify any additional edges. (Since we are considering a small number of nodes, there is no need to limit the size of sets
and we are searching for in our computation.)
- The EID
TSID algorithm. This algorithm alternates between the EID and TSID algorithms until it fails to identify any additional edges.
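The "repeat until nothing new is identified" pattern shared by Algorithm 1, TSID, and their alternation is a monotone fixed-point iteration. A minimal sketch, assuming each identifier is a function that receives the set of edges identified so far and returns edges it can identify; `toy_eid` and `toy_tsid` are hypothetical stand-ins, not SEMID's API. Since the edge set is finite and only grows, the loop terminates.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[int, int]  # directed edge (parent, child)
Identifier = Callable[[FrozenSet[Edge]], Set[Edge]]


def alternate_until_stable(identifiers: Iterable[Identifier],
                           known: Iterable[Edge] = ()) -> Set[Edge]:
    """Run each identifier in turn, feeding it all edges identified so far,
    until a full pass adds no new edge."""
    steps = list(identifiers)
    identified: Set[Edge] = set(known)
    while True:
        size_before = len(identified)
        for step in steps:
            identified |= step(frozenset(identified))
        if len(identified) == size_before:
            return identified


# Hypothetical stand-ins: each makes progress only with the other's output.
def toy_eid(known: FrozenSet[Edge]) -> Set[Edge]:
    found = {(1, 2)}
    if (2, 3) in known:
        found.add((3, 4))
    return found


def toy_tsid(known: FrozenSet[Edge]) -> Set[Edge]:
    return {(2, 3)} if (1, 2) in known else set()


print(sorted(alternate_until_stable([toy_eid])))            # [(1, 2)]
print(sorted(alternate_until_stable([toy_eid, toy_tsid])))  # [(1, 2), (2, 3), (3, 4)]
```

Run alone, the first stand-in stalls after one edge; alternating lets each method reuse the other's identifications.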
We emphasize that when all of the directed edges, i.e., the matrix
In Table 1 from , the authors list all 112 acyclic non-isomorphic mixed graphs on 5 nodes which are generically identifiable but for which the half-trek criterion remains inconclusive even when using decomposition techniques. We run the EID, TSID, and EID
We observe a similar trend to the above when allowing cyclic mixed graphs. In Table 2 of , the authors list 75 randomly chosen, cyclic (i.e., containing a loop in the directed part), mixed graphs that are known to be rationally identifiable but cannot be certified so by the half-trek criterion. Of these 75 graphs, 4 are certified to be generically identifiable by the EID algorithm, 0 by the TSID algorithm, and 34 by the EID
A listing of the 14 acyclic and 41 cyclic mixed graphs that could not be identified by the EID
, for , do Add edge to if Replace with
See Figure 6 for an example of a cyclic and acyclic graph that the EID
Of the 112 acyclic and 75 cyclic mixed graphs on 5 nodes described in Tables 1 and 2 from , we display the 14 acyclic and 41 cyclic graphs which are known to be generically identifiable but for which the EID
|(4456, 113)||(345, 440)||(6629, 512)||(75321, 516)|
|(360, 117)||(71329, 18)||(74536, 788)||(75398, 20)|
|(6275, 172)||(81089, 0)||(5545, 96)||(70803, 896)|
|(6307, 172)||(4714, 41)||(75112, 72)||(4457, 592)|
|(6275, 188)||(70881, 80)||(74970, 4)||(74883, 522)|
|(360, 369)||(74963, 512)||(4579, 384)||(350, 112)|
|(4696, 401)||(74886, 268)||(70594, 65)||(74883, 2)|
|(4936, 401)||(5058, 304)||(74921, 66)||(74950, 260)|
|(4936, 402)||(70821, 513)||(70474, 640)||(74890, 38)|
|(4680, 403)||(74915, 6)||(74922, 66)||(81076, 0)|
|(840, 466)||(5267, 82)||(13160, 65)||(70851, 32)|
|(5257, 658)||(76852, 128)||(4938, 448)||(1430, 120)|
|(5257, 659)||(71075, 516)||(4730, 640)||(5251, 418)|
|(4680, 914)||(4397, 897)||(70358, 1)|
By exploiting the trek-separation characterization of the vanishing of subdeterminants of the covariance matrix
Our work on identification by ratios of determinants focuses on a single edge coefficient. However, it seems possible to give a generalization that is in the spirit of the generalized instrumental sets from ; see also . These leverage several conditional independencies to find a linear equation system that can be used to identify several edge coefficients simultaneously, under specific assumptions on the interplay of the conditional independencies and the edges to be identified. We illustrate the idea of how to do this using general determinants in the following example. However, a full exploration of this idea is beyond the scope of this paper. In particular, we are still lacking mathematical tools that, in suitable generality, could be used to certify that constructed linear equation systems have a unique solution.
Using computer algebra we find that the
This material is based on work started in June 2016 at the Mathematics Research Communities (Week on Algebraic Statistics). The work was supported by the National Science Foundation under Grant Numbers DMS 1321794 and 1712535.
We will require a known generalization of the Gessel-Viennot-Lindström lemma which we now state.
[Gessel-Viennot-Lindström Generalization, Theorem 6.1 from ] Let
here the above inner sum is over all directed path systems
The remaining proof of Lemma 3.7 proceeds in several parts and closely follows similar results in  and . As such we will state several lemmas whose proofs require only small modifications of existing results (such as replacing the standard Gessel-Viennot-Lindström Lemma with its generalization above). In such cases we will simply direct the reader to the corresponding proof and sketch the necessary modifications.
the max-flow from
In the following, whenever we say “As in x,” we mean “As in the proof of x in .”
As in Lemma 3.2, we have
Now noticing that
We have now proven our desired result in the case
This proof follows, essentially exactly, as the first part of the proof of Prop. 2.5 in .
Now we show that the above subdivision trick produces a graph
Consider the graphs
Recall that a flow system on a graph is an assignment of flow to the edges and vertices of the graph satisfying the usual flow constraints. Also recall that, for graphs with integral capacities, there always exists a max-flow system between subsets of nodes for which all flow assignments upon edges and vertices take values in
We now construct a flow system
- Case 1: assigns 1 flow to . In this case, assign a flow of 1 to the edge in .
- Case 2: assigns 1 flow to . In this case, assign a flow of 1 to the edge in .
It is easy to check that
To see the opposite direction, let
- Case 1: and . In this case, assign a flow of 1 along the path in .
- Case 2: and . In this case, assign a flow of 1 to the edges and in .
One may now check that
Finally, we are in a position to easily prove Lemma 3.7. Note that, by Lemma A.5, we have that
B Proof of Lemma 4.3
The proof of this lemma follows that of Lemma 2 in  almost identically; we simply restate the arguments there in our setting. For any
Now for any system of treks
Then, by Leibniz’s formula for the determinant, we have that
where the above sum is over all trek systems
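For reference, Leibniz's formula writes det M = Σ_s sgn(s) ∏_i M_{i,s(i)} over all permutations s. When, as in the proof above, each entry is itself a sum of trek monomials, expanding the products turns this into a sum over trek systems. A minimal Python version of the formula (exponential in n, for illustration only), with the permutation sign computed from the parity of the inversion count:

```python
from itertools import permutations
from math import prod


def permutation_sign(perm):
    """+1 for even permutations, -1 for odd ones (parity of the inversion count)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def leibniz_det(m):
    """det(m) = sum over permutations s of sign(s) * prod_i m[i][s(i)]."""
    n = len(m)
    return sum(permutation_sign(s) * prod(m[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


print(leibniz_det([[1, 2], [3, 4]]))                   # -2
print(leibniz_det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```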
By assumption, there exists a half-trek system from
Bollen KA. Structural equations with latent variables. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York: John Wiley & Sons, Inc. (A Wiley-Interscience Publication), 1989.
Pearl J. Causality: models, reasoning, and inference, 2nd edn. Cambridge: Cambridge University Press, 2009.
Spirtes P, Glymour C, Scheines R. Causation, prediction, and search, 2nd edn. Cambridge, MA: MIT Press, 2000.
Wright S. Correlation and causation. J. Agricultural Res. 1921;20:557–585.
Wright S. The method of path coefficients. Ann. Math. Statist. 1934;5:161–215.
Didelez V, Meng S, Sheehan NA. Assumptions of IV methods for observational epidemiology. Statist. Sci. 2010;25:22–40.
Drton M, Foygel R, Sullivant S. Global identifiability of linear structural equation models. Ann. Statist. 2011;39:865–886.
Shpitser I, Pearl J. Identification of joint interventional distributions in recursive semi-Markovian causal models. In: Proceedings of the 21st National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2006:1219–1226.
Tian J, Pearl J. A general identification condition for causal effects. In: Proceedings of the 18th National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2002:567–573.
Foygel R, Draisma J, Drton M. Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 2012a;40:1682–1713.
Chen B. Decomposition and identification of linear structural equation models. arXiv:1508.01834, 2015.
Chen B, Tian J, Pearl J. Testable implications of linear structural equation models. In: Brodley CE, Stone P, editors. Proceedings of the twenty-eighth AAAI conference on artificial intelligence. AAAI Press, 2014:2424–2430.
Drton M, Weihs L. Generic identifiability of linear structural equation models by ancestor decomposition. Scandinavian J Stat. 2016;43:1035–1045.
Sullivant S, Talaska K, Draisma J. Trek separation for Gaussian graphical models. Ann. Statist. 2010;38:1665–1685.
Brito C, Pearl J. Generalized instrumental variables. In: Proceedings of the eighteenth annual conference on uncertainty in artificial intelligence (UAI-02). San Francisco, CA: Morgan Kaufmann, 2002:85–93.
Chen B. Identification and overidentification of linear structural equation models. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, editors. Advances in neural information processing systems 29. Curran Associates, Inc., 2016:1579–1587.
Drton M. Algebraic problems in structural equation modeling. arXiv:1612.05994, 2016.
Okamoto M. Distinctness of the eigenvalues of a quadratic form in a multivariate sample. Ann. Statist. 1973;1:763–765.
Draisma J, Sullivant S, Talaska K. Positivity for Gaussian graphical models. Adv. in Appl. Math. 2013;50:661–674.
Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to algorithms, 3rd edn. Cambridge, MA: MIT Press, 2009.
van der Zander B, Textor J, Liśkiewicz M. Efficiently finding conditional instruments for causal inference. In: Proceedings of the 24th international joint conference on artificial intelligence (IJCAI 2015). AAAI Press, 2015:3243–3249.
Foygel R, Drton M. SEMID: Identifiability of linear structural equation models, R package version 0.1, 2013.
R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2014.
Foygel R, Draisma J, Drton M. Supplement to half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 2012b;40.
van der Zander B, Liśkiewicz M. On searching for generalized instrumental variables. In: Proceedings of the 19th international conference on artificial intelligence and statistics (AISTATS 2016), JMLR Proceedings, 2016:1214–1222.
Fomin S. Loop-erased walks and total positivity. Trans. Amer. Math. Soc. 2001;353:3563–3583 (electronic).