## 1 Introduction

In a *linear structural equation model* (L-SEM) the joint distribution of a random vector

where

where

Following an approach that dates back to Wright [4, 5], we may represent such models by *mixed graphs*, that is, graphs with both directed and bidirected edges, which are also known as *path diagrams*.

The mixed graph in Figure 1 corresponds to the well-known instrumental variable model [6]. In equations, this model asserts that

where

In this model, the random vector

A first question that arises when specifying an L-SEM via a mixed graph is whether the model is *globally identifiable*. Whether or not global identifiability holds can be decided in polynomial time [7, 8, 9]. However, in many cases global identifiability is too strong a condition. Indeed, the canonical instrumental variable model is not globally identifiable.
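The failure of global identifiability in the instrumental variable model can be checked numerically. The following is a minimal sketch, assuming the graph of Figure 1 is the usual one with edges 1 → 2 → 3 and 2 ↔ 3, and assuming the standard parametrization of the covariance matrix as (I − Λ)^{-T} Ω (I − Λ)^{-1}. The function names and the numeric parameter values are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def covariance(Lam, Om):
    """Return (I - Lam)^{-T} Om (I - Lam)^{-1} for an acyclic (nilpotent) Lam."""
    n = len(Lam)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    inv, power = identity, identity
    for _ in range(n - 1):  # Neumann series: I + Lam + ... + Lam^(n-1)
        power = matmul(power, Lam)
        inv = [[inv[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    inv_t = [list(col) for col in zip(*inv)]
    return matmul(matmul(inv_t, Om), inv)


def iv_covariance(l12, l23, w11=2, w22=1, w33=1, w23=Fraction(1, 3)):
    """Covariance for the assumed IV graph 1 -> 2 -> 3 with 2 <-> 3."""
    F = Fraction
    Lam = [[F(0), F(l12), F(0)],
           [F(0), F(0), F(l23)],
           [F(0), F(0), F(0)]]
    Om = [[F(w11), F(0), F(0)],
          [F(0), F(w22), F(w23)],
          [F(0), F(w23), F(w33)]]
    return covariance(Lam, Om)


# Generic parameters: sigma_13 / sigma_12 recovers lambda_23.
Sigma = iv_covariance(Fraction(1, 2), 3)
ratio = Sigma[0][2] / Sigma[0][1]

# Exceptional parameters: with lambda_12 = 0 both covariances vanish,
# so the ratio is undefined and lambda_23 cannot be read off.
degenerate = iv_covariance(0, 3)
```

Under these assumptions the ratio equals the chosen value of λ₂₃ whenever λ₁₂ ≠ 0, while the parameter choices with λ₁₂ = 0 form exactly the kind of exceptional set that rules out global identifiability.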

We will instead be interested in *generic identifiability*, that is, whether

We extend the applicability of the half-trek criterion (HTC) of [10] in two ways. First, we show how the theorems on trek separation in [14] can be used to discover determinantal relations that in turn can be used to prove the generic identifiability of individual edge coefficients in L-SEMs. This method generalizes the use of conditional independence in known instrumental variable techniques; compare, e.g., [15]. Once we have shown that individual edges are generically identifiable with this new method, it would be ideal if identified edges could be integrated into the equation systems discovered by the HTC to prove that even more edges are generically identifiable. Unfortunately, the HTC is not well suited to integrating single edge identifications, as it operates simultaneously on all edges incoming to a given node. Our second contribution resolves this issue by providing an *edgewise* half-trek criterion which operates on subsets of a node’s parents, rather than on all parents at once. This edgewise criterion often identifies many more coefficients than the usual HTC. We note that, in the process of preparing this manuscript, we discovered independent work of Chen [16]; some of our results can be seen as a generalization of results in his work.

The rest of this paper is organized as follows. In Section 2, we give a brief overview of the necessary background on mixed graphs, L-SEMs, and the half-trek criterion. In Section 3, we show how trek-separation allows the generic identification of edge coefficients as quotients of subdeterminants. We introduce the edgewise half-trek criterion in Section 4 and we discuss necessary conditions for the generic identifiability of edge coefficients in Section 5. Computational experiments showing the applicability of our sufficient conditions follow in Section 6, and we finish with a brief conclusion in Section 7. Some longer proofs are deferred to the appendix.

## 2 Preliminaries

We assume some familiarity with the graphical representation of structural equation models and only give a brief overview of our objects of study. A more in-depth introduction can be found, for example, in [2] or, with a focus on the linear case considered here, in [17].

### 2.1 Mixed graphs and covariance matrices

Nonzero covariances in an L-SEM may arise through direct or through confounding effects. Mixed graphs with two types of edges have been used to represent these two sources of dependences.

A *mixed graph* on

Let *path* from *directed* if all of its edges are directed and point in the same direction, away from

(a) A path *source**target**trek* if it has no colliding arrowheads, that is,

where *top* node. Each trek

(b) A trek *half-trek* if

In particular, a half-trek from

Some terminology is needed to reference the local neighborhood structure of a vertex *parents* and the set of *descendants* of

respectively. The nodes incident to a bidirected edge can be thought of as having a common (latent) parent and thus we refer to the bidirected neighbors as *siblings* and define

Finally, we denote the sets of nodes that are *trek reachable* or *half-trek reachable* from
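Both reachability notions reduce to simple graph searches. The sketch below assumes a mixed graph given as lists of directed pairs `(v, w)` for v → w and bidirected pairs `(v, w)` for v ↔ w. The function names are hypothetical, and the convention of counting a node as reachable from itself via the trivial trek is an assumption that may differ from the convention used in the formal definitions.

```python
from collections import deque


def directed_reach(start_nodes, directed):
    """Nodes reachable from start_nodes along directed edges (start nodes included)."""
    children = {}
    for v, w in directed:
        children.setdefault(v, set()).add(w)
    seen, queue = set(start_nodes), deque(start_nodes)
    while queue:
        v = queue.popleft()
        for w in children.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def siblings(v, bidirected):
    """Bidirected neighbors of v."""
    return {b if a == v else a for a, b in bidirected if v in (a, b)}


def half_trek_reachable(v, directed, bidirected):
    """Nodes w reached by a half-trek from v: a directed path from v,
    or one bidirected edge v <-> s followed by a directed path from s."""
    return directed_reach({v} | siblings(v, bidirected), directed)


def trek_reachable(v, directed, bidirected):
    """Nodes w reached by a trek from v: climb to an ancestor of v, optionally
    cross one bidirected edge, then follow a directed path."""
    reversed_edges = [(w, u) for u, w in directed]
    ancestors = directed_reach({v}, reversed_edges)
    tops = set(ancestors)
    for a in ancestors:
        tops |= siblings(a, bidirected)
    return directed_reach(tops, directed)
```

For instance, in the graph with edges 1 → 2, 3 → 4 and 2 ↔ 3, node 3 is half-trek reachable from node 2 but not from node 1, because the path 1 → 2 ↔ 3 has colliding arrowheads at node 2.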

Two sets of matrices may be associated with a given mixed graph

with domain

Our focus is solely on covariance matrices. Indeed, in the traditional case where the errors

Subsequently, the matrices

Let *trek monomial* is defined as

If

[Trek Rule] The covariance matrix

### 2.2 Generic identifiability

We now formally introduce our problem of interest and review some of the prior work our results build on. We recall that an *algebraic set* is the zero-set of a collection of polynomials. An algebraic set that is a proper subset of Euclidean space has measure zero; see, e.g., the lemma in [18].

(a) The model given by a mixed graph *generically identifiable* if there exists a proper algebraic subset

for all

(b) Let *generically identifiable* if there exists a proper algebraic subset

In all examples we know of, if generic identifiability holds, then the parameters can in fact be recovered using rational formulas.

(a) A mixed graph *rationally identifiable* if there exists a rational map

(b) An edge *rationally identifiable* if there exists a rational function

We now introduce the half-trek criterion (HTC) from [10]. We generalize this criterion in Section 4.

Let *system of treks* from *system of half-treks*. A collection *no sided intersection* if

As our focus will be on the identification of individual edges in

A set of nodes $Y \subseteq V$ satisfies the *half-trek criterion* with respect to a vertex $v \in V$ if

- $|Y| = |\mathrm{pa}(v)|$,
- $Y \cap (\{v\} \cup \mathrm{sib}(v)) = \varnothing$, and
- there is a system of half-treks with no sided intersection from $Y$ to $\mathrm{pa}(v)$.

[HTC-identifiability] Suppose that in the mixed graph

The sufficient condition for rational identifiability of

## 3 Trek separation and identification by ratios of determinants

Let

A pair of sets *t-separates* the sets

In this definition, the symbols

Let

Theorem 2.7 from [14] established this result for acyclic mixed graphs while [19] extended the result to all mixed graphs and even gave an explicit representation of the rational form of the subdeterminant

Let

Turn

Add vertices

Note that the maximum flow between vertex sets in a graph can be computed in polynomial time. Indeed, in our case, the conditions of Corollary 3.3 can be checked in
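A max-flow check of this kind can be sketched in a few lines. The code below assumes the usual two-copy flow network from the trek-separation literature: a left copy of the graph with directed edges reversed, a right copy with directed edges kept, an edge from each left node to its right copy, and cross edges for every bidirected edge, with all edge and vertex capacities equal to 1. The function names and the node labels `"L"` and `"R"` are hypothetical, and the augmenting-path routine is a plain breadth-first variant rather than an optimized algorithm.

```python
from collections import deque


def max_flow_unit(nodes, edges, sources, sinks):
    """Max flow when every edge and every vertex has capacity 1.

    Vertex capacities are enforced by splitting x into (x, 'in') -> (x, 'out').
    """
    cap = {}

    def add(u, v):
        cap.setdefault(u, {})[v] = 1
        cap.setdefault(v, {}).setdefault(u, 0)  # residual arc

    for x in nodes:
        add((x, "in"), (x, "out"))
    for u, v in edges:
        add((u, "out"), (v, "in"))
    for s in sources:
        add("SRC", (s, "in"))
    for t in sinks:
        add((t, "out"), "SNK")

    flow = 0
    while True:
        parent = {"SRC": None}
        queue = deque(["SRC"])
        while queue and "SNK" not in parent:  # breadth-first augmenting path
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "SNK" not in parent:
            return flow
        v = "SNK"
        while parent[v] is not None:  # push one unit along the path found
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def max_trek_flow(V, directed, bidirected, S, T):
    """Max flow from the left copies of S to the right copies of T (assumed construction)."""
    nodes = [(v, side) for v in V for side in ("L", "R")]
    edges = [((v, "L"), (v, "R")) for v in V]
    for v, w in directed:
        edges.append(((w, "L"), (v, "L")))  # left side: directed edges reversed
        edges.append(((v, "R"), (w, "R")))  # right side: directed edges kept
    for v, w in bidirected:
        edges.append(((v, "L"), (w, "R")))
        edges.append(((w, "L"), (v, "R")))
    return max_flow_unit(nodes, edges, [(s, "L") for s in S], [(t, "R") for t in T])
```

As a sanity check under these assumptions, in the one-factor graph with edges 1 → 2, 1 → 3, 1 → 4, 1 → 5 every trek between {2, 3} and {4, 5} passes through node 1, so the flow is 1 rather than 2, in line with the vanishing of the corresponding 2 × 2 subdeterminant (the classical tetrad).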

Consider the mixed graph

By the multilinearity of the determinant, we deduce that

Applying Corollary 3.3 a final time, we recognize that

generically and rationally identifies

In the above example, there is a correspondence between trek systems in

Let

- (a) $\mathrm{des}(v) \cap (S \cup T \cup \{v\}) = \varnothing$,
- (b) the max-flow from $S$ to $T' \cup \{w_0'\}$ in $G_{\text{flow}}$ equals $k$, and
- (c) the max-flow from $S$ to $T' \cup \{v'\}$ in $\bar{G}_{\text{flow}}$ is smaller than $k$.

Then

Let

Now let

As the above argument holds for all

Using assumption (c) and applying Corollary 3.3, we have

Theorem 3.5 generalizes the ideas underlying instrumental variable methods such as those discussed in [15]. Indeed, this prior work uses d-separation as opposed to t-separation. D-separation characterizes conditional independence which in the present context corresponds to the vanishing of particular almost principal determinants of the covariance matrix. In contrast, Theorem 3.5 allows us to leverage arbitrary determinantal relations; compare [14]. The graph in Figure 2a is an example in which d-separation and traditional instrumental variable techniques cannot explain the rational identifiability of the coefficient for edge

While assumption (a) in the above theorem allows for the easy application of Corollary 3.3, this assumption can be relaxed by generalizing one direction of Corollary 3.3. We state this generalization as the following lemma, which treats edges asymmetrically depending on whether they appear on the left- or right-hand side of treks. The lemma’s proof is deferred to Appendix A.

Let

Define a network

with all edges and vertices having capacity 1. Let

We may now state our more general result.

Let

- (a) $\mathrm{des}(v) \cap (T \cup \{v\}) = \varnothing$,
- (b) the max-flow from $S$ to $T' \cup \{w_0'\}$ in $G_{\text{flow}}$ equals $k$, and
- (c) the max-flow from $S$ to $T' \cup \{v'\}$ in $G_{\text{flow}}^{\ast}$ is $< k$.

Then

By assumption (b) and Corollary 3.3,

To show this we note that, by the multilinearity of the determinant, we have

Write

Consider any two indices

where

Clearly Theorem 3.8 can be applied whenever Theorem 3.5 can. Moreover, as the next example shows, there are cases in which Theorem 3.8 can be used while Theorem 3.5 cannot.

Let

For a fixed choice of

In order to apply Theorem 3.8 algorithmically, however, we have to consider all possible subsets

## 4 Edgewise generic identifiability

While our results from Section 3 can be used together with the HTC, there is a notable lack of synergy between the two methods, as Theorem 3.8 requires that all directed edges incoming to a node be generically identifiable before that node can be used to prove the generic identifiability of other edges. Aiming to strengthen the HTC while allowing it to better use identifications produced by Theorem 3.8, the following theorem establishes a sufficient condition for the generic identifiability of any set of incoming edges to a fixed node. While preparing this manuscript we discovered the work of Chen [16]; the following theorem can be seen as a generalization of his Theorem 1. See Remark 4.2 for a discussion of the primary difference between our theorem and that in [16].

Let

- (i) there exists a half-trek system from $Y$ to $W$ with no sided intersection,
- (ii) for every trek $\pi$ from $y \in Y$ to $v$ we have that either $\pi$ ends with an edge of the form $s \to v$ where either $s \in W$ or $s \to v$ is known to be generically (rationally) identifiable, or $\pi$ begins with an edge of the form $y \leftarrow s$ where $s \to y$ is known to be generically (rationally) identifiable.

Then for each

Let

Recalling that

Our approach is to build a linear system of

equals the sum of the monomials for all treks from

Now the sum over all treks between

Rewriting this we have

Notice that, in the above equation, if

Repeating the above argument for each of the

The invertibility of

Our Theorem 4.1 generalizes Theorem 1 in [16] in two ways. Firstly, we make the trivial, but for our purposes important, modification to formulate our theorem in a fashion that is agnostic as to how prior generic identifications were obtained. For the presentation in [16] it was more natural to focus only on such identifications being obtained from prior applications of his theorem. Secondly, and more substantially, the results in [16] do not consider the possibility that, recalling the setting of Theorem 4.1, some of the edges incoming to

As an example of how the above difference can appear in practice consider Figure 4 and suppose we have restricted the size of edge sets

The following lemma generalizes Lemma 2 from [10] and completes the proof of Theorem 4.1.

Let

is generically invertible.

The proof of this lemma is deferred to Appendix B. Note that if we let

The conditions of Theorem 4.1 can be easily checked in polynomial time using max-flow computations, just as with the standard half-trek criterion. Unfortunately, in general, we do not know for which subset

| Algorithm 1 | Edgewise identification algorithm. |
|---|---|
| 1: | Input: A mixed graph |
| 2: | repeat |
| 3: | for … do |
| 4: | |
| 5: | |
| 6: | for … do |
| 7: | |
| 8: | |
| 9: | if … true |
| 10: | |
| 11: | Break out of the current loop |
| 12: | end if |
| 13: | end for |
| 14: | end for |
| 15: | until No additional edges have been added to |
| 16: | Output: |
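The overall control flow of such an iterative procedure can be sketched as a fixed-point loop. This is only an illustration of the repeat-until structure, not a transcription of Algorithm 1: the callback `check(v, W, solved)` is a hypothetical stand-in for testing the conditions of Theorem 4.1 for a candidate parent set `W` of node `v`, and the parameter `max_subset_size` is an assumed way of bounding the subset search.

```python
from itertools import combinations


def edgewise_identify(nodes, directed, solved_edges, check, max_subset_size=None):
    """Repeatedly try to certify subsets of unsolved incoming edges.

    `directed` holds pairs (w, v) for edges w -> v. `check` is a hypothetical
    callback returning True when all edges w -> v with w in W are certified.
    """
    solved = set(solved_edges)
    changed = True
    while changed:  # repeat until a full pass adds no edges
        changed = False
        for v in nodes:
            unsolved = sorted(w for (w, x) in directed
                              if x == v and (w, v) not in solved)
            limit = len(unsolved)
            if max_subset_size is not None:
                limit = min(limit, max_subset_size)
            found = False
            for size in range(1, limit + 1):
                for W in combinations(unsolved, size):
                    if check(v, W, frozenset(solved)):
                        solved.update((w, v) for w in W)
                        changed = found = True
                        break  # stop searching subsets for this node
                if found:
                    break
    return solved
```

Because the set of solved edges only grows and is bounded by the number of directed edges, the outer loop terminates after at most one pass per edge plus a final pass that adds nothing.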

## 5 Edgewise generic nonidentifiability

In prior sections we have focused solely on sufficient conditions for demonstrating the generic identifiability of edges in a mixed graph. This naturally raises the question of whether there are complementary necessary conditions, that is, conditions whose failure shows that a given edge is generically many-to-one. To our knowledge, the following is the only known necessary condition for generic identifiability and considers all parameters of a mixed graph

(Theorem 2 of [10]) Suppose

This theorem operates by showing that, given its conditions, the Jacobian of the map

Let

Let

Let

Whenever

Next suppose, without loss of generality, that

Now since

Let

Let

We stress, however, that Theorem 5.2 does not imply Theorem 5.1; that is, there are graphs

## 6 Computational experiments

In this section we will provide some computational experiments that demonstrate the usefulness of our theorems in extending the applicability of the half-trek criterion. All of our following experiments are carried out in the R programming language and the following algorithms are implemented in our R package SEMID which is available on CRAN, the Comprehensive R Archive Network [22, 23], as well as on GitHub.^{1} We will be considering four different identification algorithms for checking generic identifiability:

- The standard half-trek criterion (HTC) algorithm.
- The edgewise identification (EID) algorithm, displayed in Algorithm 1, where the input set $\mathit{solvedEdges}$ is empty.
- The trek-separation identification (TSID) algorithm. As with Algorithm 1, this algorithm iteratively applies Theorem 3.8 until it fails to identify any additional edges. (Since we are considering a small number of nodes, there is no need to limit the size of the sets $S$ and $T$ we are searching for in our computation.)
- The EID$+$TSID algorithm. This algorithm alternates between the EID and TSID algorithms until it fails to identify any additional edges.

We emphasize that when all of the directed edges, i.e., the matrix

In Table 1 from [24], the authors list all 112 acyclic non-isomorphic mixed graphs on 5 nodes which are generically identifiable but for which the half-trek criterion remains inconclusive even when using decomposition techniques. We run the EID, TSID, and EID

We observe a similar trend to the above when allowing cyclic mixed graphs. In Table 2 of [24], the authors list 75 randomly chosen, cyclic (i.e., containing a loop in the directed part), mixed graphs that are known to be rationally identifiable but cannot be certified so by the half-trek criterion. Of these 75 graphs, 4 are certified to be generically identifiable by the EID algorithm, 0 by the TSID algorithm, and 34 by the EID

A listing of the 14 acyclic and 41 cyclic mixed graphs that could not be identified by the EID

- For $v \leftarrow 1, \dots, n$, for $w \leftarrow 1, \dots, v-1, v+1, \dots, n$, do: Add edge $v \to w$ to $G$ if $d \bmod 2 = 1$. Replace $d$ with $\lfloor d/2 \rfloor$.
- For $v \leftarrow 1, \dots, n-1$, for $w \leftarrow v+1, \dots, n$, do: Add edge $v \leftrightarrow w$ to $G$ if $b \bmod 2 = 1$. Replace $b$ with $\lfloor b/2 \rfloor$.
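The two decoding loops above translate directly into code. The sketch below assumes that each table entry is a pair $(d, b)$ of nonnegative integers and that $n = 5$ for the graphs considered here; the function name `decode_mixed_graph` is a hypothetical choice.

```python
def decode_mixed_graph(d, b, n=5):
    """Decode a pair (d, b) into lists of directed and bidirected edges.

    Bits of d are consumed over ordered pairs (v, w) with v != w,
    bits of b over unordered pairs v < w, least significant bit first.
    """
    directed, bidirected = [], []
    for v in range(1, n + 1):
        for w in range(1, n + 1):
            if w == v:
                continue
            if d % 2 == 1:
                directed.append((v, w))  # edge v -> w
            d //= 2
    for v in range(1, n):
        for w in range(v + 1, n + 1):
            if b % 2 == 1:
                bidirected.append((v, w))  # edge v <-> w
            b //= 2
    return directed, bidirected


directed, bidirected = decode_mixed_graph(360, 117)
```

For the pair (360, 117) this yields the directed edges 1 → 5, 2 → 3, 2 → 4, 3 → 1 and the bidirected edges 1 ↔ 2, 1 ↔ 4, 2 ↔ 3, 2 ↔ 4, 2 ↔ 5, which is indeed an acyclic graph.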

See Figure 6 for an example of a cyclic and acyclic graph that the EID

Of the 112 acyclic and 75 cyclic mixed graphs on 5 nodes described in Tables 1 and 2 from [24], we display the 14 acyclic and 41 cyclic graphs which are known to be generically identifiable but for which the EID

| Acyclic | Cyclic | | |
|---|---|---|---|
| (4456, 113) | (345, 440) | (6629, 512) | (75321, 516) |
| (360, 117) | (71329, 18) | (74536, 788) | (75398, 20) |
| (6275, 172) | (81089, 0) | (5545, 96) | (70803, 896) |
| (6307, 172) | (4714, 41) | (75112, 72) | (4457, 592) |
| (6275, 188) | (70881, 80) | (74970, 4) | (74883, 522) |
| (360, 369) | (74963, 512) | (4579, 384) | (350, 112) |
| (4696, 401) | (74886, 268) | (70594, 65) | (74883, 2) |
| (4936, 401) | (5058, 304) | (74921, 66) | (74950, 260) |
| (4936, 402) | (70821, 513) | (70474, 640) | (74890, 38) |
| (4680, 403) | (74915, 6) | (74922, 66) | (81076, 0) |
| (840, 466) | (5267, 82) | (13160, 65) | (70851, 32) |
| (5257, 658) | (76852, 128) | (4938, 448) | (1430, 120) |
| (5257, 659) | (71075, 516) | (4730, 640) | (5251, 418) |
| (4680, 914) | (4397, 897) | (70358, 1) | |

## 7 Conclusion

By exploiting the trek-separation characterization of the vanishing of subdeterminants of the covariance matrix

Our work on identification by ratios of determinants focuses on a single edge coefficient. However, it seems possible to give a generalization that is in the spirit of the generalized instrumental sets from [15]; see also [25]. These leverage several conditional independencies to find a linear equation system that can be used to identify several edge coefficients simultaneously, under specific assumptions on the interplay of the conditional independencies and the edges to be identified. We illustrate the idea of how to do this using general determinants in the following example. However, a full exploration of this idea is beyond the scope of this paper. In particular, we are still lacking mathematical tools that, in suitable generality, could be used to certify that constructed linear equation systems have a unique solution.

Let

Using computer algebra we find that the

This material is based on work started in June 2016 at the Mathematics Research Communities (Week on Algebraic Statistics). The work was supported by the National Science Foundation under Grant Numbers DMS 1321794 and 1712535.

## A Proof of Lemma 3.7

We will require a known generalization of the Gessel-Viennot-Lindström lemma which we now state.

Let *loop erased* path

[Gessel-Viennot-Lindström Generalization, Theorem 6.1 from [26]] Let

here the above inner sum is over all directed path systems

The remaining proof of Lemma 3.7 proceeds in several parts and closely follows similar results in [14] and [19]. As such we will state several lemmas whose proofs require only small modifications of existing results (such as replacing the standard Gessel-Viennot-Lindström Lemma with its generalization above). In such cases we will simply direct the reader to the corresponding proof and sketch the necessary modifications.

Let *avoids * if the left (right) side of

*avoids*$U$ on the left (right)if every trek

*trek (or trek system) avoids*$({U}_{L},{U}_{R})$ if it avoids

Let

the max-flow from

In the following, whenever we say “As in x,” we mean “As in the proof of x in [14].”

As in Lemma 3.2, we have

Now noticing that

We have now proven our desired result in the case *bidirected subdivision* of

Let

This proof follows the first part of the proof of Prop. 2.5 in [19] essentially verbatim.

Now we show that the above subdivision trick produces a graph

Consider the graphs

Recall that a flow system on a graph is an assignment of flow to the edges and vertices of the graph satisfying the usual flow constraints. Also recall that, for graphs with integral capacities, there always exists a max-flow system between subsets of nodes for which all flow assignments upon edges and vertices take values in

Let

We now construct a flow system

- Case 1: $\tilde{F}$ assigns 1 flow to $v_{a_i b_i'} \to a_i'$. In this case assign a flow of 1 to the edge $a_i \to a_i'$ in $F$.
- Case 2: $\tilde{F}$ assigns 1 flow to $v_{a_i b_i'} \to b_i'$. In this case assign a flow of 1 to the edge $a_i \to b_i'$ in $F$.

It is easy to check that

To see the opposite direction let

- Case 1: $a_i \to b_i' \in E$ and $b_i \to a_i \notin E$. In this case assign a flow of 1 along the path $a_i \to v_{a_i b_i} \to b_i'$ in $\tilde{F}$.
- Case 2: $a_i \to b_i' \in E$ and $b_i \to a_i \in E$. In this case assign a flow of 1 to the edges $a_i \to a_i'$ and $b_i \to b_i'$ in $\tilde{F}$.

One may now check that

Finally we are in a position to easily prove Lemma 3.7. Note that, by Lemma A.5 we have that

## B Proof of Lemma 4.3

The proof of this lemma follows almost identically as the proof of Lemma 2 in [10]. We simply restate the arguments there in our setting. For any

Now for any system of treks

Then, by Leibniz’s formula for the determinant, we have that

where the above sum is over all trek systems

By assumption, there exists a half-trek system from

## References

- [1] Bollen KA. Structural equations with latent variables. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York: John Wiley & Sons, Inc., a Wiley-Interscience Publication, 1989.
- [2] Pearl J. Causality: models, reasoning, and inference, 2nd edn. Cambridge: Cambridge University Press, 2009.
- [3] Spirtes P, Glymour C, Scheines R. Causation, prediction, and search, 2nd edn. Cambridge, MA: MIT Press, 2000.
- [6] Didelez V, Meng S, Sheehan NA. Assumptions of IV methods for observational epidemiology. Statist. Sci. 2010;25:22–40.
- [7] Drton M, Foygel R, Sullivant S. Global identifiability of linear structural equation models. Ann. Statist. 2011;39:865–886.
- [8] Shpitser I, Pearl J. Identification of joint interventional distributions in recursive semi-Markovian causal models. In: Proceedings of the 21st national conference on artificial intelligence. Menlo Park, CA: AAAI Press, 2006:1219–1226.
- [9] Tian J, Pearl J. A general identification condition for causal effects. In: Proceedings of the 18th national conference on artificial intelligence. Menlo Park, CA: AAAI Press, 2002:567–573.
- [10] Foygel R, Draisma J, Drton M. Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 2012a;40:1682–1713.
- [11] Chen B. Decomposition and identification of linear structural equation models. ArXiv e-prints, 1508.01834, 2015.
- [12] Chen B, Tian J, Pearl J. Testable implications of linear structural equations models. In: Brodley CE, Stone P, editors. Proceedings of the twenty-eighth AAAI conference on artificial intelligence. AAAI Press, 2014:2424–2430.
- [13] Drton M, Weihs L. Generic identifiability of linear structural equation models by ancestor decomposition. Scandinavian J Stat. 2016;43:1035–1045.
- [14] Sullivant S, Talaska K, Draisma J. Trek separation for Gaussian graphical models. Ann. Statist. 2010;38:1665–1685.
- [15] Brito C, Pearl J. Generalized instrumental variables. In: Proceedings of the eighteenth conference annual conference on uncertainty in artificial intelligence (UAI-02). San Francisco, CA: Morgan Kaufmann, 2002:85–93.
- [16] Chen B. Identification and overidentification of linear structural equation models. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, editors. Advances in neural information processing systems 29. Curran Associates, Inc., 2016:1579–1587.
- [18] Okamoto M. Distinctness of the eigenvalues of a quadratic form in a multivariate sample. Ann. Statist. 1973;1:763–765.
- [19] Draisma J, Sullivant S, Talaska K. Positivity for Gaussian graphical models. Adv. in Appl. Math. 2013;50:661–674.
- [20] Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to algorithms, 3rd edn. Cambridge, MA: MIT Press, 2009.
- [21] van der Zander B, Textor J, Liśkiewicz M. Efficiently finding conditional instruments for causal inference. In: Proceedings of the 24th international joint conference on artificial intelligence (IJCAI 2015). AAAI Press, 2015:3243–3249.
- [22] Foygel R, Drton M. SEMID: Identifiability of linear structural equation models, R package version 0.1, 2013.
- [23] R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2014.
- [24] Foygel R, Draisma J, Drton M. Supplement to half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 2012b;40.
- [25] van der Zander B, Liśkiewicz M. On searching for generalized instrumental variables. In: Proceedings of the 19th international conference on artificial intelligence and statistics (AISTATS 2016), JMLR Proceedings, 2016:1214–1222.
- [26] Fomin S. Loop-erased walks and total positivity. Trans. Amer. Math. Soc. 2001;353:3563–3583 (electronic).

## Footnotes

^{1} See https://github.com/Lucaweihs/SEMID.