Topology of the mesoscale connectome of the mouse brain

The wiring diagram of the mouse brain has recently been mapped at a mesoscopic scale in the Allen Mouse Brain Connectivity Atlas. Axonal projections from brain regions were traced using green fluoresent proteins. The resulting data were registered to a common three-dimensional reference space. They yielded a matrix of connection strengths between 213 brain regions. Global features such as closed loops formed by connections of similar intensity can be inferred using tools from persistent homology. We map the wiring diagram of the mouse brain to a simplicial complex (filtered by connection strengths). We work out generators of the first homology group. Some regions, including nucleus accumbens, are connected to the entire brain by loops, whereas no region has non-zero connection strength to all brain regions. Thousands of loops go through the isocortex, the striatum and the thalamus. On the other hand, medulla is the only major brain compartment that contains more than 100 loops.


Introduction
The Allen Mouse Brain Connectivity Atlas [1] has filled a major gap in the knowledge of neuroanatomy by providing a brain-wide map of the connectome of the mouse brain. Adeno-associated viral vectors expressing enhanced green fluorescent protein were injected into the mouse brain, allowing to trace axonal projections. The scale of these experiments is mesoscopic [2], in the sense that injections target groups of neurons belonging to brain regions assumed to be homogeneous. The microscopic scale would correspond to mapping individual synapses where neurons make contact. The resulting three-dimensional fluorescent traces were registered to brain regions defined in the hierarchical Allen Reference Atlas [3].
These results allowed the construction of the first inter-region connectivity model of the mouse brain. It takes the form of a connectivity matrix, whose rows and columns correspond to a brain region, and whose entries model the connection strengths between pairs of regions defined by classical neuroanatomy. The derivation of the entries of of the connectivity matrix assumed homogeneity of brain regions and additivity of connection strengths. Projection densities correspond to axons highlighted by tracers, hence the projection densities from several different sources sum to produce projection density in a given region. This additivity assumption was used to address experimental data in which an injection of viral tracer infected several neighbouring brain regions.
The construction of the model from injection data (possibly infecting several regions for each injection) assumes that projections are homogeneous in each region, and that projections from different sources add to produce projection density to a given region. The entries of the connectivity matrix were observed to span a 10 5 -fold range, and are approximately fit by a log-normal distribution. The connectivity matrix is naturally mapped to a graph that models the wiring diagram of the mouse brain. Some features of the wiring diagram such as node degree and clustering coefficient have been observed to be reproduced by scale-free networks [4] and small-world networks [5], but none of these models was found to fit all observed features of the wiring diagram. This begs for further modelling taking into account the global properties of the connectivity map, together with the large range of intensities mapped by connection strengths.
On the other hand, the recent computational developments of algebraic topology [6][7][8][9] have given rise to spectacular applications to data analysis in biological sciences, including obstructions to phylogeny in viral evolution [10], and brain networks and neural correlations [11][12][13][14][15]. In this paper we will apply techniques from algebraic topology, a branch of mathematics characterizing properties of spaces that are invariant by continuous deformation, to work out global properties of the wiring diagram of the mouse brain.
We will first review the presentation of the Allen Mouse Brain Connectivity Atlas in the form of a matrix modelling connection strengths between brain regions. We will define a mapping from these data to a filtered simplicial complex and explain why the generators of the first homology group are in one-to-one correspondence with closed loops in the mesoscale connectome. The anatomy of these loops will then be analysed by grouping loops according to the major brain compartments they intersect. Moreover, we will compare the fraction of the brain reached from each region by loops, to the fraction reached by direct connections encoded in the connectivity matrix.

Connectivity strengths in the mouse brain, local and global properties
Each image series in the Allen Mouse Connectivity Atlas corresponds to an injection of tracer, followed by sectioning and and imaging of the brain 1 . In [1], registered data from 469 image series were combined in order to estimate an inter-region connectivity matrix, presented in matrix form 2 : C(r, r ) = connection strength from region labelled r to region labelled r , 1 ≤ r, r ≤ R. (1) The size of this connectivity matrix is R = 213, and each of the indices in {1, . . . , R} corresponds to a brain region, defined in the Allen Reference Atlas (ARA, [3,16]). The connectivity matrix captures the local structure of the connectome at mesocopic scale, as it estimates the strength of direct connections from a region to other regions. To capture global features of the connectome, we would like to identify closed circuits constructed from these connections. Algebraic topology formalises this notion: an irreducible closed loop in a topological space (a loop that cannot be shrunk to a point by continuous deformations) is a loop that is not the boundary of a disc in the topological space. It is therefore a one-dimensional cycle that is not the boundary of a two-dimensional object. The family of such objects in a topological space is invariant by continuous deformation, and has a group structure. It is called the first homology group of the topological space, and denoted by H 1 .
If we repeat this reasoning in dimension zero, we obtain the more familiar notion of connected component: elements of H 0 are zero-dimensional objects that cannot be joined by a path drawn on the topological space. They correspond to distinct connected components. More generally, the elements of the homology group H k are objects of dimension k that are not boundaries of (k + 1)-dimensional objects in a topological space, and therefore formalise the notion of hole. Moreover, the algebraic structure of homology groups allows us to count independent objects in each of them: the number of generators of the homology group H k is called the k-th Betti number and denoted by b k . The Betti numbers b 0 and b 1 are the number of connected components and the number of independent loops respectively.

Mapping the matrix of connection strengths to a filtered simplicial complex
The connectivity matrix C is not symmetric, because projections from a given brain region to other brain regions are oriented (just as axons are). Let us map the connectivity matrix to a weighted graph with R vertices corresponding to brain regions, and weighted edges corresponding to the entries of the connectivity matrix. Edges with two different orientations and different weights can exist between pairs of vertices. Let us apply a decreasing function to the entries of the connectivity matrix and define for instance: where is a positive regulator chosen to be smaller than the minimum connection strength, so that entries of d are bounded. The families of loops worked out in this paper do not depend on the choice of (in practice we took = 0.01min r,s C(r, s), resulting in a maximum entry of 42 for d). This operation results in an approximately normal distribution of entries in the matrix d. The quantity d(r, s) has been observed in [1] to be positively correlated to the spatial distance between region labelled r and region labelled s. However, the entries of d are not quite distances because d is not symmetric. We propose to map brain regions to a graph with 2R vertices as follows. Each brain region, labelled r, is mapped to a pair of vertices (s r , t r ), where s r represents the sources of action potentials (the axons of the cell bodies in the region, that conduct axon potentials), and t r the targets of action potentials (the dendrites in the region, that receive action potentials). An edge is drawn between vertices t r and s r for each r in {1, . . . , R}. With this doubling prescription we declare edges between targets and sources to be present in each region. We are going to look for closed paths in the wiring diagram of the brain, constructed from the non-zero entries C(s r , t r ) of the connection matrix. Direct approaches to persistent homology for asymmetric networks have been proposed, based on the Dowker complex, see [17], but the present doubling prescription takes advantage of the mesoscopic scale of the connectome data.
Given the family of 2R labelled vertices we have just described, we can construct a family of graphs and work out the families of independent loops for each of them by executing the following pseudo-code: 1. Consider a fixed f ∈]0, max 1≤r<s≤R d(r, s)] (referred to as a filtration value). 2. Draw an edge between any source labelled s a and any target labelled 3. Work out generators of the homology groups H 0 (f ) (connected components) and H 1 (f ) (independent loops) in the resulting graph.
The family of graphs (depending on the parameter f ) is called a simplicial complex. The third step of the procedure uses techniques of simplicial homology, implemented in JavaPlex [18,19]. Simplices in our case consist of vertices (zero-simplices, corresponding to brain regions), and edges (one-simplices, corresponding to axons connecting two brain regions). The result of the above procedure depends on the value of the filtration value f . At f = 0 we have as many connected components as points, and no loop. The number of connected components decreases when f increases. It reaches 1 for some value of the filtration value. When f is increased beyond this value, edges may still be drawn, possibly changing the number of loops. When f reaches the maximum entry of the matrix d, the simplicial complex cannot be changed anymore by increasing the filtration value. If we repeat the above procedure for all values of the entries of the matrix d, arranged in increasing order, we can work out for each feature F (which can be a connected component or a loop) an interval of filtration values [f min (F ), f max (F )] for which the feature F is present. These intervals can be drawn for all existing features F of a fixed dimension, yielding graphs of barcode form, one per dimension of feature (see Fig. 1B). Features persistent over a longer interval are thought less likely to be due to noise [20]. In particular, we will restrict ourselves to cycles that persist from their first appearance (at a given filtration value) to the maximum entry of the matrix d. The results of the above procedure are sketched of Fig. 1

Three-dimensional presentation of brain regions
To analyse the anatomy of persistent cycles, we used the voxelised version of the Allen Reference Atlas at a spatial resolution of 25 microns 3 . This three-dimensional grid contains numerical labels encoding neuroanatomy at the finest level compatible with the resolution (the voxelised atlas consists of V tot 3.1 × 10 7 cubic voxels, with 677 distinct regions). These numerical labels are associated to the names of brain regions accroding to the hierarchical annotation of the ARA 4 . For each of the R = 213 regions in the mesoscale model of the connectome (Eq. 1), we resolved the numerical label and the corresponding voxels. If the region has descendants in the hierarchical annotation, we lump together the voxels corresponding to these descendants, by annotating them with the numerical label of the region. After this step, the voxelised atlas contains 361 distinct regions. The regions included in the mesoscale model of the connectome therefore do not quite span the entire brain. Let us denote by V r the number of voxels in region labelled r: The brain regions corresponding to the rows of the connectivity matrix contain about 79 percent of the total volume V tot of the brain. In the rest of this paper we will disregard the brain regions that are not included in the matrix of connectivity strengths, and the volume V will be referred to as the volume of the brain. Grouping brain regions at a coarse hierarchical level (according to the ARA [3,16]) yields the following major brain compartments which will sometimes be designated by acronyms for presentation purposes: Isocortex, olfactory areas (OLF), cortical subplate (CTXsp), striatum (STR), pallidum (PAL), thalamus (TH), hypothalamus (HY), midbrain (MB), medulla (MY), cerebellar cortex (CBX). These major brain compartments will be referred to as big regions.

Fraction of the brain connected by loops to a given region
Given a brain region labelled r and a filtration value f , let us denote by C r (f ) the family of generators of the first homology group at filtration value f that go through the region r: and by C r (f ) the set of vertices through which the elements of C r (f ) go: In the toy model of Fig. 1 with three brain regions, the sets C 1 (2) and We can calculate the volume of the regions connected to the region r by loops, as a fraction of the total number of voxels in the brain: where V is the total number of voxels defined in Eq. 4, and V s is the number of voxels in region labelled s. As the volumes of brain regions are not all equal, we can define an alternative measure by the fraction of the total number of regions connected to region labelled r by loops: For a region labelled r that is connected by loops to all brain regions, for some value of the filtration f , we will have φ r (f ) = ν r (f ) = 1.
It is natural to compare these ratios to the analogous ratios obtained by taking into account only direct connections between pairs of brain regions, as encoded by the connectivity matrix C. Let us denote these quantities by φ where 1 denotes the indicator function. By construction the four fractions defined in Eqs 7,8,9,10 are growing functions of the filtration value f . There can be regions in the family C r (f ) that are not directly connected to region r based on the inter-region connectivity matrix. There can also be regions with projections to or from region r that are not involved in any closed cycle going through region r at the given filtration value f . There is therefore no a priori solidarity between the quantities φ and ν, based on loops, and their analogues φ (c) and ν (c) , based on direct connections.

6/19
2.4 Brain-wide maps of loops going through a given brain region To sum up the connections by loops between two given brain regions, we can calculate the sum of connection strengths at which cycles connecting these two regions appear, weighted by the numbers of such cycles. As the generators of the first homology groups are independent, this is consistent with the additivity assumption of connection strengths between brain regions. For two regions labelled r and s we introduce where the sum is over the distinct filtration values (the distinct entries of the matrix d defined in Eq. 2), and the exponential function comes from Eq. 2 relating connection strengths to the distance used to defined filtration values. Fixing the region label r and allowing the region label s to take all the possible values in {1, . . . , R}, we can define a brain-wide map of connection strengths to region labelled r, by defining for voxel labelled v: where s(v) is the index (in {1, . . . , R}) of the brain region to which voxel labelled v belongs. The quantity W r maps a voxel to a real number, and can be visualised as a heat map for instance.
To estimate how localised a brain-wide map of connection strengths is, we can compute the Kullback-Leibler divergence from the above-defined profile and the uniform brain-wide density. If we denote by the label of the region to which voxel labelled v belongs, this divergence is expressed as where V is the total number of voxels defined in Eq. 4, and we used the fact that W r /V is a normalised density function. This divergence induces a ranking of brain regions.

Persistent cycles
The bar code corresponding to the persistent generators of the first two homology groups is shown in Fig. 2. The maximum Betti numbers are too large for the individual bars to be visible, as they are on

Loops highlight the cortico-striato-thalamic network
To obtain a coarse picture of the anatomy of cycles, let us work out which family of big regions (as defined in Section 2.3.1) is intersected by each loop. We can rank the resulting families of big regions by decreasing value of prevalence. For an example of the brain regions in a loop appearing at filtration value f = 2, together with the names of the corresponding big regions, see Table 1. This loop appears at a low filtration value and we will see that the family of big regions it intersects (isocortex, striatum, thalamus and midbrain) is frequent among loops in the wiring diagram of the mouse brain. For definiteness we did this calculation at two special filtration values: -the filtration value f conn = 7 at which the Betti number b 0 falls to 1.
-the maximum filtration value (f max = 37), above which no extra loops appear (before the value − log = 42 introduced in Eq. 2 to provide a cut-off to the plots).
We found that 334 distinct families of big regions occur, but the most frequent 27 families account for 50% of the loops. The most prevalent families of big regions are presented in Table 2 (where they are ordered by decreasing prevalence at filtration value f = 7). We notice that midbrain is represented in most of the rows of this table 5 , and that the three most frequent combinations (accounting for more than 10% of all the loops) go through six or more big regions. The family of four big regions corresponding to the regions in the loop presented in Table 1 is the fourth most Six regions appear, belonging to four distinct big brain regions.
prevalent family of big regions in loops appearing at filtration values lower than f = 7, with 249 distinct loops (resp. fifth, f = 37 and 328 loops). If we focus on small families of big regions (with three regions or fewer), we observe that the most prevalent is the cortico-striato-thalamic family (7th row of the table, with 183 loops at f = 7). Highlighting occurrences of isocortex, striatum and thalamus in colour in Table 1 (in red, green and blue respectively), we notice that the first 17 rows present at least one of these colours (in 14 cases associated with midbrain, highlighted in brown). The next small family consists of isocortex and striatum (16th rank, with 106 loops), followed at rank 18 by medulla. With 104 loops, medulla contains more cycles than the cortico-thalamic family (ranked 25 with 80 loops). Moreover, medulla is the only single big region to support more than 10 cycles (see Table 3, which shows that only 147 cycles are confined to a single big region, out of which 134 appear before the filtration value 7). To assess whether the dominance of medullar loops among loops confined to a single big region could have been guessed based on the matrix of connectivity strengths, let us calculate how much of the connection strengths from its sub-regions project within the same region. For each big region labelled b, let us define: where the symbol 1(B(r) = b) equals 1 if the region labelled r is part of the big region labelled b (for instance we can read from the third row of Table 1 that 1(B(Caudoputamen) = Striatum) = 1). The ratio C intra /C tot is the conditional probability of a connection from within the big region b to project within region b. The sorted values of this ratio are plotted on Fig. 3. The average value is 31%, and the values range from 6% (thalamus) to 68% (cerebellar cortex, which is found to contain only one loop, see Table 3B). The fact that 80 percent of the cycles confined to a single big region are in medulla could not have been guessed based on the connectivity matrix only: 47% of connections from medullar regions project to medullar regions, which is above average, but the medulla is only ranked fourth among 12 big regions by our measure of conditional probability.

Loops can connect brain regions to the entire brain
On average, loops connect a given brain region to a larger domain in the brain than direct projections. This is true for both the volumetric and the counting measures defined in Eqs 9 and 10, and at all filtration values. This can be observed on Fig. 4, where the following averages across all Moreover, all four averages reach an asymptote, and the ratios of asymptotic values do not depend heavily on the choice of measure, as: However, the maximum values of the fractions φ r and ν r are highly heterogeneous among brain regions. The heat map of Fig. 5 shows the fraction of the brain φ r (f ), with the region labels r grouped by big regions (and ordered within each big region by decreasing value of the maximum reached at large filtration value). In particular, 18 regions reach the entire brain through connections by loops. Their names are shown on Table 4, together with the big regions to which they belong (midbrain is the most represented big region, with 4 regions). The ansiform lobule, which is part of the cerebellar cortex, is connected to the entire brain by loops, and is the brain region connected to the largest volume in the brain based on the connectivity matrix (it is ranked first by the measure φ (c) ). However, not all the regions appearing in Table 4 are that strongly connected based on the connectivity matrix. For instance the posterior amygdalar nucleus is ranked 173 out of 213 by the measure ψ (c) ).

Brain-wide density of connections by loops to a given region
The region for which W r has lowest Kullback-Leibler divergence from a uniform profile is the nucleus accumbens. The region with highest Kullback-Leibler divergence from a uniform profile is the flocculus, a region in the cerebellar cortex. Heat maps of the densities of loops for both regions are shown on Fig. 6. They are projected on sagittal, coronal and axial planes. For each of these regions the volume can be presented either as the sum of values of voxels (Figs 6A,C) or as maximal-intensity projections (Figs 6B,D). On maximal-intensity projections of the density W r , the voxels belonging to region labelled r have the largest value by construction (because all loops go through this region). Maximal-intensity projections can exhibit at most R different values (they are piecewise constant on each brain region). On the other hand, projections of sums of the density W r may have higher values in voxels that do not belong to region labelled r, depending on the direction of projection. Even though nucleus accumbens is connected to the entire brain by loops (meaning all the voxels in Figs. 6A,B correspond to non-zero values), the plots still appear rather shallow, and the strength of connection by loops is still far from uniform, even in the region that minimises the divergence from a uniform brain-wide profile.  Medulla, Midbrain, Pallidum The families of big regions are ordered by decreasing number of loops they contain (at filtration value f = 7).

15/19
Persistent homology reveals global properties of the mouse connectome at the mesoscopic scale. The high prevalence of cortico-striato-thalamic loops is a strong consistency check between the mesoscale experimental approach [1,2] and the present topological method, as the cortico-striato-thalamo-cortical circuit is known as a major circuit regulating complex behaviours (for a review of its involvement in the Tourette sydrome see [21]). The global nature of topological tools reveals connections between regions that are not linked by direct projections. Moreover, the lists of generators of the first homology groups allow to count the number of independent loops for a given threshold in connection strengths. The relative abundance of closed loops in the medulla is perhaps best interpreted in terms of the life-sustaining automatic functions regulated by the medulla. The corresponding circuits are topologically insulated from circuits involved in conscious behaviours or learning [22].
Axons, in addition to conducting action potentials, transport biological molecules [23,24]. Integration with other brain-wide data sets, in particular gene-expression data [25,26], has already shown that connected regions in the adult rodent brain have increased similarity in gene-expression profiles, and sets of genes most correlated to connectivity have been identified (see [27] for results based on connectivity data in the rat brain). It would be interesting to investigate to which extent these correlations persist across the loops that we identified in the mouse brain. Moreover, spatial gene expression patterns have been related to cortico-striatal functional networks [28]. Loops that are as persistent as the cortico-striatal ones yield natural candidates for places where to look for such genetic signatures of functional networks. Moreover, the brain-wide coverage of the gene-expression atlas [25,26] allows to estimate spatial densities of cell types known by their transcriptional activity [29,30,33], which yields neuroanatomical predictions relevant to brain disorders [31,32]. The present work could lead to insights into the correlation structure between the local density of loops and the spatial density of cell types. Another direction of research is the conservation of patterns across species [34][35][36], even though gene-expression and connectivity data do not have the same brain coverage as in the mouse atlas.
We have worked out the persistent loops of the wiring diagram of the brain based on ipsilateral projections only, leaving out the issue of contralateral proections. Contralateral projections have been mapped (see [1], right-hand side of Fig. 4A), but they do not allow to work out closed loops because the missing projections would involve injecting tracers into the left hemisphere, and mapping the contralateral projections again. However, the known contralateral connection strengths have been found to be significantly weaker than ipsilateral ones. One can therefore conjecture than loops going through brain regions in different hemispheres do not dramatically change the global features of the wiring diagram of the brain.
There are surprisingly few cycles confined to only one of of the major brain regions (or big regions), but most of them develop before the brain becomes fully connected. Moreover, medulla is by far the region with the most confined loops.

Acknowledgments
This work is supported by the Research Center for Precision Medicine, HT-URC, Xi'an Jiaotong-Liverpool University, Suzhou, China.