Published by De Gruyter, March 9, 2021

Integrating additional knowledge into the estimation of graphical models

Yunqi Bu and Johannes Lederer

Abstract

Graphical models such as brain connectomes derived from functional magnetic resonance imaging (fMRI) data are considered a prime gateway to understanding network-type processes. We show, however, that standard methods for graphical modeling can fail to provide accurate graph recovery even with optimal tuning and large sample sizes. We attempt to solve this problem by leveraging information that is often readily available in practice but neglected, such as the spatial positions of the measurements. This information is incorporated into the tuning parameter of neighborhood selection, for example, in the form of pairwise distances. Our approach is computationally convenient and efficient, carries a clear Bayesian interpretation, and improves standard methods in terms of statistical stability. Applied to data about Alzheimer’s disease, our approach allows us to highlight the central role of lobes in the connectivity structure of the brain and to identify an increased connectivity within the cerebellum for Alzheimer’s patients compared to other subjects.

1 Introduction

A standard framework for dependence networks is Gaussian graphical models [1]. Gaussian graphical models have become particularly popular after the development of methods and algorithms that can handle large and high-dimensional data. Two widely-used approaches to Gaussian graphical models are neighborhood selection and graphical lasso. Neighborhood selection aims at graph reconstruction by aggregating local estimates [2], while graphical lasso is based on a global likelihood function [3], [4], [5]. Both approaches are now accompanied by a large number of research papers on theory and computation; we refer to [6], [7] and references therein. However, we show that even with optimal tuning and large sample sizes, standard methods for high-dimensional Gaussian graphical modeling can fail to provide accurate graph recovery.

Accordingly, the purpose of this paper is to estimate high-dimensional Gaussian graphical models more accurately. As a specific example, we are interested in estimating brain connectivity networks by analyzing resting-state fMRI data that describe the levels of co-activation between brain regions [8] as measured by changes in blood flow [9]. Brain regions have spatial coordinates, so in addition to the fMRIs, there is information in terms of pairwise distances between brain regions. Our key idea is to leverage this additional knowledge for more effective graph estimation. Our main strategy for this is to strengthen the role of tuning parameters. Commonly, tuning parameters are considered an inconvenience, because they need to be calibrated for each data set specifically. We instead think of this adaptability as an asset that can make tuning parameters a potent instrument for funneling external information into the estimation process. More specifically, for our goal of brain network estimation, we use tuning parameters to make neighborhood selection receptive to additional knowledge. We first show numerically that without such an integration, the sample size (or the number of fMRI scans per person, that is, the length of an individual’s scanning session) needed for accurate graph recovery of the brain network is exceedingly large. We then show that using our notion can lead to four general improvements:

Accuracy: we show by simulations that our approach can render graph estimation more accurate.

Stability: we use notions of stability and show the reliability of our approach with the fMRI data.

Accessibility: we point out that our approach is amenable to straightforward and efficient implementations based on existing software. In particular, our approach preserves the general forms of the standard penalties and does not introduce any additional penalty terms.

Interpretability: we demonstrate that our approach has a clear, biological Bayesian interpretation.

On brain fMRI data collected from patients with and without Alzheimer’s disease, the estimated brain connectivity networks are stable and in agreement with biological reasoning. Compared to competing methods, our approach also brings out the central role of the lobes in the connectivity structure more clearly. We find that connections are significantly more likely within the lobes than between the lobes, even though we provide our method only with the pairwise distances and not with the lobe affiliations of the regions.

The remainder of the paper is structured as follows. In Section 2, we show how standard methods can fail to provide accurate graph recovery of brain connectivity networks and explain the biological motivations of incorporating additional spatial knowledge into the estimation process. Then in Section 3, we introduce our general concept for this incorporation. Next, in Section 4, we confirm that our approach can use additional knowledge to improve graph estimation for brain connectivity networks with simulations and a sensitivity analysis under three different scenarios of model misspecifications. And in Section 5, we study the brain networks estimated on real fMRI data and their stability for our method as well as for competing methods. In Section 6, we conclude with a discussion.

R code, proofs, and information about the preprocessing can be found in the Supplementary Material. Moreover, the data, further details on the preprocessing, and implementations of our methods can be found on github.com/LedererLab/fMRI.

1.1 A brief review of Gaussian graphical models

We give a brief review of Gaussian graphical models, our main tool for estimating brain connectivity graphs. Gaussian graphical models assume samples from a centered p-dimensional normal distribution $\mathcal{N}_p(0,\Sigma)$ with a symmetric, positive definite covariance matrix $\Sigma$. The distribution is then complemented with an associated undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ that has node set $\mathcal{V} = \{1,\dots,p\}$ and edge set $\mathcal{E} = \{(i,j) \in \mathcal{V}\times\mathcal{V} : i \neq j \text{ and } (\Sigma^{-1})_{ij} \neq 0\}$. The corresponding adjacency matrix $A \in \mathbb{R}^{p\times p}$ is defined by $A_{ij} = 1$ if $(i,j)\in\mathcal{E}$ and $A_{ij} = 0$ otherwise.

The crux of Gaussian graphical models is that the conditional and marginal dependence structures of the samples are concisely captured by the edge set $\mathcal{E}$. Indeed, the Hammersley–Clifford theorem [10], [11] states that the ith and jth coordinates of a sample from $\mathcal{N}_p(0,\Sigma)$ are conditionally independent given all other coordinates if and only if $(\Sigma^{-1})_{ij} = 0$, that is, $(i,j)\notin\mathcal{E}$. Moreover, the ith and jth coordinates are independent if and only if one cannot construct a chain of the form $(i,k_1), (k_1,k_2), \dots, (k_l,j)$ by using only elements in $\mathcal{E}$. Our goal is consequently to uncover the edge set $\mathcal{E}$ from data. For this aim, we develop estimators $\hat{\mathcal{E}} \equiv \hat{\mathcal{E}}(X)$ of $\mathcal{E}$ based on independent, identically distributed Gaussian samples $X_1,\dots,X_n \in \mathbb{R}^p$ summarized in the data matrix $X = (X_1,\dots,X_n)^\top \in \mathbb{R}^{n\times p}$.
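As a toy illustration of this correspondence, consider the following R snippet (R being the language of our implementations); the 3 × 3 precision matrix is made up for this example:

```r
# The edges of a Gaussian graphical model are the non-zero off-diagonal
# entries of the precision matrix (values below are made up for illustration).
Omega <- matrix(c(2.0, 0.5, 0.0,
                  0.5, 2.0, 0.4,
                  0.0, 0.4, 2.0), nrow = 3)  # symmetric, positive definite
E <- which(Omega != 0 & row(Omega) != col(Omega), arr.ind = TRUE)  # edges (1,2) and (2,3)

# Nodes 1 and 3 are conditionally independent given node 2, since (Omega)_13 = 0,
# but marginally dependent through the chain (1,2), (2,3):
Sigma <- solve(Omega)
Sigma[1, 3]  # non-zero
```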

2 Motivation

fMRI is a promising gateway to understanding the human brain [8]. Our specific goal is to use brain activity records from resting-state fMRI to infer co-activation networks among brain regions. For this, we rely on data collected from outpatients at the Department of Neurology at Beijing Hospital. The imaging and preprocessing details can be found in the Supplementary Material.

The preprocessed data set comprises 37 subjects in total: 22 patients with Alzheimer’s disease (AD), 5 patients with mild cognitive impairment (MCI), and 10 patients with normal cognition (NC). Each subject’s data contains n = 210 consecutive scans. Each of the p = 116 variables is an average intensity over all voxels in an anatomical volume of interest defined by Automated Anatomical Labeling [12]. Autocorrelation was accounted for with an autoregressive integrated moving average model [13], [14].

The brain regions have spatial coordinates, so aside from the intensity fMRI data, we have additional knowledge in terms of pairwise distances between the 116 regions. How and why should we incorporate this additional knowledge into estimating the brain connectivity network? Our biological motivation is that direct connections are more likely between close brain regions than between distant brain regions. Distant brain regions are more likely to be connected indirectly, that is, through other regions. Our concept in Section 3 based on graphical models is ideal for incorporating this understanding.

2.1 The need for additional knowledge

Here, we show that even with optimal tuning and large sample sizes, standard methods for Gaussian graphical modeling can fail to provide accurate graph recovery. For this, we conduct a simulation study comparing four of the most popular methods: thresholding the partial correlation matrix (THR); neighborhood selection with the “or-rule” (MB(or)); neighborhood selection with the “and-rule” (MB(and)); and graphical lasso (GLASSO). THR calculates each partial correlation individually and then applies an overall threshold. MB cycles through all nodes, estimating the connections of that node, and then combines the estimates. GLASSO estimates all connections simultaneously through maximum regularized likelihood.

We simulate data with the aim of mimicking the brain fMRI application. In this application, p = 116 is the number of brain regions and n = 210 is the number of scans for a single patient. A standard preferential attachment algorithm [15] is used to construct 115 edges between these nodes. The resulting edge set $\mathcal{E}$ determines which off-diagonal entries of the inverse covariance matrix Σ−1 are non-zero. The values of these entries are then independently sampled uniformly at random from [−1, −0.2] ∪ [0.2, 1]. The diagonal entries of Σ−1 are set to a common value such that the condition number equals 100. With Σ−1 constructed this way, vectors are independently sampled from the Gaussian distribution $\mathcal{N}_p(0,\Sigma)$ and summarized in data sets with sample sizes n ∈ {50, 100, 200, 400, 600, 800, 1000}, which, for illustration, deviates from the application’s n = 210.
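The following R sketch reproduces this data-generating scheme; the specific functions (igraph::sample_pa for the preferential-attachment graph, MASS::mvrnorm for the Gaussian sampling) are our choices for illustration and not necessarily the ones behind Table 1:

```r
library(igraph)
library(MASS)

set.seed(1)
p <- 116
n <- 210  # or any of the sample sizes in {50, 100, ..., 1000}

# Preferential-attachment graph with p - 1 = 115 edges.
g <- sample_pa(p, directed = FALSE)
A <- as.matrix(as_adjacency_matrix(g))

# Off-diagonal non-zero entries of the precision matrix: uniform on
# [-1, -0.2] ∪ [0.2, 1], filled into the upper triangle and mirrored.
Omega <- matrix(0, p, p)
idx <- which(A == 1 & upper.tri(A))
Omega[idx] <- runif(length(idx), 0.2, 1) * sample(c(-1, 1), length(idx), replace = TRUE)
Omega <- Omega + t(Omega)

# Common diagonal value such that the condition number equals 100:
# the shift d solves (max(ev) + d) / (min(ev) + d) = 100.
ev <- eigen(Omega, symmetric = TRUE, only.values = TRUE)$values
diag(Omega) <- (max(ev) - 100 * min(ev)) / 99

# Draw n i.i.d. samples from N_p(0, Sigma) with Sigma = Omega^{-1}.
X <- mvrnorm(n, mu = rep(0, p), Sigma = solve(Omega))
```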

MB(or) and MB(and) are described in [2]; the node-wise regressions are performed by glmnet [16] with standard settings for the lasso estimator. GLASSO is described in [3], [4], [5]; our implementation is based on the glasso package [17] with standard settings.

The accuracy of graph recovery is assessed via the Hamming distance. The Hamming distance between the estimated edge set $\hat{\mathcal{E}}$ and the true edge set $\mathcal{E}$ is defined by

$$d_H(\hat{\mathcal{E}},\mathcal{E}) := \big|\{(i,j) : (i,j)\in\hat{\mathcal{E}},\ (i,j)\notin\mathcal{E}\}\big| + \big|\{(i,j) : (i,j)\notin\hat{\mathcal{E}},\ (i,j)\in\mathcal{E}\}\big|.$$

In terms of the corresponding adjacency matrices $\hat{A}$ and $A$, this equals $d_H(\hat{\mathcal{E}},\mathcal{E}) = |||\hat{A} - A|||_1$, where $|||\cdot|||_1$ is the entrywise $\ell_1$-norm; larger Hamming distances indicate less accurate estimation. The tuning parameters of all methods are calibrated to minimize the Hamming distance, noting that the true graphs are known in simulations. This “oracle” tuning allows us to study the maximal potential of the standard methods’ accuracies.
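In R, for instance, the Hamming distance of two graphs reduces to a one-liner on their 0/1 adjacency matrices:

```r
# Hamming distance as the entrywise l1-norm of the difference of the
# adjacency matrices.
hamming <- function(Ahat, A) sum(abs(Ahat - A))
```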

Table 1 contains the Hamming distances averaged over 50 repetitions. For n = 1000, MB(and) is the most accurate of the four methods, with an average Hamming distance of 27. Observe, however, that 27 wrongly assigned edges still mark a poor performance given that there are only 115 true edges in total. For smaller sample sizes, the accuracies of the methods decline even further. In view of real data being commonly high-dimensional, with p of the same order as n – or even larger – these observations highlight that high-dimensional graphical modeling is challenging and that standard approaches do not necessarily provide meaningful results. In particular, the results motivate us to think about what additional knowledge is available in a given application and how to use it in the estimation process. This is the objective of the remainder of this paper.

Table 1:

Graph estimation with optimally-calibrated standard methods becomes more accurate as the sample size n increases. However, even with n = 1000, which is much larger than the number of nodes p = 116, the minimum Hamming distance is still 27. Given that there are only 115 true edges in total, 27 wrongly assigned edges mark a poor performance. Thus, graph estimation with standard methods can still be highly inaccurate even when the sample sizes are very large.

Hamming distance by sample size n

Method     n = 50   n = 100   n = 200   n = 400   n = 600   n = 800   n = 1000
THR             –         –       112        64        46        36         29
MB(or)        108       107        99        98        96        94         90
MB(and)       104        81        76        48        40        31         27
GLASSO        103        84        77        58        44        34         32

(THR requires inverting the sample covariance matrix and is therefore unavailable for n ∈ {50, 100}, where n < p.)

3 Method

Now, we are ready to introduce and discuss how to amend the estimation process with additional knowledge. We first explain our statistical scheme to incorporate additional knowledge into graph recovery. The resulting method can be computed efficiently with standard software packages. Then, we establish a theory for exact graph recovery for our method. Last, we provide a direct Bayesian interpretation of the method to bridge our biological motivation with the additional spatial knowledge of the brain.

3.1 Neighborhood selection with additional knowledge

A starting point particularly suited for our approach is neighborhood selection. The main idea of neighborhood selection is that graph recovery can be established through a sequence of regressions. The corresponding regression parameter $\beta^j \in \mathbb{R}^{p-1}$ for a given node j determines the edges between node j and the other nodes: $(\beta_+^j)_i \neq 0 \Leftrightarrow (i,j)\in\mathcal{E}$, where $\beta_+^j := (\beta^j_1,\dots,\beta^j_{j-1},0,\beta^j_j,\dots,\beta^j_{p-1})^\top \in \mathbb{R}^p$ is $\beta^j$ with a zero added as the jth entry. With this in mind, the standard regression estimators are of the form

(1) $\hat{\beta}^{j,r_j} \in \underset{\beta\in\mathbb{R}^{p-1}}{\arg\min}\left\{\frac{1}{2}\,\|X_j - X_{-j}\beta\|_2^2 + r_j\|\beta\|\right\}, \quad (j\in\{1,\dots,p\}).$

Here, the vector $X_j \in \mathbb{R}^n$ denotes the jth column of the data matrix X, the matrix $X_{-j} \in \mathbb{R}^{n\times(p-1)}$ denotes X without the jth column, the positive numbers $r_1,\dots,r_p \in (0,\infty)$ are tuning parameters, and $\|\cdot\|$ is a norm (or more generally, a convex, positive function, such as the ridge penalty). A standard example is neighborhood selection with the lasso, where $\|\cdot\|$ is set to the $\ell_1$-norm. Adopting the “and-rule”, the edge set $\mathcal{E}$ is finally estimated by $\hat{\mathcal{E}} = \{(i,j) : i\neq j,\ (\hat{\beta}_+^{i,r_i})_j \neq 0 \text{ and } (\hat{\beta}_+^{j,r_j})_i \neq 0\}$.

The choice of MB(and) as the starting point for our approach is motivated by the simulation results in Section 2.1 and the fact that there is a myriad of existing software packages for solving problems of the form (1). In principle, however, one could apply the following recipe also to MB(or) and GLASSO.

Now, our proposal is to incorporate additional knowledge by “upgrading” the univariate tuning parameters r1, …, r p to vectors. For this, we assume additional knowledge in the form of a matrix $D \in \mathbb{R}^{p\times p}$. The general idea is that the element $D_{ij}$ contains information about the connection between nodes i and j. Typical sources for such information are (1) biological insights (like the geometry of the brain), (2) results of an initial analysis on the given data (using a standard method such as the graphical lasso, for example), or (3) results of an analysis of different but related data (such as the outcomes of a previous study). In this paper, we focus sharply on biological insights in the context of brain imaging: $D_{ij}$ is the Euclidean distance between nodes i and j, and the rationale is that nodes that are close are more likely to be directly connected than nodes that are far apart from each other.

Despite our focus on neuroscience, we keep much of our framework general to prepare for other fields of application. In genomics, to give just one example, D could incorporate the geometric distance between genes – or a proxy of it, such as the number of base pairs between genes. Or instead, D could describe the genes’ co-expression: $D_{ij} = \hat{p}_{ij}$ if genes i and j are known to be co-expressed with probability $\hat{p}_{ij}$ and $D_{ij} = 0$ otherwise. The potential of our approach in such applications, however, needs to be evaluated in future research.

To incorporate D into neighborhood selection, we transform the one-dimensional tuning parameters $r_j \in (0,\infty)$ into multi-dimensional tuning parameters $r_j \in (0,\infty)^p$. We then enrich these tuning parameters with additional knowledge by setting $(r_j)_i = \bar{r}_j\, f(D_{ij})$, where $f : \mathbb{R} \to (0,\infty)$ is a positive link function. In our brain connectivity application, the link function is defined as $f(x) = x^3$ to capture the three-dimensionality of the coordinates. This yields tuning parameters of the form $(r_j)_i = \bar{r}_j\, D_{ij}^3$. Here, $\bar{r}_j$ is an overall tuning parameter for the jth node’s regression. As customary in high-dimensional statistics, the free parameter $\bar{r}_j$ balances the weights of the data ($X_j$ in our case) and the structural assumptions (captured by $\|\cdot\|$ and D in our case). In practice, $\bar{r}_j$ can be calibrated via 10-fold cross-validation. The above node-wise regression (1) is then generalized to

(2) $\hat{\beta}^{j,r_j} \in \underset{\beta\in\mathbb{R}^{p-1}}{\arg\min}\left\{\frac{1}{2}\,\|X_j - X_{-j}\beta\|_2^2 + \|r_{j\setminus j} \circ \beta\|\right\}, \quad (j\in\{1,\dots,p\}),$

where the circle $\circ$ indicates element-wise multiplication, $r_{j\setminus j} \in \mathbb{R}^{p-1}$ is $r_j \in \mathbb{R}^p$ with the jth element omitted, and the estimator $\hat{\mathcal{E}}$ is generalized accordingly.

Let us discuss three practical and methodological aspects of (2). The first important observation is that while there are now p “tuning” parameters per regression, there is still only one free parameter (namely $\bar{r}_j$) that requires calibration. This is highly desirable: for one parameter, efficient calibration schemes are known [18], [19], [20]; in contrast, the calibration of multiple free parameters, which would appear when adding additional penalty terms instead of adopting our approach, remains a major challenge in both theory and computation. Second, for the link function f, there are often natural choices, such as our choice to reflect the three-dimensionality of distances. Third, note that our approach retains the “flavor” of the original penalty. This means that our concept maintains the general properties of the penalty, such as the sparsity-generating effect of $\ell_1$-penalties [21], and it means that the practical implementation of (2) can be based on standard software packages, such as glmnet [16] in the case of $\ell_1$-penalization.

In the following, we set the penalty in (2) to the $\ell_1$-norm and refer to this specification of our method as SI (structural information). An R implementation of SI is provided in the Supplementary Material.
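For illustration, here is a minimal R sketch of SI based on glmnet’s penalty.factor argument, which implements coefficient-specific penalty weights; the function names and the lambda.min rule are our choices for this sketch, not necessarily those of the implementation in the Supplementary Material:

```r
library(glmnet)

# Node-wise SI regression: lasso of X_j on X_{-j} with coefficient-specific
# weights proportional to D_ij^3; the overall level rbar_j (glmnet's lambda)
# is calibrated by cross-validation (10-fold by default).
si_neighborhood <- function(X, D, j) {
  w <- D[-j, j]^3
  fit <- cv.glmnet(X[, -j], X[, j], penalty.factor = w)
  beta <- as.numeric(coef(fit, s = "lambda.min"))[-1]  # drop the intercept
  (1:ncol(X))[-j][beta != 0]                           # selected neighbors of node j
}

# Aggregation with the "and-rule": edge (i, j) is kept only if i selects j
# and j selects i.
si_graph <- function(X, D) {
  p <- ncol(X)
  Ahat <- matrix(0, p, p)
  for (j in 1:p) Ahat[si_neighborhood(X, D, j), j] <- 1
  Ahat * t(Ahat)
}
```

Note that glmnet rescales the penalty factors internally to sum to the number of variables; this changes only their overall level, which is absorbed into the cross-validated $\bar{r}_j$ anyway.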

3.2 Theory

Here, we show that our approach is amenable to the standard proof techniques in high-dimensional statistics. We apply the primal-dual witness technique to SI to show that we can exactly recover the true graph under classical assumptions. Note first that a sufficient condition for exact graph recovery is that the estimators (2) provide exact variable selection at each node j, cf. [22, Section 7.3]. We denote the corresponding target regression vector by $\beta^{*j}$ and the noise by $\varepsilon^j$. We can then formulate versions of the standard irrepresentability and beta-min conditions and bounds for the tuning parameters [6], [7], [23], [24]. To this end, denote by $X_{*j}$ the matrix X restricted to the columns with index in the set $S_{*j} := \{k\in\{1,\dots,p\} : k\neq j \text{ and } (k,j)\in\mathcal{E}\}$ and by $X_{\setminus *j}$ the matrix X restricted to the columns with index in $\{k\in\{1,\dots,p\} : k\neq j \text{ and } (k,j)\notin\mathcal{E}\}$. This means that $X_{*j}$ corresponds to $X_{-j}$ on the support of $\beta^{*j}$ and $X_{\setminus *j}$ to the remaining parts of $X_{-j}$. In the same spirit, we allow ourselves to use the analogous notations $r_{*j}$, $r_{\setminus *j}$, and $(\beta^{*j})_*$. We then state the following.

Assumption 3.1

(Irrepresentability condition). For each j ∈ {1, …, p}, we assume that $(X_{*j})^\top X_{*j}$ is invertible and that there is a constant $b_j > 0$ such that

$$\Big\| r_{*j} \circ \big((X_{*j})^\top X_{*j}\big)^{-1}(X_{*j})^\top X_i \Big\|_1 \Big/ (r_j)_i \;\leq\; 1 - b_j$$

for all $i \in \{k\in\{1,\dots,p\} : k\neq j \text{ and } (k,j)\notin\mathcal{E}\}$.

One can check that the more informative D is, the weaker the condition on the design becomes. We then state the next assumption.

Assumption 3.2

(Beta-min condition). For each j ∈ {1, …, p} and corresponding tuning parameter $r_j$, we assume that the regression vectors satisfy

$$\min_{i\in\{1,\dots,|S_{*j}|\}} \frac{1}{(r_{*j})_i}\,\Big|\big((\beta^{*j})_*\big)_i\Big| \;>\; \frac{1}{n}\left(1 + \max_{\nu\in\{\pm1\}^{|S_{*j}|}}\sum_{k\in\{1,\dots,|S_{*j}|\}}\Big(\big((X_{*j})^\top X_{*j}/n\big)^{-1}\nu\Big)_k\, 2\,(r_{*j})_k\right).$$

These irrepresentability and beta-min conditions are precisely the standard ones for lasso regression except for the generalization to vector-valued tuning parameters $r_j$.

Finally, we assume that the tuning parameters are suitably chosen.

Assumption 3.3

(Tuning-parameter choice). We assume that the tuning parameters satisfy

$$1 \;>\; \max_{i\in\{1,\dots,p-1-|S_{*j}|\}} \frac{2\,\Big|\Big((X_{\setminus *j})^\top\big(I_n - X_{*j}\big((X_{*j})^\top X_{*j}\big)^{-1}(X_{*j})^\top\big)\,\varepsilon^j\Big)_i\Big|}{(r_{\setminus *j})_i\, b_j}$$

and

$$1 \;>\; \max_{i\in\{1,\dots,|S_{*j}|\}} \frac{n\,\Big|\Big(\big((X_{*j})^\top X_{*j}\big)^{-1}(X_{*j})^\top \varepsilon^j\Big)_i\Big|}{(r_{*j})_i}.$$

This again is an extension of standard bounds for tuning parameters to our setting.

We can now prove the following result.

Theorem 3.1

(Exact graph recovery with SI). Under Assumptions 3.1–3.3, SI provides exact graph recovery, that is,

$$\hat{\mathcal{E}} = \mathcal{E}.$$

This result implies that SI can provide exact graph recovery if conditions analogous to the ones formulated in the literature are met. The proof follows the primal-dual witness rationale [25], see also [7, Chapter 11.4] and [26], and is deferred to the Supplementary Material.

Assumptions 3.1–3.3 follow well-established standards in high-dimensional statistics. While this fact shows that our theory is well-grounded, it also shows that our theory has the same limitation as almost all theories in high-dimensional statistics: the assumptions usually cannot be verified in practice. Corresponding ideas in the context of Assumption 3.1 can be found in [27], [28]; ideas in the context of Assumptions 3.2 and 3.3 can be found in [19], [29], [30]. At this point, however, the stated assumptions remain unverifiable in practice.

3.3 Bayesian interpretation

Finally, our approach has a direct Bayesian interpretation. For a given index j, we consider the hierarchical Bayesian model

(3) $X_j \mid \beta, \sigma \;\sim\; \mathcal{N}\big(X_{-j}\beta,\ \sigma^2 I_{n\times n}\big),$
(4) $\mathbb{P}\big(\beta_i \mid \sigma, (r_{j\setminus j})_i\big) = \frac{(r_{j\setminus j})_i}{2\sigma^2}\, \exp\!\Big(-\frac{(r_{j\setminus j})_i}{\sigma^2}\,|\beta_i|\Big), \quad (i\in\{1,\dots,p-1\}).$

This model is a generalization of the model considered in the seminal paper that introduced the Bayesian lasso [31]. The parameters $(r_j)_i$ in our general model are hyperparameters that specify the shape of the prior distribution of each of the regression coefficients $(\beta^j)_i$. The negative log-posterior distribution of β is now given by

$$-\log \mathbb{P}\big(\beta \mid X, \sigma, r_{j\setminus j}\big) \;=\; \frac{1}{\sigma^2}\left(\frac{1}{2}\,\|X_j - X_{-j}\beta\|_2^2 + \sum_{i=1}^{p-1}(r_{j\setminus j})_i\,|\beta_i|\right) + c,$$

where c is a term independent of β. The mode of this distribution is

$$\hat{\beta}^{j,r_j} \in \underset{\beta\in\mathbb{R}^{p-1}}{\arg\min}\left\{\frac{1}{2}\,\|X_j - X_{-j}\beta\|_2^2 + \sum_{i=1}^{p-1}(r_{j\setminus j})_i\,|\beta_i|\right\}, \quad (j\in\{1,\dots,p\}),$$

which equals the estimator yielded by our approach SI, that is, (2) with the penalty norm set to the $\ell_1$-norm. Similarly, replacing the double-exponential distribution in (4) with a Gaussian distribution, the posterior mode equals the estimator yielded by our approach when the penalty in (2) is set to the $\ell_2$-norm.

This analysis shows that the tuning parameters relate our frequentist and Bayesian notions about the additional knowledge. In our frequentist estimator, the larger $(r_j)_i$, the more likely the edge (i, j) is excluded from the estimate. In our Bayesian view, the larger $(r_{j\setminus j})_i$, the more the assumed distribution of $(\beta^j)_i$ is concentrated around zero. Thus, funneling the additional knowledge into the tuning parameters of the frequentist estimator can be viewed as transforming the original priors, which are the same across the entire coefficient vector, into informed priors tailored to each coefficient based on our biological rationale. In general, the Bayesian view provides further support for our concept – without incurring the computational challenges that typically come with Bayesian methods.

4 Simulations

We now confirm that our method SI can improve graph estimation by conducting a simulation study and a sensitivity analysis.

First, we demonstrate in a simulation study that our approach can use additional knowledge to improve graph estimation. To generate data, we imitate the brain connectivity application in Section 5: the number of nodes p = 116 corresponds to the number of brain regions; the number of samples n = 210 corresponds to the number of fMRI scans per subject; the additional knowledge $D_{ij}$ is the Euclidean distance between brain regions i and j. The edge set is constructed based on independent Bernoulli($p_{ij}$) distributions with $p_{ij} = \text{inv.logit}(10 - D_{ij}/3)$, such that an edge (i, j) is included if the corresponding Bernoulli outcome is one. The form of the distributions captures our rationale that direct connections are predominantly between close regions. An anatomical map of a graph generated by this scheme is displayed in the left panel of Figure 1. Next, the off-diagonal non-zero entries in the inverse covariance matrix Σ−1 as specified by the edge set are set to 0.3. The diagonal entries are set to $0.2 + 0.3\,\sigma_{\min}$, where $\sigma_{\min}$ is the minimal singular value of the adjacency matrix. This construction ensures that Σ−1 has full rank. Finally, n i.i.d. samples are generated from $\mathcal{N}_p(0,\Sigma)$.
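A minimal R sketch of this graph generation, assuming the matrix D of pairwise Euclidean distances is given (plogis is base R’s inverse logit):

```r
p <- 116
prob <- plogis(10 - D / 3)  # edge probabilities p_ij = inv.logit(10 - D_ij / 3)

# One Bernoulli draw per pair (i, j), i < j, then symmetrize.
A <- matrix(0, p, p)
upper <- upper.tri(A)
A[upper] <- rbinom(sum(upper), 1, prob[upper])
A <- A + t(A)

# Precision matrix as described in the text.
Omega <- 0.3 * A
diag(Omega) <- 0.2 + 0.3 * min(svd(A)$d)  # 0.2 + 0.3 * minimal singular value
Sigma <- solve(Omega)  # n i.i.d. samples are then drawn from N_p(0, Sigma)
```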

Figure 1: Left panel: one example of the 40 simulated graphs. Right panel: ROC curves demonstrating that SI can outperform standard methods when additional knowledge is available.

Figure 2: ROC curves under three data generation schemes demonstrating that: (1) SI can largely outperform standard methods when the additional knowledge is correct; (2) SI can outperform standard methods when the additional knowledge is partly correct; (3) SI underperforms standard methods only slightly when the additional knowledge is completely wrong.

We run our approach SI against standard methods for graph estimation. The competing standard methods are THR, MB, and GLASSO. Motivated by the simulation results in Section 2.1, we adopt the “and-rule” for SI and MB. A total of 40 data sets are generated, on which each method is run as a function of its free tuning parameter. The average ROC curves for graph recovery are plotted in the right panel of Figure 1. We find that the ROC curve of SI dominates the other curves, demonstrating that SI can improve graph recovery when additional knowledge is available.

Next, we demonstrate the performance of our approach with a sensitivity analysis under three different data generation scenarios. Again we let n × p = 210 × 116. To generate data, we start from the hierarchical Bayesian model and sample $\beta^j$ from the double-exponential distribution (4). For the three data generation schemes, we generate data with (1) the true model, $(r_{j\setminus j})_i/\sigma^2 = D_{ij}^3$; (2) model misspecification, $(r_{j\setminus j})_i/\sigma^2 = D_{ij}$; (3) no additional knowledge used, $(r_{j\setminus j})_i/\sigma^2 = c$. We set the off-diagonal entries to $(\Sigma^{-1})_{ij} = \frac{1}{2}\big(\beta^j_i(\Sigma^{-1})_{ii} + \beta^i_j(\Sigma^{-1})_{jj}\big)$. The diagonal entries of Σ−1 are then set to a common value such that the condition number equals 100. Last, i.i.d. samples are generated from the Gaussian distribution $\mathcal{N}_p(0,\Sigma)$.

The average ROC curves for graph recovery over 40 repetitions are plotted in Figure 2. We find that the ROC curve of SI dominates the other curves when the additional knowledge is used correctly. Our advantage decreases with the degree of model misspecification, but our estimator’s performance remains close to that of state-of-the-art methods even when the “knowledge” is completely wrong.

While the simulations provide us with some evidence, they cannot cover all cases that one could potentially encounter in practice. We thus complement the above analysis with a stability study on real fMRI data in the following section. Together, the results provide us with sufficient confidence to argue in favor of SI.

5 Real data analysis

To reiterate, our biological rationale for SI with the additional pairwise distances between brain regions is that direct connections are more likely between close regions than between distant regions. SI incorporates this notion: first, graphical models, in general, distinguish between direct connections (conditional dependence) and indirect connections (marginal dependence). Second, the rationale that the distance between two nodes influences the likelihood of them being directly connected is reflected naturally by how the link function f incorporates the additional distance information D. Importantly, we do not generally exclude direct or indirect long-range connections. This means that two distant nodes can very well be connected, but this connection will more likely be indirect.

We now analyze the fMRI data on Alzheimer’s disease with our method and compare the results to those of competing methods. We find that our pipeline can increase the stability of graph estimation on fMRI data, and we identify structural differences between the connectomes of Alzheimer’s patients and healthy subjects.

Figure 3: Anatomical maps of the estimated brain connectivity networks show that in contrast to standard methods, SI entails direct connections mostly between spatially close regions (orange lines).

Figure 4: Contrast of average brain graphs between the AD and NC groups. Yellow indicates that an edge occurs more frequently in the AD group; blue indicates that an edge occurs more frequently in the NC group. The diagonal is colored in red. The graph indicates that there are considerable differences between the groups in some regions.

5.1 Specification of the analysis

We compare our approach SI to THR, MB, and GLASSO. Again, we adopt the “and-rule” for SI and MB. The tuning parameters in MB and SI are selected via 10-fold cross-validation. Since the tuning parameter in GLASSO is not amenable to the same cross-validation scheme, GLASSO is calibrated via BIC. In the absence of established selection criteria for thresholding, THR is calibrated such that its number of connections equals the number of connections for SI.

Note that the coordinates of the observations in the present data set correspond to volume regions in the brain. Another type of fMRI-derived data, which has received considerable attention recently, has observations that correspond to locations on the cerebral cortex [32]. For such data, we would recommend our approach with geodesic distances among the regions as additional knowledge and a link function of the form $f(x) = x^2$ to capture the two-dimensionality of the spherical coordinates.

Figure 5: Average brain graphs within the 42 ROI for the AD and NC group. The graphs indicate that connections are more likely within-lobes than between-lobes.

5.2 Stability study

A reliable statistical method should provide similar outcomes across similar data sets. We thus study stability as a notion of similarity among outcomes of a method. In particular, we consider stability as an aspect of reproducibility.

We use two measures of stability. Both measures rely on data splitting with the main idea that agreement of estimates based on two parts of a data set indicates that the method is reliable, while largely disagreeing estimates raise a red flag. For the first comparison of different methods on the fMRI data, we proceed as follows. Given a patient and a method, we split the data containing the 210 scans randomly into two sets of 105 scans; then, we compute stability in terms of what we would call graph agreement

$$\mathrm{GA} := 1 - \frac{d_H(\hat{\mathcal{E}}, \tilde{\mathcal{E}})}{|\hat{\mathcal{E}}| + |\tilde{\mathcal{E}}|} = 1 - \frac{|||\hat{A} - \tilde{A}|||_1}{|||\hat{A}|||_1 + |||\tilde{A}|||_1}$$

for the graphs E ̂ and E ̃ (with corresponding adjacency matrices A ̂ and A ̃ ) that are estimated on the two halves of the data. We finally take averages over 20 random splits each for all patients.

For the second comparison, we adopt the estimation stability concept of [33]. To this end, we split the data into v equal subsets of scans and compute the corresponding adjacency matrices $\hat{A}_1, \dots, \hat{A}_v$. We then compute

$$\mathrm{ES}_v := 1 - \frac{\frac{1}{v}\sum_{k=1}^{v}\big|\big|\big|\hat{A}_k - \frac{1}{v}\sum_{l=1}^{v}\hat{A}_l\big|\big|\big|_F^2}{\big|\big|\big|\frac{1}{v}\sum_{m=1}^{v}\hat{A}_m\big|\big|\big|_F^2},$$

where $|||\cdot|||_F$ is the entrywise Frobenius norm. This definition of $\mathrm{ES}_v$ corresponds to the one in [33, Page 1491], except that our definition is one minus the original one and that [33]’s regression accuracy is replaced by a graph-recovery accuracy here. Compare also to [34, Page 473]. We finally average $\mathrm{ES}_v$ over all patients.
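Both measures are straightforward to compute; a minimal R sketch for 0/1 adjacency matrices:

```r
# Graph agreement GA for two estimates from the two halves of the data.
graph_agreement <- function(A1, A2) {
  1 - sum(abs(A1 - A2)) / (sum(abs(A1)) + sum(abs(A2)))
}

# Estimation stability ES_v for a list of v split-wise estimates.
estimation_stability <- function(A_list) {
  v <- length(A_list)
  Abar <- Reduce(`+`, A_list) / v  # average adjacency matrix
  1 - mean(sapply(A_list, function(A) sum((A - Abar)^2))) / sum(Abar^2)
}
```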

The values of GA, ES2, ES5, and ES10 are summarized in Table 2. (The sample sizes in the split data are not sufficiently large for a comparison with THR.) Note that for both GA and ES, larger values mean higher stability. Thus, SI is consistently the most stable method, suggesting that it can integrate additional knowledge effectively and make graph estimation more reproducible.

Table 2:

SI has higher stability on the fMRI data set than standard methods.

Method     GA     ES2     ES5     ES10
SI         0.62   0.75    0.39    0.02
MB         0.42   0.53   −0.27   −1.43
GLASSO     0.55   0.69    0.13   −0.59

We have shown that SI can outperform standard methods when additional knowledge is available, can improve stability of graph estimation, is supported by theory, and has a Bayesian interpretation that matches our biological understanding while still being computationally convenient.

5.3 Results for brain connectivity

Having confirmed our approach’s potential, we can now analyze the fMRI data on Alzheimer’s disease. We compare our method with existing techniques and then work out the differences between brain connectivity networks of AD patients and normal aging subjects. For the latter task, we particularly focus on 42 regions of interest (ROI) for AD that have been identified in the literature [35], [36]. These regions are located in the frontal lobe, parietal lobe, occipital lobe, and temporal lobe. We provide evidence for three main claims: (1) connections are significantly more likely within the lobes than between the lobes; (2) AD and NC differ mainly in their intra-lobe connections; (3) the connectivities of AD and NC differ from each other significantly within each of the four lobes. These insights could not be derived from competing methods.

Figure 6: Contrast of average brain graphs between the AD and NC groups within the 42 ROI. The graph indicates that the contrasts between the groups are located mainly within the lobes and that the graphs of AD and NC indeed differ considerably from each other within the lobes.

Figure 7: Top panel: average brain graphs from GLASSO within the 42 ROI for the AD and NC group. Bottom panel: contrast of average brain graphs from GLASSO between the AD and NC groups within the 42 ROI. The plots do not exhibit a strong connection with the lobe structure.

To compare methods, graphs for one subject from each of the disease groups are displayed in Figure 3. The graphs are estimated based on all the data available for one subject. Each column in the figure corresponds to one (fixed) subject from each of the three groups AD, MCI, and NC; each row corresponds to one of the four methods SI, THR, MB, and GLASSO. The plots show anatomical maps, that is, the nodes are positioned according to their 3D coordinates. Edges between close regions (distances in the lower quartile) are colored orange; edges between more remote regions are colored blue. The total number of edges is stated below each map. While the figure shows only one set of graphs per group for illustration, we find two patterns across all subjects: (i) with cross-validation as the most common calibration scheme applied, SI yields the sparsest networks. This finding shows that SI can lead to more manageable models. (ii) SI yields some edges between distant regions, but most edges are between spatially close regions (orange). In strong contrast, the other methods do not exhibit such a preference. This finding confirms our expectations about the four methods, and it shows that the estimates provided by SI are in agreement with the biological rationale.

To highlight contrasts among the study groups, the within-group averages of the estimated adjacency matrices are compared:

$$\bar{A}_{\text{group}} := \frac{1}{N_{\text{group}}}\sum_{k\in\text{group}}\hat{A}_k,$$

where group ∈ {AD, MCI, NC} is the study group, $\hat{A}_k$ is the SI estimate of the adjacency matrix for subject k, and $N_{\text{group}}$ is the total number of subjects in the group. We first consider the entire brain to obtain an overview of the results. The contrast between the average graphs of AD and NC, that is, $\bar{A}_{\text{AD}} - \bar{A}_{\text{NC}}$, is displayed in Figure 4. The entries in the heat map thus denote the frequencies of the edges in the AD group minus the corresponding frequencies in the NC group. The columns and rows are arranged such that spatially close regions tend to be close in the figure. Our main observation is that the overall graphs for the two groups seem to be similar, but there is high variability between AD and NC concentrated in some regions (see the blue and yellow entries close to the diagonal). In particular, we find an increase in connectivity within the cerebellum for AD patients compared to NC subjects, in concordance with the findings in [37].

For further analysis, we now focus on the mentioned 42 ROI. The corresponding subgraphs for the AD and NC groups are provided in Figure 5. The left panel displays $\bar{A}_{\text{AD}}$, the average graph estimate for the AD group, and the right panel $\bar{A}_{\text{NC}}$, the average graph estimate for the NC group, both restricted to the 42 ROI. The red squares highlight the lobes. The two plots indicate that connections are more likely within the lobes than between the lobes (the majority of gray and black cells are inside the red squares). To put this visual observation on statistical grounds, we give the results for testing intra versus inter connectivities in Table 3. Precisely, we consider, for each group ∈ {AD, MCI, NC} and lobe ∈ {frontal, parietal, occipital, temporal}, the quantity

$$\Delta_{\text{intra-inter}} := \frac{1}{p_{\text{lobe}}}\sum_{i\in\text{ROI}_{\text{lobe}}}\left(\frac{1}{p_{\text{lobe}}}\sum_{j\in\text{ROI}_{\text{lobe}}}(\bar{A}_{\text{group}})_{ij} \;-\; \frac{1}{42 - p_{\text{lobe}}}\sum_{j\in\text{ROI}\setminus\text{ROI}_{\text{lobe}}}(\bar{A}_{\text{group}})_{ij}\right),$$

where $p_{\text{lobe}}$ is the number of nodes of the lobe among the 42 ROI. This means that $\Delta_{\text{intra-inter}}$ is the difference of the intra (within-lobe) and inter (between-lobe) connectivities averaged over the regions in the lobe. The (individual) p-values correspond to t-tests of the null-hypothesis H0: $\Delta_{\text{intra-inter}} = 0$. These findings provide evidence for our first claim that connections are significantly more likely within the lobes than between the lobes.
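The quantity is easy to compute from the average adjacency matrices; a minimal R sketch, where Abar is a group’s average adjacency matrix restricted to the 42 ROI and in_lobe is a logical vector marking the lobe’s regions among the 42 (both assumed given):

```r
delta_intra_inter <- function(Abar, in_lobe) {
  intra <- rowMeans(Abar[in_lobe, in_lobe, drop = FALSE])   # within-lobe averages
  inter <- rowMeans(Abar[in_lobe, !in_lobe, drop = FALSE])  # between-lobe averages
  mean(intra - inter)
}
```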

Table 3:

Connections are significantly more likely within-lobes than between-lobes.

Group Lobe Δintra–inter p
AD Frontal lobe 0.265 <0.0001
Parietal lobe 0.330 0.0001
Occipital lobe 0.280 0.0001
Temporal lobe 0.168 <0.0001
MCI Frontal lobe 0.191 <0.0001
Parietal lobe 0.261 0.0003
Occipital lobe 0.351 <0.0001
Temporal lobe 0.199 <0.0001
NC Frontal lobe 0.259 <0.0001
Parietal lobe 0.346 0.0001
Occipital lobe 0.301 0.0002
Temporal lobe 0.173 <0.0001

Competing methods such as GLASSO do not relate to a lobe structure; see Figure 7. In clear contrast, looking again at Figures 5 and 6, we see many more within-lobe connections in our plots (black cells within the red squares). Importantly, short connections that run between lobes are not picked up by our method even though the penalty for all short connections is the same. This means that SI accentuates the biological lobe structure of the brain rather than just selecting arbitrary short-distance connections, which provides us with further confidence in our method’s compatibility with the data.

The 42-region subgraph of Figure 4 is provided in Figure 6. The heat map shows $\bar{A}_{\text{AD}} - \bar{A}_{\text{NC}}$, the contrast between the average graphs of AD and NC, restricted to the 42 ROI. The figure indicates that AD and NC differ mainly in their intra-lobe connections (the majority of yellow and blue cells are inside the red squares). The results of a statistical analysis of this observation are stated in Table 4. Precisely, we consider the quantity $\Delta_{\text{intra-inter}}$ with $\bar{A}_{\text{group}} = |\bar{A}_{\text{AD}} - \bar{A}_{\text{NC}}|$, where the absolute value is taken entrywise. This means that $\Delta_{\text{intra-inter}}$ is the difference of the intra and inter absolute connectivity contrasts between AD and NC averaged over the regions in the lobe. The p-values correspond to t-tests of the null-hypothesis H0: $\Delta_{\text{intra-inter}} = 0$. These findings provide evidence for our second claim that AD and NC differ mainly in their intra-lobe connections.

Table 4:

AD and NC differ significantly more within-lobe than between-lobes.

Group Lobe Δintra–inter p
|AD–NC| Frontal lobe 0.079 0.0001
Parietal lobe 0.064 0.0001
Occipital lobe 0.049 0.0014
Temporal lobe 0.040 <0.0001

Figure 6 also indicates that the connectivities of AD and NC indeed differ considerably within each lobe. Similar claims have been made in multiple papers [36], [38], [39], [40], [41]. We state the results of a corresponding statistical analysis in Table 5. Precisely, we consider for lobe ∈ {frontal, parietal, occipital, temporal}, the quantity

$$\Gamma_{|\text{AD-NC}|} := \frac{1}{p_{\text{lobe}}(p_{\text{lobe}} - 1)}\sum_{\substack{i,j\in\text{ROI}_{\text{lobe}}\\ i\neq j}}\Big|(\bar{A}_{\text{AD}})_{ij} - (\bar{A}_{\text{NC}})_{ij}\Big|.$$

This means that Γ|AD–NC| is the absolute contrast between the AD and NC graphs averaged over all (potential) connections in the lobe. The p-values correspond to the t-tests of the null-hypothesis H0 : Γ|AD–NC| = 0. These findings provide evidence for our third claim that the connectivities of AD and NC differ from each other significantly within each of the four lobes.

Table 5:

The connectivities of AD and NC differ from each other significantly within each of the four lobes.

Lobe Γ|AD–NC| p
Frontal lobe 0.083 <0.0001
Parietal lobe 0.072 <0.0001
Occipital lobe 0.060 0.0035
Temporal lobe 0.045 <0.0001

6 Discussion and further research

We have shown that strengthening the role of tuning parameters is an effective approach to incorporate additional spatial knowledge in estimating graphical models more accurately. In particular, we have shown that this approach can improve the stability of graph estimation, has a clear Bayesian interpretation, and is computationally convenient.

We have exemplified the proposed method SI for graph estimation of brain connectivity based on brain fMRI data and distance information. From the Alzheimer’s disease data considered in this paper, we find an increase in connectivity within the cerebellum for AD patients compared to NC subjects. More importantly, unlike any competing method, SI generates brain connectivity networks that accentuate the four biological lobes of the brain: connections are significantly more likely within the lobes than between the lobes, and AD patients and NC subjects differ mainly in their intra-lobe connections, with significant differences in all four lobes. This emphasizes the lobe structure’s important role in brain connectivity.

More generally, our scheme allows for the inclusion of application-specific information matrices D and link functions f. The information matrix D is typically predetermined in a given application. Similarly, the form of f often follows clear scientific rationales, as can be seen in our brain connectivity case. One could also consider data-driven selections of f from a set of candidate functions by using score testing or related methods, cf. [42]. The sample sizes of typical data might be too small for a thorough non-parametric selection of the link function, but one could envision cross-validation on functions of the form $D_{ij}^k$, k ∈ {1, 2, …}, for example. Importantly, via the link function f and the information matrix D, our approach can be tailored to a wide range of applications in bioinformatics, speech recognition, computer vision, and digital communications. For example, we envision implementations in genomics, where the goal is to estimate gene regulatory networks based on gene expression levels [43]; the additional knowledge that researchers commonly want to invoke for this are previously established sub-networks or networks estimated in other studies.

The specific data considered in this paper have about as many samples as nodes: n ≈ p ≈ 100. This is clearly a high-dimensional setting, because the number of parameters, that is, the number of possible connections, is p(p − 1)/2 ≫ n, which rules out classical approaches such as unregularized maximum likelihood. We also expect our pipeline to be of interest whenever p ≫ 1 irrespective of n, because lasso-type methods can increase interpretability through sparsity. But the larger p in relation to n, the more fundamental is the use of additional knowledge; therefore, we see a particularly large potential in high-resolution networks, such as voxel-based brain networks with thousands or millions of nodes.

We finally believe that our approach is also amenable to non-Gaussian data and further inferential techniques: heavy-tailed or otherwise non-Gaussian data could be approached based on [44]; additional inferential tools could be established by combining our ideas with [45].


Corresponding author: Yunqi Bu, Departments of Statistics and Biostatistics, University of Washington, Seattle, USA

Acknowledgments

We thank Andrew Zhou for the inspiring discussions, Rosemary Adams for contributions to the visualizations, and Benjamin Phillips for contributions to the R code. We thank Dantao Peng, Yanlei Mu, and Xiao Zhang for providing the fMRI data. We would like to express our appreciation to Min Zhang for the imaging expertise for preprocessing the fMRI data. We also thank Noah Simon, Sam Koelle, Mengjie Pan, Yuxi Wu, Néhémy Lim, and Rui Zhuang for helpful suggestions. We finally thank the reviewers for their insightful feedback, which helped us to improve the paper.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

1. Arlot, S, Celisse, A. A survey of cross-validation procedures for model selection. Stat Surv 2010;4:40–79. https://doi.org/10.1214/09-ss054.

2. Banerjee, O, Ghaoui, LE, d’Aspremont, A. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res 2008;9:485–516.

3. Barabási, AL, Albert, R. Emergence of scaling in random networks. Science 1999;286:509–12. https://doi.org/10.1126/science.286.5439.509.

4. Besag, J. Spatial interaction and the statistical analysis of lattice systems. J Roy Stat Soc B 1974;36:192–236. https://doi.org/10.1111/j.2517-6161.1974.tb00999.x.

5. Bien, J, Gaynanova, I, Lederer, J, Müller, C. Prediction error bounds for linear regression with the TREX. Test 2019;28:451–74. https://doi.org/10.1007/s11749-018-0584-4.

6. Bühlmann, P, van de Geer, SA. Statistics for high-dimensional data: methods, theory and applications. Heidelberg, Dordrecht, London, New York: Springer Science and Business Media; 2011. https://doi.org/10.1007/978-3-642-20192-9.

7. Chichignoud, M, Lederer, J, Wainwright, MJ. A practical scheme and fast algorithm to tune the lasso with optimality guarantees. J Mach Learn Res 2016;17:1–20.

8. Dalalyan, A, Hebiri, M, Lederer, J. On the prediction performance of the lasso. Bernoulli 2017;23:552–81. https://doi.org/10.3150/15-bej756.

9. Filkov, V. Identifying gene regulatory networks from gene expression data. In: Handbook of computational molecular biology. Chapman and Hall/CRC; 2005, ch. 27. https://doi.org/10.1201/9781420036275.ch27.

10. Friedman, J, Hastie, T, Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008;9:432–41. https://doi.org/10.1093/biostatistics/kxm045.

11. Friedman, J, Hastie, T, Tibshirani, R. Graphical lasso: estimation of Gaussian graphical models, R package “glasso” version 1.11. CRAN; 2019.

12. Friedman, J, Hastie, T, Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J Stat Software 2010;33:1–22. https://doi.org/10.18637/jss.v033.i01.

13. Giraud, C. Introduction to high-dimensional statistics. Chapman and Hall/CRC; 2014, vol 138. https://doi.org/10.1201/b17895.

14. Glasser, MF, Coalson, TS, Robinson, EC, Hacker, CD, Harwell, J, Yacoub, E, et al. A multi-modal parcellation of human cerebral cortex. Nature 2016;536:171–8. https://doi.org/10.1038/nature18933.

15. Gould, RL, Arroyo, B, Brown, RG, Owen, AM, Bullmore, ET, Howard, RJ. Brain mechanisms of successful compensation during learning in Alzheimer disease. Neurology 2006;67:1011–7. https://doi.org/10.1212/01.wnl.0000237534.31734.1b.

16. Grady, CL, Furey, ML, Pietrini, P, Horwitz, B, Rapoport, SI. Altered brain functional connectivity and impaired short-term memory in Alzheimer’s disease. Brain 2001;124:739–56. https://doi.org/10.1093/brain/124.4.739.

17. Granger, CW, Morris, MJ. Time series modeling and interpretation. J Roy Stat Soc A 1976;139:246–57. https://doi.org/10.2307/2345178.

18. Grimmett, GR. A theorem about random fields. Bull Lond Math Soc 1973;5:81–4. https://doi.org/10.1112/blms/5.1.81.

19. Gu, Q, Cao, Y, Ning, Y, Liu, H. Local and global inference for high dimensional Gaussian copula graphical models; 2015. arXiv:1502.02347.

20. Hastie, T, Tibshirani, R, Wainwright, M. Statistical learning with sparsity. Chapman and Hall/CRC; 2015. https://doi.org/10.1201/b18401.

21. Haugh, LD. Checking the independence of two covariance-stationary time series: a univariate residual cross-correlation approach. J Am Stat Assoc 1976;71:378–85. https://doi.org/10.1080/01621459.1976.10480353.

22. Horwitz, B, Grady, CL, Schlageter, NL, Duara, R, Rapoport, SI. Intercorrelations of regional cerebral glucose metabolic rates in Alzheimer’s disease. Brain Res 1987;407:294–306. https://doi.org/10.1016/0006-8993(87)91107-3.

23. Huang, S, Li, J, Sun, L, Ye, J, Fleisher, A, Wu, T, et al., The Alzheimer’s Disease NeuroImaging Initiative. Learning brain connectivity of Alzheimer’s disease by sparse inverse covariance estimation. Neuroimage 2010;50:935–49. https://doi.org/10.1016/j.neuroimage.2009.12.120.

24. Huettel, SA, Song, AW, McCarthy, G. Functional magnetic resonance imaging. Sunderland: Sinauer Associates; 2009.

25. Janková, J, van de Geer, SA. Confidence intervals for high-dimensional inverse covariance estimation. Electron J Stat 2015;9:1205–29. https://doi.org/10.1214/15-ejs1031.

26. Kaufmann, T, van der Meer, D, Doan, NT, Schwarz, E, Lund, MJ, Agartz, I, et al. Genetics of brain age suggest an overlap with common brain disorders; 2018. bioRxiv 303164. https://doi.org/10.1101/303164.

27. Lauritzen, SL. Graphical models. Clarendon: Oxford University Press; 1996.

28. Lederer, J. Graphical models for discrete and continuous data; 2016. arXiv:1609.05551.

29. Lederer, J, Vogt, M. Estimating the lasso’s effective noise; 2020. arXiv:2004.11554.

30. Lederer, J, Yu, L, Gaynanova, I. Oracle inequalities for high-dimensional prediction. Bernoulli 2019;25:1225–55. https://doi.org/10.3150/18-bej1019.

31. Li, W, Lederer, J. Tuning parameter calibration in high-dimensional logistic regression with theoretical guarantees; 2016. arXiv:1610.00207.

32. Lim, C, Yu, B. Estimation stability with cross-validation (ESCV). J Comput Graph Stat 2016;25:464–92. https://doi.org/10.1080/10618600.2015.1020159.

33. Meinshausen, N, Bühlmann, P. High-dimensional graphs and variable selection with the lasso. Ann Stat 2006;34:1436–62. https://doi.org/10.1214/009053606000000281.

34. Park, T, Casella, G. The Bayesian lasso. J Am Stat Assoc 2008;103:681–6. https://doi.org/10.1198/016214508000000337.

35. Sabourin, J, Valdar, W, Nobel, A. A permutation approach for selecting the penalty parameter in penalized model selection. Biometrics 2015;71:1185–94. https://doi.org/10.1111/biom.12359.

36. Supekar, K, Menon, V, Rubin, D, Musen, M, Greicius, MD. Network analysis of intrinsic functional brain connectivity in Alzheimer’s disease. PLoS Comput Biol 2008;4:e1000100. https://doi.org/10.1371/journal.pcbi.1000100.

37. Tibshirani, R. Regression shrinkage and selection via the lasso. J Roy Stat Soc B 1996;58:267–88. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

38. Tzourio-Mazoyer, N, Landeau, B, Papathanassiou, D, Crivello, F, Etard, O, Delcroix, N, et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 2002;15:273–89. https://doi.org/10.1006/nimg.2001.0978.

39. van de Geer, SA, Bühlmann, P. On the conditions used to prove oracle results for the lasso. Electron J Stat 2009;3:1360–92. https://doi.org/10.1214/09-ejs506.

40. van den Heuvel, MP, Pol, HEH. Exploring the brain network: a review on resting-state fMRI functional connectivity. Eur Neuropsychopharmacol 2010;20:519–34. https://doi.org/10.1016/j.euroneuro.2010.03.008.

41. Wainwright, MJ. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans Inf Theor 2009;55:2183–202. https://doi.org/10.1109/tit.2009.2016018.

42. Wang, K, Liang, M, Wang, L, Tian, L, Zhang, X, Li, K, et al. Altered functional connectivity in early Alzheimer’s disease: a resting-state fMRI study. Hum Brain Mapp 2007;28:967–78. https://doi.org/10.1002/hbm.20324.

43. Yu, B. Stability. Bernoulli 2013;19:1484–500. https://doi.org/10.3150/13-bejsp14.

44. Yuan, M, Lin, Y. Model selection and estimation in the Gaussian graphical model. Biometrika 2007;94:19–35. https://doi.org/10.1093/biomet/asm018.

45. Zhao, P, Yu, B. On model selection consistency of Lasso. J Mach Learn Res 2006;7:2541–63.

46. Zou, H. The adaptive lasso and its oracle properties. J Am Stat Assoc 2006;101:1418–29. https://doi.org/10.1198/016214506000000735.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2020-0133).


Received: 2020-09-14
Revised: 2021-01-10
Accepted: 2021-02-09
Published Online: 2021-03-09

© 2021 Walter de Gruyter GmbH, Berlin/Boston