High-dimensional data are becoming more common in scientific research, including gene expression studies, financial engineering, and signal processing. One significant feature of such data is that the dimension $p$ is larger than the sample size $n$, the so-called “large $p$ small $n$” data. For example, a gene microarray often measures thousands of gene expression values simultaneously for each individual. However, due to the cost or the limited availability of patients, the number of samples in microarray experiments is usually much smaller than the number of genes. It is common to see microarray data with fewer than 10 samples [1, 2, 3, 4, 5]. As seen in the literature, there are many statistical and computational challenges in analyzing “large $p$ small $n$” data.
Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) random vectors from the multivariate normal distribution $N_p(\mu, \Sigma)$, where $\mu$ is a $p$-dimensional mean vector and $\Sigma$ is a covariance matrix of size $p \times p$. When $p$ is larger than $n$, the sample covariance matrix is singular. To overcome the singularity problem, various methods for estimating $\Sigma$ have been proposed in the recent literature, including ridge-type estimators and sparse estimators [8, 9, 10]. Recently, sparse covariance matrix estimation for time series data has also been considered based on certain dependence measures, which relaxes the independence assumption among samples. For more references, see also [14, 15].
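To make the singularity concrete, here is a minimal numpy illustration; the dimensions are chosen purely for illustration. With $n$ samples in $p > n$ dimensions, the sample covariance matrix has rank at most $n - 1$, so its determinant is zero.

```python
import numpy as np

# "Large p small n": with n = 10 samples in p = 50 dimensions, the sample
# covariance matrix has rank at most n - 1 = 9, hence it is singular and
# its determinant is zero.
rng = np.random.default_rng(0)
n, p = 10, 50
X = rng.standard_normal((n, p))   # n i.i.d. p-dimensional samples
S = np.cov(X, rowvar=False)       # p x p sample covariance matrix
rank = np.linalg.matrix_rank(S)   # at most n - 1, far below p
```

Any plug-in estimator of $\log|\Sigma|$ based on $S$ therefore breaks down in this regime, which motivates the regularized estimators reviewed below.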
Apart from covariance matrix estimation, there are situations where one needs an estimate of the determinant (or the log-determinant) of the covariance matrix for high-dimensional data. To illustrate this, we write the log-likelihood function of the data as
$$\ell(\mu, \Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(X_i - \mu)^{\top}\Sigma^{-1}(X_i - \mu),$$
where $|\Sigma|$ denotes the determinant of the covariance matrix $\Sigma$. In classical multivariate analysis, the determinant $|\Sigma|$, referred to as the generalized variance (GV), was introduced as a scalar measure of overall multidimensional scatter. It has many applications, such as outlier detection, hypothesis testing, and classification. To illustrate this demand, we present several examples as follows.
Quadratic discriminant analysis (QDA) is an important classification method. Assuming that the data in class $k$ follow $N_p(\mu_k, \Sigma_k)$, the quadratic discriminant scores are given by
$$d_k(z) = (z - \mu_k)^{\top}\Sigma_k^{-1}(z - \mu_k) + \log|\Sigma_k| - 2\log\pi_k, \qquad k = 1, \ldots, K,$$
where $z$ is the new sample, $K$ is the total number of classes, and $\pi_k$ is the prior probability of observing a sample from class $k$. The classification rule assigns $z$ to the class that minimizes the discriminant score among all classes. To implement QDA, it is obvious that we need an estimate of $|\Sigma_k|$ or $\log|\Sigma_k|$.
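The QDA rule can be sketched in a few lines. This is a common parameterization of the quadratic discriminant score (assumed here to match the convention above, where the assigned class minimizes the score); the explicit $\log|\Sigma_k|$ term is where a determinant estimate enters.

```python
import numpy as np

# Quadratic discriminant score for class k (a standard parameterization):
#   d_k(z) = (z - mu_k)' Sigma_k^{-1} (z - mu_k) + log|Sigma_k| - 2 log(pi_k)
def qda_score(z, mu, Sigma, prior):
    diff = z - mu
    sign, logdet = np.linalg.slogdet(Sigma)   # log|Sigma_k| appears explicitly
    return diff @ np.linalg.solve(Sigma, diff) + logdet - 2.0 * np.log(prior)

# Assign z to the class with the smallest score.
def qda_classify(z, mus, Sigmas, priors):
    scores = [qda_score(z, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
    return int(np.argmin(scores))
```

In the “large $p$ small $n$” setting, both $\Sigma_k^{-1}$ and $\log|\Sigma_k|$ must be replaced by regularized estimates.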
where $\mathrm{tr}(\cdot)$ is the trace operator, $\|\cdot\|$ is a matrix norm, and $\lambda$ is a tuning parameter. The purpose of the penalty term is to ensure that the optimization problem has a unique, positive definite global minimizer. Several other proposals in this direction exist in the literature.
In probability theory and information theory, the differential entropy extends the concept of entropy to continuous probability distributions [25, 26]. For a random vector $X$ from $N_p(\mu, \Sigma)$, the differential entropy is
$$h(X) = \frac{p}{2}\log(2\pi e) + \frac{1}{2}\log|\Sigma|.$$
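The normal differential entropy depends on $\Sigma$ only through $\log|\Sigma|$, so an estimate of the log-determinant immediately yields an entropy estimate. A quick numerical check of the standard formula:

```python
import numpy as np

# Differential entropy of N_p(mu, Sigma):
#   h(X) = (p/2) * log(2*pi*e) + (1/2) * log|Sigma|
def mvn_entropy(Sigma):
    p = Sigma.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * p * np.log(2.0 * np.pi * np.e) + 0.5 * logdet
```

For example, inflating all variances increases the entropy, as expected.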
The minimum covariance determinant (MCD) method is a robust estimator of multivariate scatter. MCD aims to find a subset of $h$ samples (observations) whose sample covariance matrix has the smallest determinant. Specifically, let $\mathcal{H}$ be the collection of all subsets with $h$ samples, where $h$ is the cardinality of each subset. For any subset $H \in \mathcal{H}$, let $S_H$ be the corresponding sample covariance matrix. The subset with the minimum determinant is defined as
When $p$ is larger than the subset size $h$, MCD is ill-defined since each $S_H$ is singular. To generalize the MCD method to high-dimensional data, we need an estimate of the determinant of the high-dimensional covariance matrix. For instance, one approach replaces the usual determinant criterion with a regularized alternative, and another modifies MCD by shrinking the subset-based sample covariance matrix toward a target matrix.
Multivariate analysis of variance (MANOVA) is a procedure for testing the equality of mean vectors across multiple groups. Wilks’ statistic for this hypothesis test is given as
where $W$ is the within-group sum of squares and cross-products matrix, and $B$ is the between-group sum of squares and cross-products matrix. However, $W$ is singular under the “large $p$ small $n$” setting. To apply MANOVA to high-dimensional data, it has been proposed to replace $W$ with a shrinkage estimator, with the shrinkage intensity computed by an analytical method. Ullah and Jones compared the powers of three types of regularized Wilks’ statistics, in which $W$ was replaced by the lasso, ridge, and shrinkage estimators, respectively.
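A common form of Wilks’ statistic is $\Lambda = |W| / |W + B|$; assuming this form, it can be computed stably through log-determinants, which again highlights why determinant estimation matters here:

```python
import numpy as np

# Wilks' lambda in its classical form Lambda = |W| / |W + B|, where W and B
# are the within- and between-group SSCP matrices.  Small values of Lambda
# favor rejecting equality of the group mean vectors.  Computed via slogdet
# for numerical stability.
def wilks_lambda(W, B):
    _, logdet_w = np.linalg.slogdet(W)
    _, logdet_wb = np.linalg.slogdet(W + B)
    return np.exp(logdet_w - logdet_wb)
```

When $W$ is singular, $\log|W|$ is undefined, so a regularized replacement for $W$ (as in the proposals above) is required.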
From the above examples, it is evident that an estimator of the GV, $|\Sigma|$, or equivalently of $\log|\Sigma|$, plays an important role in high-dimensional data analysis. For ease of notation, we write $\theta = \log|\Sigma|$
throughout the paper. In contrast to covariance matrix estimation, the estimation of $\log|\Sigma|$ has been relatively overlooked in the literature. In practice, one often estimates the covariance matrix first and then uses it to compute the log-determinant. Chiu et al. considered a regression model that allows the covariance matrix of the response vector to vary with the explanatory variables. Specifically, they proposed modeling each element of the matrix logarithm of $\Sigma$ as a linear function of the explanatory variables. One property of this transformation is that the log-determinant equals the trace of the matrix logarithm, i.e., a summation of the log-eigenvalues of $\Sigma$. Recently, the estimation of $\log|\Sigma|$ has been investigated under various settings. Under a “moderate” setting with $p < n$, it was proposed to estimate $\log|\Sigma|$ by the log-determinant of the sample covariance matrix, i.e., $\log|S|$. A central limit theorem was also established in the setting where $p$ can grow with $n$. For “large $p$ small $n$” data, however, it was shown that it is impossible to estimate $\log|\Sigma|$ consistently unless some structural assumption, such as sparsity, can be imposed on the parameter.
In this paper, we conduct a comprehensive simulation study that evaluates the performance of existing methods for estimating $\log|\Sigma|$. We follow a two-step procedure: we first estimate $\Sigma$ with an existing method, and then estimate $\log|\Sigma|$ by the plug-in estimator $\log|\hat{\Sigma}|$. In Section 2, we consider a total of eight methods for estimating $\Sigma$, with a brief review of each. In Section 3, we conduct simulation studies to evaluate and compare their performance under various settings. In particular, we consider different types of correlation structures, including a non-positive definite covariance matrix that is often ignored in the existing literature. We then summarize some useful findings and provide some practical guidelines for scientists in Section 4. Finally, we conclude the paper in Section 5 with some discussion. Technical details are provided in the Appendix.
2 Methods for estimating $\log|\Sigma|$
In this section, we review eight representative methods for estimating the covariance matrix, and then estimate the log-determinant using each of the eight estimates of $\Sigma$. We also propose a new method for estimating $\log|\Sigma|$ under the assumption of a diagonal covariance matrix. For ease of presentation, we divide the eight methods into four categories: diagonal estimation, shrinkage estimation, sparse estimation, and factor model estimation.
2.1 Diagonal estimation
Method 1: Diagonal Estimator (DE)
Under the “large $p$ small $n$” setting, one naive approach is to estimate $\Sigma$ by the diagonal of the sample covariance matrix. This estimator was first considered for diagonal linear discriminant analysis, and it was further demonstrated that a diagonal covariance matrix estimate can sometimes be reasonable when $p$ is much larger than $n$. Let $D = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$, where $\sigma_i^2$ are the covariate-specific variances, and let $\hat{D} = \mathrm{diag}(s_1^2, \ldots, s_p^2)$, where $s_i^2$ are the corresponding sample variances. By estimating $\Sigma$ by $\hat{D}$, we define the first estimator of $\log|\Sigma|$ as
$$\hat{\theta}_{\mathrm{DE}} = \log|\hat{D}| = \sum_{i=1}^{p}\log(s_i^2). \quad (1)$$
We refer to this estimator as the diagonal estimator (DE). To be precise, DE estimates the log-determinant of the diagonal matrix of true variances rather than $\log|\Sigma|$.
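A minimal sketch of DE in code: the estimator is simply the sum of the log sample variances, computed here with the unbiased (ddof = 1) variance.

```python
import numpy as np

# Diagonal estimator (DE): replace Sigma by the diagonal of the sample
# covariance matrix, so the log-determinant estimate is the sum of the
# log sample variances.
def de_logdet(X):
    # X: n x p data matrix; unbiased sample variances per covariate
    s2 = X.var(axis=0, ddof=1)
    return np.sum(np.log(s2))
```

DE requires only $p$ univariate variance estimates, which is why it remains well-defined for arbitrarily large $p$.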
Method 2: Improved Diagonal Estimator (IDE)
It is noteworthy that DE may not perform well as an estimate of the log-determinant when the sample size is small, mainly due to unreliable estimates of the sample variances. Various approaches have been proposed in the literature to improve the variance estimation. See, for example, [39, 40, 41, 42].
To improve DE, we consider an optimal shrinkage estimator of the variances,
where $\Gamma(\cdot)$ is the Gamma function and the remaining quantity is the shrinkage parameter. Replacing the sample variances in DE by their shrinkage estimates, we have (2)
where the leading factor is a constant. The estimation structure in eq. (2) shows that the DE estimator can be further improved: the shrinkage parameter is chosen as the optimal value that minimizes the mean squared error within the given family of estimators.
Theorem 1. Let the scaled sample variances $(n-1)s_i^2/\sigma_i^2$, $i = 1, \ldots, p$, be i.i.d. random variables with a common chi-squared distribution with $n-1$ degrees of freedom, and let $\psi(\cdot)$ denote the digamma function. Then for any fixed $n$, we have:
(1) The proposed estimator is an unbiased estimator of the log-determinant of the diagonal matrix of true variances.
(2) Assume also that the variances $\sigma_i^2$ are i.i.d. random variables from a common distribution with suitable finite moments. Then
where $\stackrel{\mathrm{a.s.}}{\longrightarrow}$ denotes almost sure convergence.
The proof of Theorem 1 is given in the Appendix. By eq. (2) and Theorem 1, we define the second estimator of $\log|\Sigma|$ as
We refer to it as the improved diagonal estimator (IDE).
2.2 Shrinkage estimation
Recall that the sample covariance matrix is singular when the dimension $p$ is larger than the sample size $n$. To overcome the singularity problem, apart from the diagonal methods in Section 2.1, one may also estimate the covariance matrix by the following convex combination:
$$\hat{\Sigma}(\lambda) = (1 - \lambda)S + \lambda T,$$
where $T$ is the target matrix and $\lambda \in [0, 1]$ is the shrinkage parameter. Both the target matrix and the shrinkage parameter play an important role in the shrinkage estimation. For instance, if we let $T$ be the diagonal of the sample covariance matrix and $\lambda = 1$, then the estimator reduces to the DE estimator.
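The convex combination is straightforward to implement; the sketch below takes the target matrix and the shrinkage intensity as given (the data-driven choices are discussed next).

```python
import numpy as np

# Linear shrinkage: Sigma_hat = (1 - lam) * S + lam * T, with target T and
# shrinkage intensity lam in [0, 1].  With lam = 1 the estimator collapses
# to the target; with lam = 0 it is the raw sample covariance matrix.
def shrink_cov(S, T, lam):
    return (1.0 - lam) * S + lam * T

# Plug-in log-determinant estimate based on the shrunk covariance matrix.
def shrunk_logdet(S, T, lam):
    sign, logdet = np.linalg.slogdet(shrink_cov(S, T, lam))
    return logdet
```

A positive definite target with $\lambda > 0$ guarantees a positive definite (hence invertible) estimate for any $p$.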
The appropriate choice of the target matrix has been extensively studied in the literature. See, for example, [6, 33, 44, 45] and the references therein. Note that the target matrix is often chosen to be positive definite and well-conditioned; consequently, the final estimate is also guaranteed to be positive definite and well-conditioned for any dimensionality. As suggested in the literature, we consider a popular target matrix for nonhomogeneous variances: the “diagonal, unequal variance” matrix, i.e., the diagonal of the sample covariance matrix.
We also note that, given the target matrix, the estimation of the shrinkage parameter is crucial to the final estimate. The available estimation methods for the shrinkage parameter are mainly (1) unbiased estimation and (2) consistent estimation. The unbiased approach replaces the unknown terms in the optimal shrinkage parameter by their unbiased estimators, whereas the consistent approach replaces them with consistent estimators. Taken together, we present below four shrinkage methods for estimating the covariance matrix, and consequently for estimating $\log|\Sigma|$.
Method 3: Unbiased Shrinkage Estimator with Identity Target (USIE)
Letting the target matrix be the identity matrix $I$, an unbiased estimator of the shrinkage parameter has been proposed. This leads to a shrinkage estimate of the covariance matrix, and we then define the third estimator of $\log|\Sigma|$ as (3)
Method 4: Consistent Shrinkage Estimator with Identity Target (CSIE)
Letting the target matrix be the identity matrix $I$, a consistent estimator of the shrinkage parameter has been proposed. This leads to another shrinkage estimate of the covariance matrix, and we then define the fourth estimator of $\log|\Sigma|$ as (4)
Method 5: Unbiased Shrinkage Estimator with Diagonal Target (USDE)
Letting the target matrix be the diagonal of the sample covariance matrix, an unbiased estimator of the shrinkage parameter has also been proposed. This leads to a shrinkage estimate of the covariance matrix, and we then define the fifth estimator of $\log|\Sigma|$ as (5)
Method 6: Consistent Shrinkage Estimator with Diagonal Target (CSDE)
Letting the target matrix be the diagonal of the sample covariance matrix, a consistent estimator of the shrinkage parameter has also been proposed. This leads to another shrinkage estimate of the covariance matrix, and we then define the sixth estimator of $\log|\Sigma|$ as (6)
2.3 Sparse estimation
When $p$ is much larger than $n$, the shrinkage methods in Section 2.2 may not achieve a significant improvement over the sample covariance matrix. In such settings, to obtain a good estimate of $\Sigma$, one may have to impose structural assumptions such as sparsity on the parameters. Recent reviews cover methods for estimating structured high-dimensional covariance and precision matrices. A typical sparsity assumption is that most of the off-diagonal elements in the covariance matrix are zero. To estimate the covariance matrix under a sparsity condition, various thresholding-based methods have been proposed in the literature that aim to locate the “large” off-diagonal elements. See, for example, [8, 9, 46, 47, 48, 49, 50, 51]. In particular, the adaptive thresholding estimator achieves the optimal rate of convergence over a large class of sparse covariance matrices under the spectral norm. Moreover, it can be shown that the adaptive thresholding estimator also attains the optimal convergence rate under Bregman divergence losses over a large parameter class [15, 50]. We therefore consider the sparsity methods as a representative category and use them to estimate $\log|\Sigma|$, i.e., the log-determinant of the covariance matrix.
Method 7: Adaptive Thresholding Estimator (ATE)
Bickel and Levina proposed a universal thresholding method in which all entries of the sample covariance matrix are thresholded by a common value. They required the variances to be uniformly bounded by a constant, so that the variances of the entries of the sample covariance matrix are also uniformly bounded. However, universal thresholding was shown to be suboptimal over certain classes of sparse covariance matrices.
To improve on universal thresholding, an adaptive thresholding estimator of the covariance matrix has been proposed:
where $t_{ij}$ is the entry-specific threshold for the $(i, j)$th entry, and $s_{\tau}(\cdot)$ is a generalized thresholding operator, which we specify as soft thresholding throughout the simulations. With properly chosen thresholds, the estimator adaptively achieves the optimal rate of convergence over a large class of sparse covariance matrices under the spectral norm. The seventh estimator of $\log|\Sigma|$ is then (7)
We refer to this estimator as the adaptive thresholding estimator (ATE).
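A minimal soft-thresholding sketch of the idea: each off-diagonal entry is shrunk toward zero by its own threshold, which is what makes the method “adaptive”. The data-driven choice of the thresholds is omitted here and supplied as an argument.

```python
import numpy as np

# Entry-wise soft thresholding of a sample covariance matrix.
#   S: p x p sample covariance matrix
#   T: p x p matrix of nonnegative thresholds t_ij (entry-specific)
# The diagonal (the variances) is left untouched.
def soft_threshold_cov(S, T):
    shrunk = np.sign(S) * np.maximum(np.abs(S) - T, 0.0)
    out = shrunk.copy()
    np.fill_diagonal(out, np.diag(S))   # keep the variances as-is
    return out
```

The plug-in $\log|\hat{\Sigma}|$ is then computed from the thresholded matrix, provided it is nonsingular.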
2.4 Factor model estimation
The sparsity condition on the covariance matrix assumes that most covariates are uncorrelated with each other. This assumption, however, may not be realistic in practice. Recently, under the assumption of conditional sparsity, a principal orthogonal complement thresholding method based on the factor model was introduced. In this section, we briefly review this method and then apply it to estimate the log-determinant of the covariance matrix.
Method 8: Principal Orthogonal Complement Thresholding Estimator (POET)
Fan et al. considered the approximate factor model:
where $y_i$ is the observed response, $B$ is the factor loading matrix, $f_i$ is a vector of common factors, and $u_i$ is the error vector. In this model, we can only observe $y_i$. Let
where $\Sigma_u$ is the covariance matrix of the error vector $u_i$. To estimate $\Sigma$, the spectral decomposition is applied to the sample covariance matrix:
where $\hat{\lambda}_1 \geq \cdots \geq \hat{\lambda}_p$ are the eigenvalues of the sample covariance matrix, $\hat{\xi}_1, \ldots, \hat{\xi}_p$ are the corresponding eigenvectors, and the remainder term is the principal orthogonal complement. In this decomposition, the first $K$ principal components are kept and thresholding is applied to the principal orthogonal complement. Here, a generalized thresholding operator can be used. In addition, a method to estimate the number of factors, denoted by $\hat{K}$, was also introduced. The final estimator of $\Sigma$ is (8)
where the last term is the thresholded principal orthogonal complement. Now, by eq. (8), we define the last estimator of $\log|\Sigma|$ as (9)
We refer to this estimator as the principal orthogonal complement thresholding estimator (POET).
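A minimal POET-style sketch: keep the top $K$ principal components of the sample covariance matrix and soft-threshold the principal orthogonal complement. This illustrates the decomposition only; the data-driven choices of $K$ and of the entry thresholds in the actual POET method are not reproduced here.

```python
import numpy as np

# S: p x p sample covariance matrix; K: number of factors kept;
# tau: common soft threshold applied to the orthogonal complement.
def poet_sketch(S, K, tau):
    vals, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]           # re-sort to descending
    vals, vecs = vals[order], vecs[:, order]
    # Low-rank part from the top K components: U_K diag(lambda_K) U_K'
    low_rank = (vecs[:, :K] * vals[:K]) @ vecs[:, :K].T
    R = S - low_rank                         # principal orthogonal complement
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))      # keep residual variances
    return low_rank + R_thr
```

With $K = 0$ the sketch reduces to pure soft thresholding, mirroring the remark below that POET degenerates to sparse estimation when no factors are selected.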
3 Simulation studies
In this section, we compare the numerical performance of the aforementioned eight estimators under five different setups. In the first setup, we generate data from the multivariate normal distribution. In the second setup, we generate data from a mixture distribution whose covariance matrix is highly sparse. In the third setup, we simulate data from the log-normal distribution to assess the robustness of the eight methods under heavy-tailed data. In the fourth setup, we consider a special case where the covariance matrix is degenerate and the data are generated from a degenerate multivariate normal distribution. In the final setup, we use a realistic covariance structure obtained from a real data set. To compare the methods, we compute the mean squared error (MSE) as follows:
where $N$ is the number of replications, fixed throughout the simulations.
3.1 Normal data
where with being i.i.d. from the distribution , and follows a block diagonal structure:
In our simulations, we consider with for . In addition, we set , , or , to represent different levels of dependence, and or , respectively.
Figures 1 and 2 display the log(MSE) of the eight methods for different levels of dependence, dimension and sample size. From these figures, we have the following findings. When the covariates are uncorrelated, IDE gives the best performance under a high dimension (e.g., ). However, if the dimension is not large (e.g., ), and the covariates are uncorrelated or weakly correlated, shrinking the covariance matrix toward an identity matrix leads to a better performance under a small sample size. This is because when the sample size is small, the variances of the entries of the sample covariance matrix are large. Hence, CSIE and USIE stabilize both diagonal and off-diagonal entries and, at the same time, an identity target possesses an explicit structure which in turn requires little data to fit. Consequently, the resulting estimators have a good bias–variance tradeoff. In addition, when the correlation and dimension are both large, imposing additional structure assumptions is necessary. Under this situation, ATE and POET turn out to be the best two methods among the eight methods unless the sample size is relatively small. When the sample size is small, the pattern of ATE is very similar to that of DE. When the sample size and dimension are both large, ATE outperforms all other methods except for POET.
Figure 3 displays the performance of the eight methods for different levels of dependence. The pattern is consistent with Figure 2. In particular, when the correlation and the sample size are large, the performance of POET is satisfactory. From Figures 1 and 2, however, we note that the log(MSE) of POET tends to oscillate as the sample size increases. This may be because POET depends on the estimated number of factors $\hat{K}$. The original authors used a consistent estimator of $K$ and showed that POET is robust to an over-estimated number of factors under the spectral norm. Our simulations in Table 1, however, show that this robustness for estimating the covariance matrix may no longer hold when the purpose is to estimate the determinant. In particular, for small sample sizes, either an over-estimated or an under-estimated number of factors leads to a large bias in the determinant estimator.
3.2 Mixture normal data
In this setup, we consider a mixture model where the random vectors are generated from
where and are the density functions of and , respectively. For the covariance matrices, we consider a sparse block diagonal structure as follows:
where with being i.i.d. from the distribution , and being the same as in Setup II. For simplicity, we set and . Under this setting, the covariance matrix of is simplified as , which results in a highly sparse matrix in which the odd off-diagonal entries within the diagonal blocks are zero. We set or , and , , or .
Figures 4 and 5 display the log(MSE) of the eight methods under different levels of dependence and sample size. When the sample size is large and the covariates are uncorrelated, IDE gives the best performance. When the sample size is small and the dimension is not large, shrinking the covariance matrix toward an identity matrix (e.g., USIE and CSIE) outperforms the other methods, except when the correlation is very large. However, when the sample size and dimension are both large, the shrinkage methods become suboptimal. Instead, if the correlation is also large, ATE and POET outperform the other methods in most settings. As mentioned above, the performance of POET is not stable and may not be satisfactory when the sample size is not large.
3.3 Heavy-tailed data
In this setup, we simulate heavy-tailed data from a log-normal distribution, , where the mean and variance are and , respectively. First, we generate independent random vectors , where all the components of are sampled independently from . Let with , and is a positive definite matrix. Consequently, the mean vector and covariance matrix of are and , respectively. For the covariance matrix, we consider the block diagonal structure described in Section 3.1. We set or , and , , or .
Figures 6 and 7 display the log(MSE) of the eight methods under different levels of dependence and sample size. When the dimension and correlation are both small, USIE and CSIE outperform the other methods. The reason is similar to the discussion in Section 3.1: heavy-tailed data may lead to unstable estimates of the entries of the sample covariance matrix, hence shrinking toward a simple identity target, which requires little data to fit, stabilizes the estimate. In addition, as shown in Figure 7, when the dimension is large and the correlation is not small, ATE and POET outperform the other methods, except when the sample size is small. Finally, we also note that IDE cannot provide a satisfactory performance even when the covariates are uncorrelated. As demonstrated in Theorem 1, the IDE estimator is derived under the normal distribution and may not be robust to heavy-tailed data.
3.4 Degenerate normal data
To further investigate the performance of the eight methods, we consider a non-positive definite covariance matrix in which the positive definite assumption of the covariance matrix is violated. Note that this new setting is often overlooked in the literature. To construct a non-positive definite covariance matrix, we define the affine transformation as
We then apply the affine transformation to the covariance matrix in Setup II and form
It is obvious that the transformed covariance matrix is singular, since the affine map is rank deficient. We set , and , , or . Note that the log-determinant of a singular covariance matrix is negative infinity. Hence, for this degenerate setting, the MSE is defined on the determinant rather than on the log-determinant. Specifically, it is
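A concrete toy version of such a degenerate construction (the specific $3 \times 3$ matrices here are illustrative, not the paper's exact transformation): applying an affine map with a repeated row to a positive definite $\Sigma$ yields a rank-deficient covariance matrix with determinant zero.

```python
import numpy as np

# A positive definite covariance matrix.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])
# Affine map whose third row repeats the second, so A Sigma A' has rank 2.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0]])
Sigma_star = A @ Sigma @ A.T    # degenerate: two identical rows/columns
det = np.linalg.det(Sigma_star) # zero, so log|Sigma_star| = -infinity
```

Since $\log|\Sigma^{*}| = -\infty$, comparing estimators on the determinant scale, as done here, is the natural choice.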
Figure 8 shows the log(MSE) of all eight methods for different levels of dependence and sample size. We can see that the simulation results are different from those in the previous three setups. POET gives the best performance among the eight methods. In addition, we note that, under the non-positive definite setting, POET performs extremely well when the sample size is very small. For this phenomenon, we explore the possible reasons in the next paragraph.
To estimate $\Sigma$, POET applies the spectral decomposition to the sample covariance matrix:
If the sample size $n$ is much smaller than the dimension $p$, most eigenvalues of the sample covariance matrix are zero. This implies that the principal orthogonal complement of the largest eigenvalues is nearly a zero matrix. Consequently, the final POET estimator tends to be highly degenerate for small sample sizes rather than for large sample sizes.
Finally, it is noteworthy that when the correlation is strong, the log(MSE) of POET also fluctuates as the sample size increases. This again verifies that both the correlation and the sample size have a large impact on the performance of POET.
3.5 Real data
In this setup, we generate a realistic covariance matrix from the Myeloma data, a real microarray data set with a total of 54,675 genes, 351 samples in the first group, and 208 samples in the second group. To generate the covariance matrix, we first select genes randomly from the first group and then compute the sample covariance matrix of the selected genes. Next, to evaluate the performance of the estimators under different levels of dependence, we define the true covariance matrix as
where $\alpha \in [0, 1]$ controls the level of dependence. We set , , or . Note that $\alpha = 0$ corresponds to a diagonal covariance matrix, and $\alpha = 1$ treats the generated sample covariance matrix as the true covariance matrix.
Figure 9 shows the log(MSE) of the eight methods for different levels of dependence and sample size. The comparison results are summarized as follows. When the sample size and the correlation are both small, the methods that shrink the covariance matrix toward the identity matrix (e.g., USIE and CSIE) perform well. When the covariates are uncorrelated and the sample size is large, IDE has the best performance. In addition, when the sample size is large and the correlation is moderate, shrinking the sample covariance matrix toward a diagonal target matrix (e.g., USDE and CSDE) performs well. When the correlation and the sample size are both large, ATE outperforms or is at least comparable to USDE and CSDE. Finally, POET is not stable and is very sensitive to both the correlation and the sample size. When the correlation and the sample size are not large, POET may fail to provide a satisfactory performance owing to its largely increased bias compared with the other methods.
4 Summary of findings and practical guidelines
In this section, we summarize some useful findings from the comparison results and provide some practical guidelines for researchers.
The diagonal estimator, DE, is the simplest method for estimating the determinant of a high-dimensional covariance matrix. It assumes that all covariates are uncorrelated. For independent normal data, IDE is an unbiased estimator of the log-determinant and also provides the best performance, especially when the dimension is large. For such settings, IDE can be recommended for estimating the determinant of a high-dimensional covariance matrix. In addition, we note that IDE is not robust and may lead to an unsatisfactory performance when the independent normal assumption is violated.
For the shrinkage estimation, different choices of the target matrix and shrinkage parameter result in different performance for the determinant estimation. In general, when the dimension is not large, shrinkage toward an identity target matrix (e.g., CSIE and USIE) performs well under small sample sizes and weak correlation. This pattern is more evident for heavy-tailed data. With a diagonal target matrix, CSDE, the consistent estimator, has a performance similar to that of USDE. However, CSDE and USDE are seldom the best methods, especially when the sample size is not large.
For the shrinkage estimators, the optimal shrinkage intensity can be specified without any further tuning parameters. Consequently, time-consuming procedures such as cross-validation or the bootstrap can be avoided. Table 2 shows the computational times of the eight methods. As we can see, the shrinkage methods are much faster than ATE and POET. More importantly, if the sample size is very small, selecting the tuning parameters in ATE and POET by cross-validation may result in a large bias. In this situation, the shrinkage estimators (e.g., shrinkage toward an explicit target matrix) can be very attractive. Nevertheless, as the sample size increases or when the correlation is strong, the performance of the shrinkage methods may not be as competitive as that of the sparse method and the factor model method.
ATE exhibits robustness in our settings. Specifically, when the sample size is not very small, ATE performs better than or comparably to the other seven methods under various data structures and different levels of dependence. In practice, if the sample size is not very small and we have no prior information about the dependence level of the covariates, the sparse estimator can be recommended for estimating the determinant of a high-dimensional covariance matrix.
As shown in the simulations, when the sample size is very small, the performance of ATE is not as attractive as that of the shrinkage estimators or even the diagonal estimators. One possible reason is that ATE requires adaptive thresholding parameters in practice. When the sample size is very small, however, the proposed cross-validation method may not provide a reliable estimate of the optimal threshold value.
Factor model estimation
The factor model estimator, POET, is very attractive for strongly correlated data sets when the sample size is not small. POET assumes that the data are weakly correlated after extracting the common factors, which themselves can induce high levels of dependence among the covariates. This implies that POET may provide a good performance when the data are strongly correlated. Note also that POET can automatically select a small number of factors if the true covariance matrix is sparse; the method then degenerates to sparse estimation, such as hard thresholding or ATE.
POET, however, depends on the number of factors $K$, which is unknown in practice. To investigate the impact of the number of factors under different sample sizes and different levels of dependence, we computed the MSE of POET for the log-determinant of the covariance matrix under Setup II. Results in Table 1 show that $K$ has a large impact on the determinant estimation. When the correlation is strong, $\hat{K}$, a consistent estimator of $K$, usually leads to a large MSE. It has been demonstrated that POET is robust to over-estimated and sensitive to under-estimated numbers of factors; for finite sample sizes, a relatively large $K$ (e.g., not less than 8) was suggested. However, our simulation studies show that this robustness for estimating the covariance matrix may no longer hold for estimating the determinant. In particular, for small sample sizes, both under-estimated and over-estimated numbers of factors lead to a poor performance of POET. In view of this, we believe that future research is needed on selecting the optimal $K$ when the factor model method is applied to estimate the determinant of the covariance matrix.
To conclude, the sample size, the dependence level, and the dimension of the data have a great impact on the estimation accuracy. In practice, one may need to select an appropriate estimation method according to the sample size and prior information on the correlation structure of the covariates. When such prior information is not available, we recommend using ATE to estimate the determinant of the high-dimensional covariance matrix, as it is robust to various correlation levels and data structures.
5 Conclusion and discussion
In this paper, we have compared a total of eight methods for estimating the log-determinant of a high-dimensional covariance matrix. The performance of the eight methods depends on the sample size, the dependence structure, and the dimension of the data. When the sample size is not small, we note that ATE is always able to provide an average or above-average performance among the eight methods. Hence, if there is little prior information about the structure of the covariance matrix, we recommend using ATE to estimate the log-determinant, or the GV, in practice. In terms of computational time, the shrinkage methods are more convenient than ATE and POET, because the latter two methods need to select their penalty parameters via cross-validation.
Since the log-determinant of a covariance matrix is a scalar, the two-step procedure may not provide the best estimator of it. One possible future direction is to circumvent the full covariance matrix estimation and estimate the log-determinant directly. Note that $\log|\Sigma| = \sum_{i=1}^{p}\log\lambda_i(\Sigma)$, which is essentially a summation of the log-eigenvalues of $\Sigma$. This suggests that random matrix theory or spectral analysis may provide feasible solutions for estimating the log-determinant more accurately. The comparison study in this paper may also serve as a proxy for assessing the performance of covariance matrix estimation. Specifically, from the perspective of the loss function, if we define the loss function as
then the simulations conducted in Section 3 essentially provide a comparison of the eight methods for estimating $\log|\Sigma|$ rather than $\Sigma$ itself. Of course, we do not claim that the above loss function should be universally recommended. On the contrary, for evaluating covariance matrix estimation, other popular criteria are also available in the literature. For instance, we may consider the distance between the log-likelihood and the estimated log-likelihood as a criterion to evaluate the performance:
In addition, we can also consider classical loss functions for covariance matrix estimation, such as the entropy loss and the quadratic loss:
$$L_1(\Sigma, \widehat{\Sigma}) = \mathrm{tr}(\Sigma^{-1}\widehat{\Sigma}) - \log|\Sigma^{-1}\widehat{\Sigma}| - p, \qquad L_2(\Sigma, \widehat{\Sigma}) = \mathrm{tr}\big[(\Sigma^{-1}\widehat{\Sigma} - I_p)^2\big].$$
Further research is needed to investigate which loss function provides the best criterion for evaluating the estimation methods of the covariance matrix.
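The identity that the log-determinant equals the sum of the log-eigenvalues, which motivates the spectrum-based direction discussed above, can be verified numerically in a few lines:

```python
# Verify numerically that log|Sigma| equals the sum of the log-eigenvalues
# of Sigma, the identity that motivates spectrum-based estimators.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
sigma = A @ A.T + 6 * np.eye(6)           # a positive definite covariance matrix

sign, logdet = np.linalg.slogdet(sigma)   # numerically stable log-determinant
eigvals = np.linalg.eigvalsh(sigma)       # eigenvalues of a symmetric matrix
sum_log_eig = np.log(eigvals).sum()

assert sign == 1.0
assert np.isclose(logdet, sum_log_eig)
```

Working with `slogdet` (or the eigenvalues) rather than `det` avoids overflow and underflow of the raw determinant in high dimensions.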
Finally, it is noteworthy that there is another category of publications in the literature on computing the log-determinant of the covariance matrix [53, 60, 61, 62, 63, 64, 65]. We point out that these works are very different from the study in our paper. Specifically, these papers assume that the covariance matrix $\Sigma$ is known; yet when the dimension $p$ is very large, the canonical methods (e.g., the Cholesky decomposition) for computing $\log|\Sigma|$ require a total of $O(p^3)$ operations and may not be feasible in practice. The above papers have proposed more efficient algorithms, based on tools such as random matrix theory and spectral analysis, for the fast computation of $\log|\Sigma|$.
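A common ingredient in this line of work is stochastic trace estimation: $\log|\Sigma| = \mathrm{tr}(\log \Sigma) \approx \frac{1}{m}\sum_{i=1}^{m} z_i^{\top} (\log \Sigma)\, z_i$ with random probe vectors $z_i$. The sketch below illustrates the idea with Rademacher probes; for clarity it forms $\log \Sigma$ explicitly, whereas the cited papers instead approximate $(\log \Sigma) z_i$ via, e.g., Chebyshev or Lanczos expansions precisely to avoid the $O(p^3)$ cost:

```python
# Sketch of the Hutchinson-type stochastic trace estimator behind fast
# log-determinant algorithms: log|Sigma| = tr(log Sigma) is approximated
# by averaging quadratic forms z^T (log Sigma) z over random probes z.
# Forming logm(Sigma) explicitly is for illustration only; practical
# methods approximate (log Sigma) z without building the matrix logarithm.
import numpy as np
from scipy.linalg import logm

def hutchinson_logdet(sigma, m=500, seed=0):
    rng = np.random.default_rng(seed)
    p = sigma.shape[0]
    L = logm(sigma).real                       # matrix logarithm of a PD matrix
    z = rng.choice([-1.0, 1.0], size=(m, p))   # Rademacher probe vectors
    return np.mean(np.einsum('ij,jk,ik->i', z, L, z))

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 20))
sigma = A @ A.T + 20 * np.eye(20)              # well-conditioned PD matrix
exact = np.linalg.slogdet(sigma)[1]
approx = hutchinson_logdet(sigma, m=500)
```

The estimator is unbiased, and its variance shrinks as the number of probes $m$ grows, so `approx` concentrates around the exact log-determinant.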
Appendix A: Proof of Theorem 1
(1) From , we have . Then, . Further,
This leads to
Hence, is an unbiased estimator of .
(2) For , we have
Since , we have
By the above two results, it yields that
Finally, we have
Tiejun Tong’s research was supported by the National Natural Science Foundation of China grant (No. 11671338), and the Hong Kong Baptist University grants FRG2/15-16/019, FRG2/15-16/038 and FRG1/16-17/018. The authors thank the editor, the associate editor and two reviewers for their constructive comments that have led to a substantial improvement of the paper.
Kaur S, Archer KJ, Devi MG, Kriplani A, Strauss JF, Singh R. Differential gene expression in granulosa cells from polycystic ovary syndrome patients with and without insulin resistance: identification of susceptibility gene sets through network analysis. J Clin Endocrinol Metab 2012;97:E2016–E2021.
Kuster DW, Merkus D, Kremer A, van IJcken WF, de Beer VJ, Verhoeven AJ, et al. Left ventricular remodeling in swine after myocardial infarction: a transcriptional genomics approach. Basic Res Cardiol 2011;106:1269–1281.
Mokry M, Hatzis P, Schuijers J, Lansu N, Ruzius FP, Clevers H, et al. Integrated genome-wide analysis of transcription factor occupancy, RNA polymerase II binding and steady-state RNA levels identify differentially regulated functional gene classes. Nucleic Acids Res 2012;40:148–158.
Richard AC, Lyons PA, Peters JE, Biasci D, Flint SM, Lee JC, et al. Comparison of gene expression microarray data with count-based RNA measurements informs microarray interpretation. BMC Genomics 2014;15:649–659.
Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, et al. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 2016;22:839–851.
Fan J, Liao Y, Liu H. An overview of the estimation of large covariance and precision matrices. Econometrics J 2016;19:C1–C32.
Wilks S. Multidimensional statistical scatter. In: Anderson TW, editor. Collected papers: contributions to mathematical statistics. New York: John Wiley & Sons, 1967:597–614.
Banerjee O, El Ghaoui L, d'Aspremont A. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J Mach Learn Res 2008;9:485–516.
Bishop CM. Pattern recognition and machine learning. New York: Springer, 2006.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. New York: Springer, 2002.
Rousseeuw PJ. Multivariate estimation with high breakdown point. Math Stat Appl 1985;8:283–297.
Boudt K, Rousseeuw P, Vanduffel S, Verdonck T. The minimum regularized covariance determinant estimator. arXiv preprint arXiv:1701.07086, 2017.
Anderson TW. An introduction to multivariate statistical analysis. New York: Wiley, 1984.
Schäfer J, Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 2005;4:32.
Cai T, Liang T, Zhou H. Law of log determinant of sample covariance matrix and optimal estimation of differential entropy for high-dimensional Gaussian distributions. J Multivariate Anal 2015;137:161–172.
Bickel PJ, Levina E. Some theory of Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli 2004;10:989–1010.
Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 2001;17:509–519.
Cui X, Hwang JT, Qiu J, Blades NJ, Churchill GA. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 2005;6:59–75.
Lam C, Fan J. Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat 2009;37:42–54.
Zhan F, Barlogie B, Arzoumanian V, Huang Y, Williams DR, Hollmig K, et al. Gene-expression signature of benign monoclonal gammopathy evident in multiple myeloma is linked to good prognosis. Blood 2007;109:1692–1700.
Boutsidis C, Drineas P, Kambadur P, Kontopoulou E-M, Zouzias A. A randomized algorithm for approximating the log determinant of a symmetric positive definite matrix. Linear Algebra and its Applications 2017, in press.
Fitzsimons J, Cutajar K, Osborne M, Roberts S, Filippone M. Bayesian inference of log determinants. arXiv preprint arXiv:1704.01445, 2017.
Fitzsimons J, Granziol D, Cutajar K, Osborne M, Filippone M, Roberts S. Entropic trace estimates for log determinants. arXiv preprint arXiv:1704.07223, 2017.
Han I, Malioutov D, Shin J. Large-scale log-determinant computation through stochastic Chebyshev expansions. In: Proceedings of the 32nd International Conference on Machine Learning, 2015:908–917.
Peng W, Wang H. Large-scale log-determinant computation via weighted L2 polynomial approximation with prior distribution of eigenvalues. In: International Conference on High Performance Computing and Applications. Springer, 2015:120–125.