
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

IMPACT FACTOR 2018: 0.536
5-year IMPACT FACTOR: 0.764

CiteScore 2018: 0.49

SCImago Journal Rank (SJR) 2018: 0.316
Source Normalized Impact per Paper (SNIP) 2018: 0.342

Mathematical Citation Quotient (MCQ) 2018: 0.02

Volume 12, Issue 3



Block-diagonal discriminant analysis and its bias-corrected rules

Herbert Pang
  • Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
/ Tiejun Tong / Michael Ng
Published Online: 2013-05-11 | DOI: https://doi.org/10.1515/sagmb-2012-0017


High-throughput expression profiling allows the expression of tens of thousands of genes to be measured simultaneously. These data have motivated the development of reliable biomarkers for disease subtype identification and diagnosis. Many methods have been developed for analyzing these data, such as diagonal discriminant analysis, support vector machines, and k-nearest neighbor methods. The diagonal discriminant methods have been shown to perform well for high-dimensional data with small sample sizes. Despite their popularity, the independence assumption underlying them is unlikely to hold in practice. Recently, a gene module-based linear discriminant analysis strategy has been proposed that exploits the correlation among genes. However, the approach can be underpowered when the sample sizes of the two classes are unbalanced. In this paper, we propose to correct the biases in the discriminant scores of block-diagonal discriminant analysis. In simulation studies, our proposed method outperforms other approaches in various settings. We also illustrate the proposed method on microarray data sets.

This article offers supplementary material which is provided at the end of the article.

Keywords: bias-correction; block-diagonal; classification; high-dimensional data; linear discriminant analysis
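
To make the block-diagonal idea in the abstract concrete, the following is a minimal sketch of a plain (uncorrected) block-diagonal linear discriminant rule. It is not the authors' bias-corrected rule, which the abstract does not specify. The function names (`fit`, `scores`, `predict`), the toy data, and the gene blocks are all hypothetical; the sketch assumes a pooled covariance estimated separately within each block and the standard score d_k(x) = sum over blocks of (x_b - mean_kb)' S_b^{-1} (x_b - mean_kb) - 2 log(prior_k), with a sample assigned to the class with the smallest score.

```python
import math

def mean_vec(rows):
    """Column-wise mean of a list of equal-length samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_cov(groups, means):
    """Pooled within-class covariance (divisor n - K) for one block of features."""
    p = len(means[0])
    cov = [[0.0] * p for _ in range(p)]
    for rows, mu in zip(groups, means):
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += d[i] * d[j]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[c / dof for c in row] for row in cov]

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def fit(classes, blocks):
    """classes: one list of samples per class; blocks: lists of feature indices."""
    n_total = sum(len(c) for c in classes)
    model = {"blocks": blocks, "priors": [len(c) / n_total for c in classes], "parts": []}
    for idx in blocks:
        groups = [[[x[j] for j in idx] for x in c] for c in classes]
        means = [mean_vec(g) for g in groups]
        model["parts"].append((means, pooled_cov(groups, means)))
    return model

def scores(model, x):
    """Discriminant score per class: summed block-wise Mahalanobis terms minus 2 log prior."""
    out = []
    for k, prior in enumerate(model["priors"]):
        s = -2.0 * math.log(prior)
        for idx, (means, cov) in zip(model["blocks"], model["parts"]):
            d = [x[j] - mu for j, mu in zip(idx, means[k])]
            s += sum(di * zi for di, zi in zip(d, solve(cov, d)))
        out.append(s)
    return out

def predict(model, x):
    """Assign x to the class with the smallest discriminant score."""
    s = scores(model, x)
    return s.index(min(s))

# Hypothetical, unbalanced toy data: 6 samples in class 0, 3 in class 1.
class0 = [[0.0, 0.1, 1.0], [0.2, 0.0, 1.2], [0.1, 0.3, 0.8],
          [-0.1, -0.2, 1.1], [0.3, 0.2, 0.9], [-0.2, 0.1, 1.0]]
class1 = [[2.0, 2.1, 3.0], [2.2, 1.8, 3.3], [1.9, 2.3, 2.8]]
# Features 0 and 1 form one correlated block; feature 2 is its own block.
model = fit([class0, class1], blocks=[[0, 1], [2]])
label_a = predict(model, [0.0, 0.0, 1.0])
label_b = predict(model, [2.1, 2.0, 3.1])
```

Setting every block to a single feature reduces this sketch to diagonal discriminant analysis, while a single block containing all features gives ordinary linear discriminant analysis, which is not estimable when the number of genes exceeds the degrees of freedom. With small, unbalanced classes the plug-in scores above are biased estimates of their population counterparts, which is the issue the proposed correction targets.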



About the article

Corresponding author: Tiejun Tong, Department of Mathematics, Hong Kong Baptist University, Hong Kong

Published Online: 2013-05-11

Published in Print: 2013-06-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 3, Pages 347–359, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2012-0017.


©2013 by Walter de Gruyter Berlin Boston.
