Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Volume 13, Issue 4



Multiclass cancer classification based on gene expression comparison

Sitan Yang (corresponding author)
  • Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, USA
Daniel Q. Naiman
  • Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, USA
Published Online: 2014-06-11 | DOI: https://doi.org/10.1515/sagmb-2013-0053


As the complexity and heterogeneity of cancer are increasingly appreciated through genomic analyses, microarray-based cancer classification comprising multiple discriminatory molecular markers is an emerging trend. Such multiclass classification problems pose new methodological and computational challenges for developing novel and effective statistical approaches. In this paper, we introduce a new approach for classifying multiple disease states associated with cancer based on gene expression profiles. Our method focuses on detecting small sets of genes in which the relative comparison of their expression values leads to class discrimination. For an m-class problem, the classification rule typically depends on a small number of m-gene sets, which provide transparent decision boundaries and allow for potential biological interpretations. We first test our approach on seven common gene expression datasets and compare it with popular classification methods, including support vector machines and random forests. We then consider a very large cohort of leukemia patients to further assess its effectiveness. In both experiments, our method yields results comparable to or even better than those of the benchmark classifiers. In addition, we demonstrate that our approach can integrate pathway analysis of gene expression to provide accurate and biologically meaningful classification.

Keywords: biomarker discovery; gene expression analysis; multiclass cancer classification
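
The abstract does not spell out the decision rule, so the following Python sketch is only one plausible reading of "relative comparison" within an m-gene set. It assumes that each gene in the set is paired with one class and that a sample is assigned to the class whose gene has the highest expression. The function names (`predict_by_max_gene`, `score_gene_set`, `best_gene_set`), the toy data, and the exhaustive search are illustrative assumptions, not the authors' implementation.

```python
from itertools import permutations


def predict_by_max_gene(sample, gene_set):
    """Return class k when the k-th gene of gene_set has the largest value.

    Assumed rule (hypothetical): gene_set[k] is the gene paired with class k.
    """
    values = [sample[g] for g in gene_set]
    return max(range(len(values)), key=values.__getitem__)


def score_gene_set(X, y, gene_set):
    """Training accuracy of the max-gene rule for one ordered gene set."""
    hits = sum(predict_by_max_gene(x, gene_set) == label for x, label in zip(X, y))
    return hits / len(y)


def best_gene_set(X, y, n_classes):
    """Exhaustive search over ordered m-gene sets (feasible for toy data only)."""
    n_genes = len(X[0])
    return max(
        permutations(range(n_genes), n_classes),
        key=lambda gene_set: score_gene_set(X, y, gene_set),
    )


# Toy expression matrix: rows are samples, columns are genes 0..3.
# Gene 3 is uniformly high, so it carries no class information.
X = [
    [5.0, 1.0, 2.0, 9.0],  # class 0
    [6.0, 2.0, 1.0, 9.5],  # class 0
    [1.0, 7.0, 2.0, 9.2],  # class 1
    [2.0, 6.5, 1.5, 9.1],  # class 1
    [1.0, 2.0, 8.0, 9.3],  # class 2
    [0.5, 1.5, 7.0, 9.4],  # class 2
]
y = [0, 0, 1, 1, 2, 2]

best = best_gene_set(X, y, n_classes=3)
new_sample = [1.2, 6.8, 2.0, 9.1]
predicted = predict_by_max_gene(new_sample, best)
print(best, score_gene_set(X, y, best), predicted)
```

Because the rule uses only the ordering of expression values within a sample, it is unchanged by any monotone per-sample rescaling, which is one reason such comparison-based rules are easy to interpret. A realistic search over thousands of genes would need a filtering or scoring heuristic rather than the brute-force enumeration used here.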



About the article

Corresponding author: Sitan Yang, Department of Applied Mathematics and Statistics, Johns Hopkins University, 211C Whitehead Hall, 3400 N. Charles Street, Baltimore, Maryland 21218, USA


Published in Print: 2014-08-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 13, Issue 4, Pages 477–496, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0053.


© 2014 by De Gruyter.

Citing Articles

Crossref-listed publications in which this article is cited:

Niyaz Yoosuf, José Fernández Navarro, Fredrik Salmén, Patrik L. Ståhl, and Carsten O. Daub
Breast Cancer Research, 2020, Volume 22, Number 1
Behrooz Azarkhalili, Ali Saberi, Hamidreza Chitsaz, and Ali Sharifi-Zarchi
Scientific Reports, 2019, Volume 9, Number 1
Mossa Gardaneh
Multidisciplinary Cancer Investigation, 2017, Volume 1, Number 2, Page 3
