
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.


Volume 12, Issue 5


Identifying clusters in genomics data by recursive partitioning

Gro Nilsen
  • Biomedical Informatics, Department of Informatics, University of Oslo, Norway; and Centre for Cancer Biomedicine, University of Oslo, Norway
/ Ørnulf Borgan / Knut Liestøl
  • Biomedical Informatics, Department of Informatics, University of Oslo, Norway; and Centre for Cancer Biomedicine, University of Oslo, Norway
/ Ole Christian Lingjærde
  • Corresponding author
  • Biomedical Informatics, Department of Informatics, University of Oslo, Norway; and Centre for Cancer Biomedicine, University of Oslo, Norway
  • K.G. Jebsen Centre for Breast Cancer Research, Oslo University Hospital, Oslo, Norway
Published Online: 2013-08-13 | DOI: https://doi.org/10.1515/sagmb-2013-0016


Genomics studies frequently involve clustering of molecular data to identify groups, but common clustering methods such as K-means clustering and hierarchical clustering do not determine the number of clusters. Methods for estimating the number of clusters typically focus on identifying the global structure in the data; however, the discovery of substructures within clusters may also be of great biological interest. We propose a novel method, Partitioning Algorithm based on Recursive Thresholding (PART), that recursively uncovers distinct subgroups in the groups already identified. Outliers are common in high-dimensional genomics data and may mask the presence of substructure within a cluster. A crucial feature of the algorithm is the introduction of tentative splits of clusters to isolate outliers that might otherwise halt the recursion prematurely. The method is demonstrated on simulated data as well as on a wide range of real data sets from gene expression microarrays, where the correct clusters were known in advance. When subclusters are present and the variance is large or varies between the clusters, the proposed method performs better than two established global methods on simulated data. On the real data sets, the overall performance of PART is superior to that of the global methods when used in combination with hierarchical clustering. The method is implemented in the R package clusterGenomics and is freely available from CRAN (The Comprehensive R Archive Network).

This article offers supplementary material which is provided at the end of the article.

Keywords: cluster analysis; gene expression; genomics; recursion; subclusters
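
The recursion described in the abstract can be illustrated with a minimal Python sketch. It is not the clusterGenomics implementation and makes several assumptions that do not come from the article: the function names `two_means` and `partition`, the parameters `min_size`, `max_ratio` and `allow_tentative`, and in particular the acceptance rule, which uses a simple within-cluster sum-of-squares ratio as a stand-in for an established global method. The sketch only shows the general idea that a split rejected at one level can be kept tentatively, so that an outlier is set aside and substructure in the remaining points can still be found.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def wss(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, max_iter=100):
    """Deterministic 2-means: start from the two points farthest apart."""
    c1, c2 = max(combinations(points, 2), key=lambda pq: sq_dist(*pq))
    left, right = [], []
    for _ in range(max_iter):
        new_left = [p for p in points if sq_dist(p, c1) <= sq_dist(p, c2)]
        new_right = [p for p in points if sq_dist(p, c1) > sq_dist(p, c2)]
        converged = new_left == left
        left, right = new_left, new_right
        if converged or not left or not right:
            break
        c1, c2 = centroid(left), centroid(right)
    return left, right


def partition(points, min_size=3, max_ratio=0.1,
              allow_tentative=True, _tentative=False):
    """Recursively split `points`; returns a list of clusters.

    ASSUMPTION: a split is accepted when both halves have at least
    `min_size` points and the split reduces the sum of squares to less
    than `max_ratio` of its original value. This rule is a hypothetical
    stand-in for a global method for choosing the number of clusters.
    """
    if len(points) < 2 * min_size:
        return [points]
    left, right = two_means(points)
    if not left or not right:
        return [points]
    accepted = (min(len(left), len(right)) >= min_size
                and (wss(left) + wss(right)) / wss(points) < max_ratio)
    if accepted:
        return (partition(left, min_size, max_ratio, allow_tentative)
                + partition(right, min_size, max_ratio, allow_tentative))
    if allow_tentative and not _tentative:
        # Tentative split: keep it only if one of the halves splits further.
        subs = (partition(left, min_size, max_ratio, allow_tentative, True)
                + partition(right, min_size, max_ratio, allow_tentative, True))
        if len(subs) > 2:
            return subs
    return [points]


# Two tight groups plus one distant outlier (made-up illustrative data).
group_a = [(x,) for x in (0.0, 0.4, 0.8, 1.2, 1.6)]
group_b = [(x,) for x in (10.0, 10.4, 10.8, 11.2, 11.6)]
data = group_a + group_b + [(100.0,)]

clusters = partition(data)
no_tentative = partition(data, allow_tentative=False)
print(sorted(len(c) for c in clusters))
print(len(no_tentative))
```

In this toy data set the first 2-means split separates the single outlier from everything else, and the size condition rejects that split. Without the tentative step the recursion stops there and returns one cluster. With it, the outlier is set aside and the two groups are recovered. Clusters smaller than `min_size` in the result can be treated as outliers by the caller.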



About the article

Corresponding author: Ole Christian Lingjærde, Biomedical Informatics, Department of Informatics, University of Oslo, Postboks 1080 Blindern, 0316 Oslo, Norway; Centre for Cancer Biomedicine, University of Oslo, Norway; and K.G. Jebsen Centre for Breast Cancer Research, Oslo University Hospital, Oslo, Norway

Published Online: 2013-08-13

Published in Print: 2013-10-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 5, Pages 637–652, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0016.


©2013 by Walter de Gruyter Berlin Boston.
