
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Volume 14, Issue 3



Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Konstantina Charmpi
  • Université Grenoble Alpes, France
  • Laboratoire Jean Kuntzmann, CNRS UMR5224, Grenoble, France
  • Laboratoire d’Excellence TOUCAN, Toulouse, France

Bernard Ycart
  • Corresponding author
  • Université Grenoble Alpes, France
  • Laboratoire Jean Kuntzmann, CNRS UMR5224, Grenoble, France
  • Laboratoire d’Excellence TOUCAN, Toulouse, France
Published Online: 2015-05-30 | DOI: https://doi.org/10.1515/sagmb-2014-0077


Gene Set Enrichment Analysis (GSEA) is a basic tool for processing genomic data. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte-Carlo simulation. Here, it is proposed to subtract from the cumulated weight function its asymptotic expectation, then to scale the difference. Under the null hypothesis, the convergence in distribution of the new test statistic is proved, using the theory of empirical processes. The limiting distribution needs to be computed only once, and can then be used for many different gene sets, which results in large savings in computing time. The test defined in this way is called the Weighted Kolmogorov Smirnov (WKS) test. The classical GSEA test and the new procedure were compared using expression data from the GEO repository, tested against the MSig Database C2. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative than the classical GSEA test in many cases.
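
The centering and scaling step described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the positions of the gene set in the ranked list are normalized to values u_1, ..., u_n in [0, 1], the cumulated weight function is taken as S_n(t) = (1/n) Σ g(u_k) 1{u_k ≤ t} for a hypothetical nonnegative weight function g, its expectation under a uniform null is G(t) = ∫_0^t g(u) du, and the statistic is sup_t √n |S_n(t) − G(t)|. The names `wks_statistic`, `simulate_null` and `p_value` are hypothetical, and the limiting distribution is only approximated here by simulating uniform samples of a fixed reference size; this is not the authors' implementation.

```python
import numpy as np

def wks_statistic(u, g, G):
    """sup over t in [0, 1] of sqrt(n) * |S_n(t) - G(t)|.

    u : normalized positions of the gene set in the ranked list (values in [0, 1])
    g : nonnegative weight function (assumed, vectorized)
    G : integral of g from 0 to t (assumed, accepts arrays and scalars)
    """
    u = np.sort(np.asarray(u, dtype=float))
    n = u.size
    after = np.cumsum(g(u)) / n                    # S_n just after each jump
    before = np.concatenate(([0.0], after[:-1]))   # S_n just before each jump
    Gu = G(u)
    # S_n is piecewise constant and G is nondecreasing, so the supremum is
    # reached at a jump point (from either side) or at the endpoint t = 1.
    dev = max(np.abs(after - Gu).max(),
              np.abs(before - Gu).max(),
              abs(after[-1] - G(1.0)))
    return np.sqrt(n) * dev

def simulate_null(g, G, n_ref=500, n_sim=2000, seed=0):
    """Approximate the null distribution once, from uniform samples of size n_ref."""
    rng = np.random.default_rng(seed)
    return np.sort([wks_statistic(rng.random(n_ref), g, G) for _ in range(n_sim)])

def p_value(stat, null_sorted):
    """Monte-Carlo p-value: share of null draws at least as large as stat."""
    n_below = np.searchsorted(null_sorted, stat, side="left")
    return (null_sorted.size - n_below + 1) / (null_sorted.size + 1)

# Hypothetical weight function that emphasizes the top of the ranked list.
g = lambda t: 2.0 * (1.0 - t)
G = lambda t: 2.0 * t - t ** 2

null = simulate_null(g, G, n_ref=200, n_sim=500)   # computed once

# The same null sample is reused for several (made-up) gene sets.
rng = np.random.default_rng(42)
gene_sets = {
    "uniform_like": rng.random(80),          # no enrichment
    "top_enriched": rng.random(80) * 0.4,    # concentrated near the top
}
for name, positions in gene_sets.items():
    stat = wks_statistic(positions, g, G)
    print(name, round(float(stat), 3), round(float(p_value(stat, null)), 4))
```

With g constant and equal to 1, the statistic in this sketch reduces to √n times the classical Kolmogorov Smirnov distance to the uniform distribution. The point of the design is that `simulate_null` runs once, whereas a per-gene-set Monte-Carlo evaluation would repeat the simulation for every gene set tested.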

Keywords: empirical processes; GSEA; Monte-Carlo simulation; statistical test; weak convergence

AMS Subject Classification: Primary 62F03; Secondary 60F17



About the article

Corresponding author: Bernard Ycart, 51 rue des Mathématiques, 38041 GRENOBLE cedex 9, France; Université Grenoble Alpes, France; Laboratoire Jean Kuntzmann, CNRS UMR5224, Grenoble, France; and Laboratoire d’Excellence TOUCAN, Toulouse, France

Published Online: 2015-05-30

Published in Print: 2015-06-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 14, Issue 3, Pages 279–293, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2014-0077.


©2015 by De Gruyter.

Citing Articles


  • Mei Ma, Xiao Liang, Shiqiang Cheng, Lu Zhang, Bolun Cheng, Jian Yang, Miao Ding, Yan Zhao, Ping Li, Yanan Du, Li Liu, Yan Wen, Awen He, QianRui Fan, Xiong Guo, and Feng Zhang. Behavioural Brain Research, 2018, Volume 353, Page 137.

  • Giovanni Fiorito, Jelle Vlaanderen, Silvia Polidoro, John Gulliver, Claudia Galassi, Andrea Ranzi, Vittorio Krogh, Sara Grioni, Claudia Agnoli, Carlotta Sacerdote, Salvatore Panico, Ming-Yi Tsai, Nicole Probst-Hensch, Gerard Hoek, Zdenko Herceg, Roel Vermeulen, Akram Ghantous, Paolo Vineis, and Alessio Naccarati. Environmental and Molecular Mutagenesis, 2017.

  • Nino Kordzakhia, Alexander Novikov, and Bernard Ycart. Statistics and Computing, 2017, Volume 27, Number 6, Page 1513.

  • Marie Tosolini, Christelle Algans, Frédéric Pont, Bernard Ycart, and Jean-Jacques Fournié. OncoImmunology, 2016, Volume 5, Number 7, Page e1188246.
