Statistical Applications in Genetics and Molecular Biology

Volume 14, Issue 3 (Jun 2015)


Weighted Kolmogorov Smirnov testing: an alternative for Gene Set Enrichment Analysis

Konstantina Charmpi
  • Université Grenoble Alpes, France
  • Laboratoire Jean Kuntzmann, CNRS UMR5224, Grenoble, France
  • Laboratoire d’Excellence TOUCAN, Toulouse, France
Bernard Ycart
  • Corresponding author
  • Université Grenoble Alpes, France
  • Laboratoire Jean Kuntzmann, CNRS UMR5224, Grenoble, France
  • Laboratoire d’Excellence TOUCAN, Toulouse, France
Published Online: 2015-05-30 | DOI: https://doi.org/10.1515/sagmb-2014-0077


Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte-Carlo simulation. Here, it is proposed to subtract from the cumulated weight function its asymptotic expectation, then to scale the result. Under the null hypothesis, the convergence in distribution of the new test statistic is proved, using the theory of empirical processes. The limiting distribution needs to be computed only once, and can then be used for many different gene sets, which results in large savings in computing time. The test defined in this way is called the Weighted Kolmogorov Smirnov (WKS) test. The classical GSEA test and the new procedure have been compared on expression data from the GEO repository, tested against the MSig Database C2. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative than the classical GSEA test in many cases.
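
The centering and scaling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`wks_statistic`, `simulate_null`, `p_value`) is hypothetical. It assumes that genes are already sorted by ranking, that each gene carries a weight, and that under the null hypothesis the gene set is a uniformly random subset of the genes. The centering term used here is the finite-sample expectation under that null, standing in for the asymptotic expectation of the abstract, and the limiting distribution is replaced by a Monte-Carlo approximation drawn once at a single reference set size and then reused. Both substitutions are simplifications made for this sketch.

```python
import numpy as np

def wks_statistic(weights, in_set):
    """Sup-norm of a centered, scaled cumulated weight function (sketch).

    weights: gene weights in ranking order (hypothetical input).
    in_set:  boolean mask, True where the ranked gene is in the gene set.
    """
    weights = np.asarray(weights, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    n = in_set.sum()
    # Cumulated weight of the gene set along the ranking, averaged over the set.
    observed = np.cumsum(weights * in_set) / n
    # Expectation under a uniformly random set: cumulated mean weight of all genes.
    expected = np.cumsum(weights) / weights.size
    # Center, scale by sqrt(n), and take the largest absolute deviation.
    return np.sqrt(n) * np.max(np.abs(observed - expected))

def simulate_null(weights, set_size, n_sim=2000, seed=0):
    """Draw null statistics once, at one reference set size (an assumption)."""
    rng = np.random.default_rng(seed)
    n_genes = len(weights)
    stats = np.empty(n_sim)
    for b in range(n_sim):
        mask = np.zeros(n_genes, dtype=bool)
        mask[rng.choice(n_genes, size=set_size, replace=False)] = True
        stats[b] = wks_statistic(weights, mask)
    return np.sort(stats)

def p_value(stat, null_stats):
    """Upper-tail Monte-Carlo p-value against the stored null sample."""
    return (np.sum(null_stats >= stat) + 1) / (null_stats.size + 1)
```

In this sketch the array returned by `simulate_null` is computed once and passed to `p_value` for each gene set in turn. That reuse is only reasonable because the statistic is centered and scaled. For sets that are large relative to the number of genes, a finite-population correction would also matter, and it is ignored here.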

Keywords: empirical processes; GSEA; Monte-Carlo simulation; statistical test; weak convergence

AMS Subject Classification: Primary 62F03; Secondary 60F17



About the article

Corresponding author: Bernard Ycart, 51 rue des Mathématiques, 38041 GRENOBLE cedex 9, France; Université Grenoble Alpes, France; Laboratoire Jean Kuntzmann, CNRS UMR5224, Grenoble, France; and Laboratoire d’Excellence TOUCAN, Toulouse, France.

Published Online: 2015-05-30

Published in Print: 2015-06-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2014-0077.
