Published by De Gruyter February 25, 2016

AGGrEGATOr: A Gene-based GEne-Gene interActTiOn test for case-control association studies

Mathieu Emily


Among the large number of statistical methods that have been proposed to identify gene-gene interactions in case-control genome-wide association studies (GWAS), gene-based methods have recently grown in popularity as they confer advantages in both statistical power and biological interpretation. All of the gene-based methods jointly model the distribution of single nucleotide polymorphism (SNP) sets prior to the statistical test, which limits their power to detect sums of SNP-SNP signals. In this paper, we instead propose a gene-based method that first performs SNP-SNP interaction tests and then aggregates the obtained p-values into a test at the gene level. Our method, called AGGrEGATOr, is based on a minP procedure that tests the significance of the minimum of a set of p-values. We use simulations to assess the capacity of AGGrEGATOr to correctly control the type-I error. The benefits of our approach in terms of statistical power and robustness to SNP set characteristics are evaluated in a wide range of disease models by comparing it to previous methods. We also apply our method to detect gene pairs associated with rheumatoid arthritis (RA) in the GSE39428 dataset. We identify 13 potential gene-gene interactions and replicate one gene pair in the Wellcome Trust Case Control Consortium dataset at the 5% significance level. We further test 15 gene pairs, previously reported as being statistically associated with RA, Crohn’s disease (CD) or coronary artery disease (CAD), for replication in the Wellcome Trust Case Control Consortium dataset. We show that AGGrEGATOr is the only method able to successfully replicate seven gene pairs.
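To make the aggregation step concrete, the following is a minimal sketch of a minP-style combination of SNP-SNP interaction p-values into a single gene-pair p-value. It is not the AGGrEGATOr implementation: it assumes, purely for illustration, that the pairwise tests are independent, so that under the null hypothesis the minimum of m p-values satisfies P(min p ≤ x) = 1 − (1 − x)^m. Correlation between SNP pairs, for instance from linkage disequilibrium, is not handled here. The function name `min_p_gene_level`, the SNP labels and the p-values are all hypothetical.

```python
from itertools import product


def min_p_gene_level(pair_pvalues):
    """Combine SNP-SNP interaction p-values into one gene-pair p-value.

    Illustrative simplification: the m tests are assumed independent, so
    under the null P(min p <= x) = 1 - (1 - x)^m for uniform p-values.
    """
    pvals = list(pair_pvalues)
    if not pvals:
        raise ValueError("need at least one SNP-SNP p-value")
    if any(not 0.0 <= p <= 1.0 for p in pvals):
        raise ValueError("p-values must lie in [0, 1]")
    m = len(pvals)
    p_min = min(pvals)
    # Significance of the smallest p-value among m independent tests.
    return 1.0 - (1.0 - p_min) ** m


# Hypothetical p-values for 2 SNPs in gene A crossed with 3 SNPs in gene B.
snps_a = ["a1", "a2"]
snps_b = ["b1", "b2", "b3"]
hypothetical = [0.40, 0.03, 0.75, 0.20, 0.004, 0.61]
pair_pvalues = dict(zip(product(snps_a, snps_b), hypothetical))

gene_p = min_p_gene_level(pair_pvalues.values())
best_pair = min(pair_pvalues, key=pair_pvalues.get)
print(best_pair, round(gene_p, 4))
```

The sketch reports both the gene-level p-value and the SNP pair that produced the minimum, since the driving pair is usually what one inspects next. With correlated tests the independence formula is generally conservative, so a correlation-aware null distribution for the minimum would be needed in practice.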

Corresponding author: Mathieu Emily, Agrocampus Ouest – IRMAR UMR CNRS 6625, 65, rue de Saint Brieuc, 35042 Rennes Cedex, France


I acknowledge Maud Marchal for reading through the manuscript.



Supplemental Material:

The online version of this article (DOI: 10.1515/sagmb-2015-0074) offers supplementary material, available to authorized users.

Published Online: 2016-2-25
Published in Print: 2016-4-1

©2016 by De Gruyter