Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 14, Issue 3
Modeling gene-covariate interactions in sparse regression with group structure for genome-wide association studies

Yun Li
  • Corresponding author
  • Department of Mathematics and Statistics, Boston University, MA 02215, USA
  • Department of Biostatistics, Boston University School of Public Health, MA 02118, USA
/ George T. O’Connor / Josée Dupuis / Eric Kolaczyk
Published Online: 2015-05-01 | DOI: https://doi.org/10.1515/sagmb-2014-0073


In genome-wide association studies (GWAS), the goal is to identify genetic variants associated with phenotypes. For a given phenotype, the associated variants are usually a sparse subset of all possible variants, so traditional Lasso-type estimation methods can be used to detect important genes. However, the relationship between the genotype at a variant and a phenotype may be influenced by other variables, such as sex and lifestyle. It is therefore important to be able to incorporate gene-covariate interactions into the sparse regression model. In addition, because there is biological knowledge about how genes work together in structured groups, it is desirable to incorporate this information as well. In this paper, we present a novel sparse regression methodology for gene-covariate models in association studies that not only allows such interactions but also accounts for biological group structure. Simulation results show that our method substantially outperforms a comparison method that models interactions but ignores group structure. Application to data on total plasma immunoglobulin E (IgE) concentrations in the Framingham Heart Study (FHS), using sex and smoking status as covariates, yields several potentially interesting gene-covariate interactions.

Keywords: gene-environment/covariate interaction; genome-wide association studies; sparse regression
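
To make the setting in the abstract concrete, the following is a minimal sketch of a sparse regression with gene-covariate interaction columns and a group penalty. It is not the estimator proposed in the paper, whose details are not given on this page: it uses a plain group lasso fitted by proximal gradient descent as a stand-in. The function names (`build_design`, `group_soft_threshold`, `group_lasso`), the variant-to-gene mapping `gene_of_variant`, and the choice of penalty weights are all assumptions made for illustration. Covariate main effects and an intercept are omitted for brevity.

```python
import numpy as np

def build_design(G, E):
    """Stack main-effect and gene-by-covariate interaction columns.

    G: (n, p) genotype matrix; E: (n, q) covariate matrix.
    Returns X of shape (n, p * (1 + q)) and `owner`, which gives for
    each column of X the index of the variant it was built from.
    """
    p = G.shape[1]
    cols, owner = [G], [np.arange(p)]
    for k in range(E.shape[1]):
        cols.append(G * E[:, [k]])      # interaction of every variant with covariate k
        owner.append(np.arange(p))
    return np.hstack(cols), np.concatenate(owner)

def group_soft_threshold(beta, groups, t):
    """Proximal operator of t * sum_g sqrt(|g|) * ||beta_g||_2."""
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        thresh = t * np.sqrt(idx.sum())
        if norm > thresh:               # otherwise the whole group is set to zero
            out[idx] = (1.0 - thresh / norm) * beta[idx]
    return out

def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 12
    G = rng.integers(0, 3, size=(n, p)).astype(float)   # simulated 0/1/2 genotypes
    E = rng.integers(0, 2, size=(n, 1)).astype(float)   # one binary covariate
    gene_of_variant = np.repeat(np.arange(4), 3)        # hypothetical: 4 genes, 3 variants each
    X, owner = build_design(G, E)
    X = X - X.mean(axis=0)
    groups = gene_of_variant[owner]                     # a gene's main effects and interactions share a group
    y = 1.0 * X[:, 0] + 1.0 * X[:, p] + rng.normal(size=n)
    y = y - y.mean()
    beta = group_lasso(X, y, groups, lam=0.1)
    print("selected gene groups:", np.unique(groups[beta != 0]))
```

In this sketch each gene's group contains both the main effects of its variants and their interactions with every covariate, so a gene is kept or dropped as a unit. A penalty that also selects within a group, such as a sparse-group lasso, would be a natural variation.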


About the article

Corresponding author: Yun Li, Department of Mathematics and Statistics, Boston University, MA 02215, USA; and Department of Biostatistics, Boston University School of Public Health, MA 02118, USA

Published Online: 2015-05-01

Published in Print: 2015-06-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 14, Issue 3, Pages 265–277, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2014-0073.

©2015 by De Gruyter.