
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.


Volume 14, Issue 3 (Jun 2015)


Modeling gene-covariate interactions in sparse regression with group structure for genome-wide association studies

Yun Li (corresponding author)
  • Department of Mathematics and Statistics, Boston University, MA 02215, USA
  • Department of Biostatistics, Boston University School of Public Health, MA 02118, USA
George T. O’Connor
  • Pulmonary Center, Department of Medicine, Boston University School of Medicine, MA 02118, USA
Josée Dupuis
  • Department of Biostatistics, Boston University School of Public Health, MA 02118, USA
Eric Kolaczyk
  • Department of Mathematics and Statistics, Boston University, MA 02215, USA
Published Online: 2015-05-01 | DOI: https://doi.org/10.1515/sagmb-2014-0073


In genome-wide association studies (GWAS), the goal is to identify genetic variants associated with phenotypes. For a given phenotype, the associated variants are usually a sparse subset of all possible variants, so traditional Lasso-type estimation methods can be used to detect important genes. However, the relationship between the genotype at a variant and a phenotype may be influenced by other variables, such as sex and lifestyle. It is therefore important to be able to incorporate gene-covariate interactions into the sparse regression model. In addition, because biological knowledge is available on how genes work together in structured groups, it is desirable to incorporate this information as well. In this paper, we present a novel sparse regression methodology for gene-covariate models in association studies that not only allows for such interactions but also accounts for biological group structure. Simulation results show that our method substantially outperforms an alternative method that considers interactions but ignores group structure. Application to data on total plasma immunoglobulin E (IgE) concentrations in the Framingham Heart Study (FHS), using sex and smoking status as covariates, yields several potentially interesting gene-covariate interactions.

Keywords: gene-environment/covariate interaction; genome-wide association studies; sparse regression
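
As a rough illustration of the kind of model the abstract describes, the sketch below fits a plain group lasso in which each gene forms one group containing its SNP main effect and its SNP-by-covariate interaction. This is not the estimator proposed in the paper, whose penalty is not given here; it is a generic group lasso solved by proximal gradient descent. The function names, the toy factorial data, and the values of `lam` and `step` are all assumptions made for the example.

```python
import math
from itertools import product

def interaction_design(genotypes, covariate):
    """Append SNP-by-covariate product columns to each row of SNP values."""
    return [row + [g * e for g in row] for row, e in zip(genotypes, covariate)]

def group_soft_threshold(beta, groups, threshold):
    """Proximal operator of threshold * sum over groups of ||beta_group||_2."""
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        # Shrink the whole group toward zero; drop it if its norm is too small.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for j in idx:
            out[j] = scale * beta[j]
    return out

def fit_group_lasso(X, y, groups, lam, step, n_iter=500):
    """Proximal gradient descent on (1/2n)||y - X b||^2 + lam * sum_g ||b_g||_2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        stepped = [b - step * g for b, g in zip(beta, grad)]
        beta = group_soft_threshold(stepped, groups, step * lam)
    return beta

# Hypothetical toy data: a full factorial over two centered SNPs (one per gene)
# and one centered binary covariate, so the design columns are orthogonal.
rows = list(product([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-0.5, 0.5]))
genotypes = [[s0, s1] for s0, s1, _ in rows]
covariate = [e for _, _, e in rows]
X = interaction_design(genotypes, covariate)  # columns: snp0, snp1, snp0*e, snp1*e

# Only gene 0 is active: a main effect plus an interaction with the covariate.
y = [s0 + s0 * e for s0, _, e in rows]

# One group per gene: its main-effect column and its interaction column.
groups = [[0, 2], [1, 3]]
beta = fit_group_lasso(X, y, groups, lam=0.1, step=1.0)
print(beta)
```

Grouping a gene's main effect with its interaction term means the penalty keeps or drops the gene as a unit. In this toy design the inactive gene's two coefficients are set to zero, while the active gene's coefficients are shrunk but retained.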



About the article


Published in Print: 2015-06-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2014-0073.
