
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Volume 17, Issue 6


False discovery control for penalized variable selections with high-dimensional covariates

Kevin He / Xiang Zhou / Hui Jiang
  • Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  • University of Michigan, Center for Computational Medicine and Bioinformatics, Ann Arbor, MI, USA
/ Xiaoquan Wen
  • Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  • University of Michigan, Center for Computational Medicine and Bioinformatics, Ann Arbor, MI, USA
/ Yi Li
  • Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  • University of Michigan, Center for Computational Medicine and Bioinformatics, Ann Arbor, MI, USA
Published Online: 2018-12-15 | DOI: https://doi.org/10.1515/sagmb-2018-0038

Abstract

Modern biotechnologies have produced vast amounts of high-throughput data in which the number of predictors far exceeds the sample size. Penalized variable selection has emerged as a powerful and efficient dimension reduction tool. However, controlling false discoveries (i.e. the inclusion of irrelevant variables) in penalized high-dimensional variable selection presents serious challenges. To effectively control the fraction of false discoveries in penalized variable selection, we propose a false discovery controlling procedure. The proposed method is general and flexible, and can work with a broad class of variable selection algorithms, not only for linear regression but also for generalized linear models and survival analysis.

This article offers supplementary material which is provided at the end of the article.

Keywords: dimension reduction; false discovery; penalized regression; variable selection
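
The abstract describes a false discovery as the inclusion of an irrelevant variable and speaks of controlling the fraction of such discoveries. As a minimal illustration of that quantity only (not of the proposed controlling procedure, whose details are not given on this page), the sketch below computes the fraction of false discoveries for a selected set when the truly relevant variables are known, as in a simulation. The function name, its arguments, and the toy indices are assumptions made for illustration.

```python
def false_discovery_proportion(selected, relevant):
    """Fraction of selected variables that are not truly relevant.

    `selected` and `relevant` are iterables of variable indices; both are
    hypothetical inputs for illustration. Returns 0.0 when nothing is
    selected, a common convention that avoids dividing by zero.
    """
    selected = set(selected)
    relevant = set(relevant)
    if not selected:
        return 0.0
    # A false discovery is a selected variable that is not relevant.
    false_discoveries = selected - relevant
    return len(false_discoveries) / len(selected)


# Toy example: 5 variables selected, 2 of them (12 and 40) irrelevant.
fdp = false_discovery_proportion(selected=[0, 3, 7, 12, 40],
                                 relevant=[0, 3, 7, 9])
print(fdp)  # 0.4
```

In real data the relevant set is unknown, which is why a procedure that estimates or bounds this fraction is needed; the sketch only makes the target quantity concrete.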

About the article

Funding Source: Chinese Natural Science Foundation

Award identifier / Grant number: 11528102

The authors thank Dr. Kirsten Herold at the UM-SPH Writing Lab for her helpful suggestions.


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 17, Issue 6, 20180038, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2018-0038.


©2018 Walter de Gruyter GmbH, Berlin/Boston.
