Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


IMPACT FACTOR 2018: 0.536
5-year IMPACT FACTOR: 0.764

CiteScore 2018: 0.49

SCImago Journal Rank (SJR) 2018: 0.316
Source Normalized Impact per Paper (SNIP) 2018: 0.342

Mathematical Citation Quotient (MCQ) 2018: 0.02

ISSN (Online): 1544-6115
Volume 18, Issue 5

Stability selection for lasso, ridge and elastic net implemented with AFT models

Md Hasinur Rahaman Khan / Anamika Bhadra / Tamanna Howlader
Published Online: 2019-10-07 | DOI: https://doi.org/10.1515/sagmb-2017-0001

Abstract

Instability in model selection is a major concern for data sets containing a large number of covariates. We focus on stability selection, a technique that improves variable selection performance for a range of selection methods by aggregating the results of applying a selection procedure to sub-samples of the data, in the setting where the observations are subject to right censoring. Accelerated failure time (AFT) models have proved useful in many contexts, including heavy censoring (for example, in cancer survival) and high dimensionality (for example, in microarray data). We implement the stability selection approach using three variable selection techniques (lasso, ridge regression, and elastic net) applied to censored data using AFT models. We compare the performance of these regularized techniques with and without stability selection in simulation studies and two real data examples: a breast cancer data set and a diffuse large B-cell lymphoma data set. The results suggest that stability selection always gives a stable picture of which variables are selected and that, as the dimension of the data increases, the performance of methods with stability selection improves relative to methods without it, irrespective of the collinearity between the covariates.

Keywords: AFT model; Elastic net; Lasso; Ridge; Stability selection
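
The aggregation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes uncensored log survival times (the handling of right censoring in the AFT fit is omitted entirely), uses a plain coordinate-descent lasso as the base selector, and the function names, the half-sample size, the penalty value 0.3 and the 0.6 frequency threshold are all hypothetical choices made for illustration.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent (no intercept):
    minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the partial residual (beta_j removed)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def stability_selection(X, y, lam, n_subsamples=50, threshold=0.6, seed=0):
    """Refit the base selector on random half-samples and keep the variables
    whose selection frequency reaches the (assumed) threshold."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # sub-sample without replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    freq = [c / n_subsamples for c in counts]
    selected = [j for j in range(p) if freq[j] >= threshold]
    return freq, selected

# Simulated, uncensored log survival times: only covariates 0 and 1 matter.
rng = random.Random(1)
n, p = 60, 10
true_beta = [2.0, -2.0] + [0.0] * (p - 2)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
log_time = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.5) for row in X]

freq, selected = stability_selection(X, log_time, lam=0.3)
print("selection frequencies:", [round(f, 2) for f in freq])
print("stable set:", selected)
```

Replacing `lasso_cd` with any other selector that returns a coefficient vector leaves the aggregation step unchanged, which is why stability selection can wrap a range of selection methods. A censoring-aware AFT fit would replace the least-squares loss used here.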


About the article

Conflict of interest statement: The authors have declared no conflict of interest.


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 18, Issue 5, 20170001, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/sagmb-2017-0001.

© 2019 Walter de Gruyter GmbH, Berlin/Boston.
