
The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.


IMPACT FACTOR 2018: 1.309

CiteScore 2018: 1.11

SCImago Journal Rank (SJR) 2018: 1.325
Source Normalized Impact per Paper (SNIP) 2018: 0.715

Mathematical Citation Quotient (MCQ) 2018: 0.03

Online ISSN: 1557-4679

On the Use of K-Fold Cross-Validation to Choose Cutoff Values and Assess the Performance of Predictive Models in Stepwise Regression

Zafar Mahmood / Salahuddin Khan
Published Online: 2009-07-27 | DOI: https://doi.org/10.2202/1557-4679.1105

This paper addresses a methodological technique, leave-many-out cross-validation, for choosing cutoff values in stepwise regression methods so as to simplify the final regression model. A practical approach to choosing cutoff values through cross-validation is to minimize the Predicted Residual Sum of Squares (PRESS). Leave-one-out cross-validation may overestimate a model's predictive capability; see, for example, Shao (1993) and So et al. (2000). Shao shows, through asymptotic results and simulation, that the model minimizing the leave-one-out cross-validation estimate of prediction error is often over-specified; that is, too many insignificant variables are retained in the regression model. He recommended a method that leaves out a subset of observations at a time, known as K-fold cross-validation. Leave-many-out procedures can therefore be more adequate for obtaining significant and optimal results. We describe various investigations for assessing the performance of predictive regression models, including different values of K in K-fold cross-validation, and for selecting the best possible cutoff values for automated model selection methods. We propose a resampling procedure that introduces alternative estimates of boosted cross-validated PRESS values for deciding the number of observations (l) to be omitted and, subsequently, the number of folds/subsets (K) in K-fold cross-validation. Salahuddin and Hawkes (1991) used leave-one-out cross-validation to select equal cutoff values in stepwise regression that minimize PRESS. We concentrate on applying K-fold cross-validation to choose unequal cutoff values, that is, F-to-enter and F-to-remove values, which are then used to determine the predictor variables of a regression model from the full data set. Our computer program for K-fold cross-validation can be used efficiently to choose both equal and unequal cutoff values for automated model selection methods. Some previously analyzed data sets and Monte Carlo simulation are used to evaluate the proposed method against alternatives through a designed-experiment approach.
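The cross-validated PRESS criterion underlying the abstract can be sketched as follows. This is a minimal illustration assuming an ordinary least-squares model with an intercept; the function name `kfold_press` and the random fold assignment are our own choices for exposition, not the authors' program, and the boosted/resampled variant the paper proposes is not reproduced here.

```python
import numpy as np

def kfold_press(X, y, k=5, seed=0):
    """Cross-validated PRESS: sum of squared prediction errors over K held-out folds.

    Illustrative sketch only. The paper's procedure additionally varies the
    number of omitted observations (l) and boosts the PRESS estimate by
    resampling; neither refinement is shown here.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)          # random assignment of cases to folds
    folds = np.array_split(idx, k)
    press = 0.0
    for test in folds:
        train = np.setdiff1d(idx, test)
        # Fit OLS (with intercept) on the training folds only
        Xtr = np.column_stack([np.ones(len(train)), X[train]])
        Xte = np.column_stack([np.ones(len(test)), X[test]])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        # Accumulate squared prediction error on the held-out fold
        press += np.sum((y[test] - Xte @ beta) ** 2)
    return press
```

Setting `k = n` recovers leave-one-out cross-validation as a special case, which is the variant Shao shows tends to favor over-specified models.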

Keywords: cross-validation; cutoff values; stepwise regression; prediction; variable selection
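The unequal cutoff values the abstract refers to act as thresholds in a forward/backward stepwise loop: a candidate variable enters when its partial F statistic exceeds F-to-enter, and a selected variable is dropped when its partial F falls below F-to-remove. The following is a hedged sketch of that loop; the function `stepwise`, the partial-F bookkeeping, and the default cutoffs 4.0 / 3.9 are illustrative assumptions, not the authors' implementation, in which the cutoffs would themselves be chosen by K-fold cross-validation.

```python
import numpy as np

def _rss(X, y, cols):
    """Residual sum of squares for an OLS fit (with intercept) on columns `cols`."""
    parts = [np.ones(len(y))] + [X[:, j] for j in cols]
    Z = np.column_stack(parts)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(np.sum((y - Z @ beta) ** 2))

def stepwise(X, y, f_enter=4.0, f_remove=3.9):
    """Stepwise selection with (possibly unequal) F-to-enter / F-to-remove cutoffs.

    Requiring f_remove <= f_enter guards against add/remove cycling.
    """
    n, p = X.shape
    selected = []
    while True:
        changed = False
        # Forward step: the best candidate's partial F must exceed f_enter.
        rss0 = _rss(X, y, selected)
        best_f, best_j = -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            rss1 = _rss(X, y, selected + [j])
            df = n - len(selected) - 2        # residual df after adding j
            f = (rss0 - rss1) / (rss1 / df)
            if f > best_f:
                best_f, best_j = f, j
        if best_j is not None and best_f > f_enter:
            selected.append(best_j)
            changed = True
        # Backward step: drop the variable whose partial F falls below f_remove.
        if selected:
            rss_full = _rss(X, y, selected)
            df = n - len(selected) - 1        # residual df of the current model
            worst_f, worst_j = np.inf, None
            for j in selected:
                rss_red = _rss(X, y, [c for c in selected if c != j])
                f = (rss_red - rss_full) / (rss_full / df)
                if f < worst_f:
                    worst_f, worst_j = f, j
            if worst_j is not None and worst_f < f_remove:
                selected.remove(worst_j)
                changed = True
        if not changed:
            return selected
```

Choosing the pair (f_enter, f_remove) unequally, as the paper advocates, amounts to running this loop over a grid of cutoff pairs and picking the pair whose resulting model minimizes the K-fold cross-validated PRESS.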

About the article



Citation Information: The International Journal of Biostatistics, Volume 5, Issue 1, ISSN (Online) 1557-4679, DOI: https://doi.org/10.2202/1557-4679.1105.


©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.

Citing Articles

Crossref-listed publications in which this article is cited:

[1]
Jinxiong Shi, Lianbo Zeng, Shaoqun Dong, Jipeng Wang, and Yunzhao Zhang
International Journal of Coal Geology, 2019, Page 103314
[2]
Lijuan Li, Baozhang Chen, Yanhu Zhang, Youzheng Zhao, Yue Xian, Guang Xu, Huifang Zhang, and Lifeng Guo
Remote Sensing, 2018, Volume 10, Number 12, Page 2006
[3]
Yaseen A. Hamaamin, A. Pouyan Nejadhashemi, Zhen Zhang, Subhasis Giri, Umesh Adhikari, and Matthew R. Herman
Sustainable Water Resources Management, 2018
[4]
Benjamin A. Lange, Christian Katlein, Marcel Nicolaus, Ilka Peeken, and Hauke Flores
Journal of Geophysical Research: Oceans, 2016, Volume 121, Number 12, Page 8511
[5]
Young Joo Yoon, Cheolwoo Park, Erik Hofmeister, and Sangwook Kang
Journal of Applied Statistics, 2012, Volume 39, Number 7, Page 1605
[6]
Sean A. Woznicki, A. Pouyan Nejadhashemi, Dennis M. Ross, Zhen Zhang, Lizhu Wang, and Abdol-Hossein Esfahanian
Science of The Total Environment, 2015, Volume 511, Page 341
[7]
Abdallah Bashir Musa
International Journal of Machine Learning and Cybernetics, 2013, Volume 4, Number 1, Page 13
[8]
Anja Guckland, Bernd Ahrends, Uwe Paar, Inge Dammann, Jan Evers, Karl Josef Meiwes, Egbert Schönfelder, Thomas Ullrich, Michael Mindrup, Nils König, and Johannes Eichhorn
European Journal of Forest Research, 2012, Volume 131, Number 6, Page 1869
