This paper addresses the use of leave-many-out cross-validation to choose cutoff values in stepwise regression methods, with the aim of simplifying the final regression model. A practical way to choose cutoff values by cross-validation is to minimize the Predicted Residual Sum of Squares (PRESS). Leave-one-out cross-validation may overestimate a model's predictive capability; see, for example, Shao (1993) and So et al. (2000). Shao shows, through asymptotic results and simulation, that the model minimizing the leave-one-out cross-validation estimate of prediction error is often overspecified. That is, the selected set of predictors in the regression model contains too many insignificant variables. He recommended leaving out a subset of observations at a time, as in K-fold cross-validation. Leave-many-out procedures can therefore be better suited to obtaining significant and optimal results.

We describe several investigations into assessing the performance of predictive regression models. These include the use of different values of K in K-fold cross-validation and the selection of the best possible cutoff values for automated model selection methods. We propose a resampling procedure that introduces alternative estimates of boosted cross-validated PRESS values. These estimates are used to decide the number of observations (l) to omit and, in turn, the number of folds (subsets) K in K-fold cross-validation. Salahuddin and Hawkes (1991) used leave-one-out cross-validation to select equal cutoff values in stepwise regression that minimize PRESS. We concentrate on applying K-fold cross-validation to choose unequal cutoff values, that is, distinct F-to-enter and F-to-remove values. These values are then used to determine the predictor variables of a regression model fitted to the full data set. Our computer program for K-fold cross-validation can be used efficiently to choose both equal and unequal cutoff values for automated model selection methods.
Previously analyzed data sets and Monte Carlo simulations are used to evaluate the proposed method against alternatives through an experimental-design approach.
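The abstract does not give the algorithm in detail, so the following is only a minimal sketch under stated assumptions, not the authors' program. It assumes an Efroymson-style stepwise procedure driven by partial F statistics, plain K-fold PRESS (not the boosted PRESS estimates proposed in the paper), and a small grid of candidate cutoffs. The function names, the grid, and the simulated data are hypothetical. For each candidate pair (F-to-enter, F-to-remove), variable selection is rerun inside every training fold. The held-out squared prediction errors are summed into a K-fold PRESS, and the pair with the smallest PRESS is then applied to the full data set. Pairs with equal values correspond to the equal-cutoff case. Plain Python with normal equations keeps the example self-contained; a QR-based least-squares routine would be numerically safer in practice.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def fit(X, y, cols):
    """Least squares with an intercept on predictor columns `cols`; returns (beta, RSS)."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    beta = solve(ztz, zty)
    rss = sum((t - sum(b * v for b, v in zip(beta, z))) ** 2 for z, t in zip(Z, y))
    return beta, rss

def predict(beta, cols, row):
    """Prediction for one observation from a model fitted on `cols`."""
    return beta[0] + sum(b * row[j] for b, j in zip(beta[1:], cols))

def partial_f(X, y, small, big):
    """Partial F for the single extra term in `big` compared with `small`."""
    rss_small = fit(X, y, small)[1]
    rss_big = fit(X, y, big)[1]
    df = len(y) - len(big) - 1  # residual degrees of freedom of the bigger model
    return (rss_small - rss_big) / (rss_big / df)

def stepwise(X, y, f_enter, f_remove):
    """Stepwise selection using F-to-enter / F-to-remove cutoffs (Efroymson style)."""
    selected, n_vars = [], len(X[0])
    for _ in range(4 * n_vars):  # cap on iterations as a guard against cycling
        # Backward step: drop the weakest selected term if its partial F < f_remove.
        if selected:
            f, j = min((partial_f(X, y, [c for c in selected if c != j], selected), j)
                       for j in selected)
            if f < f_remove:
                selected.remove(j)
                continue
        # Forward step: add the strongest candidate if its partial F > f_enter.
        rest = [j for j in range(n_vars) if j not in selected]
        if not rest or len(selected) + 2 >= len(y):
            break
        f, j = max((partial_f(X, y, selected, selected + [j]), j) for j in rest)
        if f <= f_enter:
            break
        selected.append(j)
    return selected

def kfold_press(X, y, k, f_enter, f_remove, seed=0):
    """K-fold cross-validated PRESS: squared out-of-fold errors, selection redone per fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    press = 0.0
    for fold in range(k):
        test = set(idx[fold::k])
        train = [i for i in idx if i not in test]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        cols = stepwise(Xtr, ytr, f_enter, f_remove)
        beta, _ = fit(Xtr, ytr, cols)
        press += sum((y[i] - predict(beta, cols, X[i])) ** 2 for i in test)
    return press

def choose_cutoffs(X, y, k, grid):
    """Return the (F-to-enter, F-to-remove) pair with the smallest K-fold PRESS.
    Pairs with F-to-enter == F-to-remove are the equal-cutoff case."""
    pairs = [(fe, fr) for fe in grid for fr in grid if fe >= fr]
    return min(pairs, key=lambda p: kfold_press(X, y, k, *p))

# Usage on simulated data: only predictors 0 and 1 affect y.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2 * r[0] - 1.5 * r[1] + rng.gauss(0, 1) for r in X]
f_in, f_out = choose_cutoffs(X, y, k=5, grid=[1.0, 2.0, 4.0, 8.0])
final_cols = stepwise(X, y, f_in, f_out)  # cutoffs reapplied to the full data set
print(f_in, f_out, final_cols)
```

Restricting the grid to F-to-enter ≥ F-to-remove is the usual safeguard against a variable repeatedly entering and leaving the model, and the iteration cap in `stepwise` is an extra guard. Using the same fold assignment (`seed=0`) for every pair means all cutoffs are compared on identical splits.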

On the Use of K-Fold Cross-Validation to Choose Cutoff Values and Assess the Performance of Predictive Models in Stepwise Regression
Zafar Mahmood / Salahuddin Khan
NWFP Agricultural University, Peshawar
University of Peshawar
Citation Information: The International Journal of Biostatistics, Volume 5, Issue 1, ISSN (Online) 1557-4679, DOI: 10.2202/1557-4679.1105, July 2009
Publication History: Published online 2009-07-27
Keywords: cross-validation; cutoff values; stepwise regression; prediction; variable selection