Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

Super Learner

Mark J. van der Laan, Eric C. Polley, and Alan E. Hubbard

University of California, Berkeley

Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 6, Issue 1, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: 10.2202/1544-6115.1309, September 2007

When learning a model to predict an outcome from a set of covariates, a statistician has many estimation procedures in their toolbox. Examples of such candidate learners include least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method, the super learner, which is a weighted combination of many candidate learners. This article proposes a fast algorithm for constructing a super learner for prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as the minimizer of a loss function.

Keywords: cross-validation; loss-based estimation; machine learning; prediction
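
The idea in the abstract, V-fold cross-validation used to choose weights over candidate learners, can be illustrated with a minimal Python sketch. This is not the authors' algorithm or software. The three toy learners, the squared-error loss, the restriction to convex weights, the grid search over those weights, and all function names (`fit_mean`, `fit_ols`, `fit_knn`, `cv_predictions`, `simplex_grid`, `super_learner`) are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical candidate learners: each takes (X, y) and returns a predict function.
def fit_mean(X, y):
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_knn(X, y, k=5):
    def predict(Xn):
        # squared Euclidean distance from each new point to each training point
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

def cv_predictions(learners, X, y, V=5, seed=0):
    """Out-of-fold predictions: column j holds learner j's predictions
    for each observation, made by a fit that never saw that observation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), V)
    Z = np.empty((len(y), len(learners)))
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        for j, fit in enumerate(learners):
            Z[val, j] = fit(X[train], y[train])(X[val])
    return Z

def simplex_grid(K, steps):
    """Yield all K-tuples of non-negative integers that sum to `steps`."""
    if K == 1:
        yield (steps,)
        return
    for i in range(steps + 1):
        for rest in simplex_grid(K - 1, steps - i):
            yield (i,) + rest

def super_learner(learners, X, y, V=5, steps=20, seed=0):
    # Assumption: weights are non-negative, sum to one, and are chosen by
    # grid search to minimize the cross-validated squared-error risk.
    Z = cv_predictions(learners, X, y, V, seed)
    best_w, best_risk = None, np.inf
    for counts in simplex_grid(len(learners), steps):
        w = np.array(counts) / steps
        risk = np.mean((y - Z @ w) ** 2)
        if risk < best_risk:
            best_w, best_risk = w, risk
    # Refit every candidate on the full data and combine with the chosen weights.
    fits = [fit(X, y) for fit in learners]
    predict = lambda Xn: np.column_stack([f(Xn) for f in fits]) @ best_w
    return predict, best_w, best_risk

# Usage on simulated data (the data-generating model is made up).
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 0] + rng.normal(0, 0.3, size=200)
learners = [fit_mean, fit_ols, fit_knn]
predict, weights, cv_risk = super_learner(learners, X, y)
```

Because the weight grid includes the vectors that put all weight on a single learner, the combined predictor's cross-validated risk in this sketch can be no worse than that of the best individual candidate under the same folds. A grid search is used only to keep the example dependency-light; a constrained regression of the outcome on the cross-validated predictions would be a natural alternative.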
