Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

Volume 6, Issue 1 (Sep 2007)

Super Learner

Mark J. van der Laan, Eric C. Polley, and Alan E. Hubbard
University of California, Berkeley
Published Online: 2007-09-16 | DOI: https://doi.org/10.2202/1544-6115.1309

When learning a model to predict an outcome from a set of covariates, a statistician has many estimation procedures in their toolbox. Examples of these candidate learners include least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit, 2003; van der Laan et al., 2006; Sinisi et al., 2007) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method, the super learner, which is a weighted combination of many candidate learners. This article proposes a fast algorithm for constructing the super learner in prediction, which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, it contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter that can be defined as the minimizer of a loss function.

Keywords: cross-validation; loss-based estimation; machine learning; prediction
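
The weighted-combination idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate learners (`fit_mean`, `fit_poly`), the function name `super_learner`, the squared-error loss, and the choice of unconstrained least squares for the weights are all assumptions made here for illustration. Each candidate is fit on the training folds, its predictions on the held-out fold are collected, and the outcome is then regressed on those cross-validated predictions to obtain the weights.

```python
import numpy as np

def fit_mean(X, y):
    """Hypothetical candidate: ignore covariates, predict the training mean."""
    mu = y.mean()
    return lambda X_new: np.full(X_new.shape[0], mu)

def fit_poly(degree):
    """Hypothetical candidate factory: least squares on per-column powers."""
    def design(X):
        cols = [np.ones(X.shape[0])]
        cols += [X[:, j] ** d
                 for j in range(X.shape[1])
                 for d in range(1, degree + 1)]
        return np.column_stack(cols)

    def fit(X, y):
        beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
        return lambda X_new: design(X_new) @ beta

    return fit

def super_learner(X, y, learners, n_folds=10, seed=0):
    """Combine candidate learners with weights chosen by V-fold cross-validation.

    Assumption: weights come from unconstrained least squares of y on the
    cross-validated predictions; other constraints are equally plausible.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)

    # Z[i, k] = prediction for observation i from learner k,
    # trained without the fold that contains observation i.
    Z = np.empty((n, len(learners)))
    for val_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[val_idx] = False
        for k, fit in enumerate(learners):
            predict_k = fit(X[train_mask], y[train_mask])
            Z[val_idx, k] = predict_k(X[val_idx])

    # Weights minimizing the cross-validated squared-error loss.
    weights, *_ = np.linalg.lstsq(Z, y, rcond=None)

    # Cross-validated risk of each candidate on its own, for comparison.
    cv_risk = ((Z - y[:, None]) ** 2).mean(axis=0)

    # Refit every candidate on the full data and combine with the weights.
    final_fits = [fit(X, y) for fit in learners]

    def predict(X_new):
        return np.column_stack([p(X_new) for p in final_fits]) @ weights

    return predict, weights, cv_risk
```

A call such as `predict, weights, cv_risk = super_learner(X, y, [fit_mean, fit_poly(1), fit_poly(2)])` returns the combined predictor together with the weights and the per-candidate cross-validated risks. Restricting the weights to be non-negative or to sum to one is a common variation and would replace only the `lstsq` step.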

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.2202/1544-6115.1309.
