
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Volume 3, Issue 1



Asymptotic Optimality of Likelihood-Based Cross-Validation

Mark J. van der Laan / Sandrine Dudoit / Sunduz Keles
Published Online: 2004-03-22 | DOI: https://doi.org/10.2202/1544-6115.1036

Likelihood-based cross-validation is a statistical tool for selecting a density estimate, based on n i.i.d. observations from the true density, among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector which is optimal for each given dataset and depends on the true density. Two conditions of our theorem are crucial: the size of the validation sample must converge to infinity, which excludes leave-one-out cross-validation, and the candidate density estimates must be bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.

Keywords: Likelihood cross-validation; maximum likelihood estimation; Kullback-Leibler divergence; density estimation; bandwidth selection; model selection; variable selection.
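
As a rough illustration of the bandwidth-selection setting described in the abstract, the following sketch runs V-fold likelihood-based cross-validation over a grid of bandwidths for a Gaussian kernel density estimator. It is not the authors' implementation: the function names (`gaussian_kde_logpdf`, `vfold_likelihood_cv`), the Gaussian kernel, the choice V = 5, the bandwidth grid, and the simulated standard normal data are all assumptions made for the example. The sketch also does not enforce the condition that the candidate estimates be bounded away from zero and infinity.

```python
import numpy as np

def gaussian_kde_logpdf(train, x, h):
    """Log of a Gaussian kernel density estimate fit on `train`, evaluated at `x`."""
    z = (x[:, None] - train[None, :]) / h
    log_k = -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(h)
    # Numerically stable log of the mean kernel value over training points.
    m = log_k.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()

def vfold_likelihood_cv(data, bandwidths, V=5, seed=0):
    """Return the bandwidth maximizing the V-fold cross-validated log-likelihood."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), V)
    scores = np.zeros(len(bandwidths))
    for v in range(V):
        val = data[folds[v]]  # validation sample for this split
        train = data[np.concatenate([folds[u] for u in range(V) if u != v])]
        for j, h in enumerate(bandwidths):
            # Average validation log-likelihood of the estimate fit on the training sample.
            scores[j] += gaussian_kde_logpdf(train, val, h).mean()
    scores /= V
    return bandwidths[int(np.argmax(scores))], scores

# Hypothetical usage on simulated data (assumption: standard normal truth).
data = np.random.default_rng(1).standard_normal(500)
bandwidths = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 5.0])
best_h, scores = vfold_likelihood_cv(data, bandwidths)
print("selected bandwidth:", best_h)
```

With V fixed, each validation sample contains about n/V observations, so its size grows with n, in line with the condition stated in the abstract. Leave-one-out cross-validation would correspond to V = n, where the validation sample has size one.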

About the article


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 3, Issue 1, Pages 1–23, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1036.


©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.

