
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido



Asymptotic Optimality of Likelihood-Based Cross-Validation

Mark J. van der Laan / Sandrine Dudoit / Sunduz Keles
Published Online: 2004-03-22 | DOI: https://doi.org/10.2202/1544-6115.1036

Likelihood-based cross-validation is a statistical tool for selecting a density estimate, based on n i.i.d. observations from the true density, among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g., kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g., V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector that is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.
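
To make the bandwidth-selection use case concrete, the following is a minimal sketch of V-fold likelihood-based cross-validation for a one-dimensional Gaussian kernel density estimator. It is an illustration only, not the authors' simulation code: the function names (`kde_log_density`, `cv_log_likelihood`, `select_bandwidth`), the Gaussian kernel, the fold construction, and the candidate grid are all assumptions made for this example. Note also that a Gaussian kernel estimate is not bounded away from zero, so this sketch does not enforce the boundedness condition stated above.

```python
import math
import random


def kde_log_density(train, x, bandwidth):
    """Log of a Gaussian kernel density estimate fit on `train`, evaluated at x."""
    log_norm = math.log(bandwidth * math.sqrt(2.0 * math.pi))
    log_terms = [-0.5 * ((x - t) / bandwidth) ** 2 - log_norm for t in train]
    # Log-mean-exp over the training points, for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(lt - m) for lt in log_terms) / len(train))


def cv_log_likelihood(data, bandwidth, n_folds=5, seed=0):
    """Average validation log-likelihood over V folds (higher is better)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[v::n_folds] for v in range(n_folds)]
    total = 0.0
    for v, val_idx in enumerate(folds):
        # Fit on every fold except fold v, then score the held-out fold.
        train = [data[i] for u, fold in enumerate(folds) if u != v for i in fold]
        total += sum(kde_log_density(train, data[i], bandwidth) for i in val_idx)
    return total / len(data)


def select_bandwidth(data, candidates, n_folds=5, seed=0):
    """Return the candidate bandwidth maximizing the cross-validated log-likelihood."""
    scores = [cv_log_likelihood(data, h, n_folds, seed) for h in candidates]
    best = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best], scores


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical data
    h, scores = select_bandwidth(sample, [0.05, 0.1, 0.2, 0.4, 0.8, 1.6])
    print("selected bandwidth:", h)
```

Maximizing the held-out log-likelihood is equivalent to minimizing an empirical estimate of the Kullback-Leibler distance to the true density, up to a term that does not depend on the bandwidth. With five folds each validation sample holds one fifth of the data, so its size grows with n, in contrast to leave-one-out cross-validation.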

Keywords: Likelihood cross-validation; maximum likelihood estimation; Kullback-Leibler divergence; density estimation; bandwidth selection; model selection; variable selection.

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 3, Issue 1, Pages 1–23, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1036.


©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.
