
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.



Investigating the performance of AIC in selecting phylogenetic models

Dwueng-Chwuan Jhwueng
  • Department of Statistics, Feng-Chia University, Taichung, Taiwan 40724, R.O.C.
Snehalata Huzurbazar
  • Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, NC 27709, USA
  • Department of Statistics, University of Wyoming, Laramie, WY 82071, USA
  • Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
Brian C. O’Meara
  • Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
Liang Liu
  • Department of Statistics and Institute of Bioinformatics, University of Georgia, 101 Cedar Street, Athens, GA 30606, USA
Published Online: 2014-05-27 | DOI: https://doi.org/10.1515/sagmb-2013-0048

Abstract

Akaike’s Information Criterion (AIC), a popular likelihood-based model selection criterion, is a breakthrough mathematical result derived from information theory. AIC approximates the Kullback-Leibler (KL) divergence, and its derivation relies on the assumption that the likelihood function has finite second derivatives. In phylogenetic estimation, however, tree space is discrete with respect to tree topology, so the assumption of a continuous likelihood function with finite second derivatives is violated. In this paper, we investigate the relationship between the expected log likelihood of a candidate model and the expected KL divergence in the context of phylogenetic tree estimation. We find that, given the tree topology, AIC is an unbiased estimator of the expected KL divergence. When the tree topology is unknown, however, AIC tends to underestimate the expected KL divergence for phylogenetic models. Simulation results suggest that the degree of underestimation varies across phylogenetic models, so that even for large sample sizes the bias of AIC can lead to selection of the wrong model. Because the choice of phylogenetic model is essential for statistical phylogenetic inference, it is important to improve the accuracy of model selection criteria in the context of phylogenetics.
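For readers unfamiliar with how AIC is applied in practice, the following is a minimal sketch of the standard textbook definition, AIC = 2k - 2 ln L, used to rank candidate models. It is not the authors' code and does not reproduce their bias analysis or simulations. The function names, the model labels, and the log-likelihood and parameter-count values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Standard AIC: 2k - 2 ln L, where k is the number of free parameters."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_by_aic(models: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Return (name, AIC) pairs sorted from lowest (preferred) to highest AIC.

    `models` maps a model name to (maximized log likelihood, number of free parameters).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical values, NOT taken from the paper: three nested candidate models
# with increasing parameter counts and increasing maximized log likelihoods.
candidates = {
    "model_A": (-1250.0, 5),
    "model_B": (-1240.0, 9),
    "model_C": (-1239.5, 13),
}

ranking = rank_by_aic(candidates)
for name, score in ranking:
    print(f"{name}: AIC = {score:.1f}")
```

In this sketch the model with the lowest AIC is preferred. The abstract's point is that when the tree topology is itself estimated, the 2k penalty may be too small for phylogenetic models, so a ranking of this kind can favor the wrong model even with large samples.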

Keywords: AIC; Kullback-Leibler divergence; model selection; phylogenetics

Corresponding author: Liang Liu, Department of Statistics and Institute of Bioinformatics, University of Georgia, 101 Cedar Street, Athens, GA 30606, USA, Phone: +1-706-542-3309, Fax: +1-706-542-3391


Published in Print: 2014-08-01


Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 13, Issue 4, Pages 459–475, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0048, May 2014
