Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Volume 5, Issue 1

Model Selection for Mixtures of Mutagenetic Trees

Junming Yin / Niko Beerenwinkel / Jörg Rahnenführer / Thomas Lengauer
Published Online: 2006-06-23 | DOI: https://doi.org/10.2202/1544-6115.1164

The evolution of drug resistance in HIV is characterized by the accumulation of resistance-associated mutations in the HIV genome. Mutagenetic trees, a family of restricted Bayesian tree models, have been applied to infer the order and rate of occurrence of these mutations. Understanding and predicting this evolutionary process is an important prerequisite for the rational design of antiretroviral therapies. In practice, mixture models of K mutagenetic trees provide more flexibility and are often more appropriate for modelling observed mutational patterns.

Here, we investigate the model selection problem for K-mutagenetic trees mixture models. We evaluate several classical model selection criteria, including cross-validation, the Bayesian Information Criterion (BIC), and the Akaike Information Criterion (AIC). We also use the empirical Bayes method by constructing a prior probability distribution for the parameters of a mutagenetic trees mixture model and deriving the posterior probability of the model. In addition to the model dimension, we consider the redundancy of a mixture model, which is measured by comparing the topologies of trees within a mixture model. Based on the redundancy, we propose a new model selection criterion, which is a modification of the BIC.

Experimental results on simulated and on real HIV data show that the classical criteria tend to select models with far too many tree components. Only cross-validation and the modified BIC recover the correct number of trees and the tree topologies most of the time. At the same optimal performance, the runtime of the new BIC modification is about one order of magnitude lower than that of cross-validation. Thus, this model selection criterion can also be used for large data sets for which cross-validation becomes computationally infeasible.
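As a rough illustration of how the classical criteria named above choose the number of tree components K, the following sketch scores candidate models with the standard textbook definitions AIC = -2 log L + 2d and BIC = -2 log L + d ln N, where log L is the maximized log-likelihood, d the number of free parameters, and N the sample size. The log-likelihoods, parameter counts, and sample size are made-up placeholders, not values from the article, and the function names are hypothetical. The redundancy-based BIC modification proposed in the article is not reproduced here, since the abstract does not give its formula.

```python
import math


def aic(log_lik: float, n_params: int) -> float:
    """Akaike Information Criterion (standard definition); lower is better."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik: float, n_params: int, n_samples: int) -> float:
    """Bayesian Information Criterion (standard definition); lower is better."""
    return -2.0 * log_lik + n_params * math.log(n_samples)


def select_k(candidates: dict, n_samples: int, criterion: str = "bic"):
    """Pick the number of mixture components with the lowest score.

    candidates maps K -> (maximized log-likelihood, number of free parameters).
    Returns (best K, dict of scores per K).
    """
    scores = {}
    for k, (log_lik, n_params) in candidates.items():
        if criterion == "bic":
            scores[k] = bic(log_lik, n_params, n_samples)
        elif criterion == "aic":
            scores[k] = aic(log_lik, n_params)
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical fitted models: K -> (log-likelihood, parameter count).
# These numbers are placeholders for illustration only.
candidates = {
    1: (-1250.0, 6),
    2: (-1200.0, 13),
    3: (-1190.0, 20),
    4: (-1186.0, 27),
}
n_samples = 500  # assumed number of observed mutational patterns

best_bic, bic_scores = select_k(candidates, n_samples, criterion="bic")
best_aic, aic_scores = select_k(candidates, n_samples, criterion="aic")
print("BIC selects K =", best_bic)
print("AIC selects K =", best_aic)
```

With these placeholder numbers the AIC, which penalizes each parameter by a constant 2, prefers a larger model than the BIC, whose per-parameter penalty ln N grows with the sample size. This only illustrates how the two penalties differ; it does not reproduce the experimental findings reported in the article.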

Keywords: model selection; mixtures of mutagenetic trees; BIC; empirical Bayes

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 5, Issue 1, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1164.

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.
