Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics

Radu Herbei / Laura Kubatko
Published Online: 2013-03-26 | DOI: https://doi.org/10.1515/sagmb-2012-0023


Markov chains are widely used for modeling in many areas of molecular biology and genetics. As the complexity of such models advances, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be efficiently implemented by taking advantage of GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.
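
For readers unfamiliar with the quantity being estimated, the following is a minimal sketch, not the authors' method. It uses a toy chain chosen purely for illustration (a lazy random walk on a cycle of n states, whose stationary distribution is uniform) and compares the exact total variation distance, computed as half the L1 distance between the t-step distribution and the uniform distribution, with a naive plug-in Monte Carlo estimate built from independently simulated chains. The function names, the toy chain, and all parameters are assumptions made for this example.

```python
import random
from collections import Counter


def step(state, n, rng):
    """One step of a lazy random walk on a cycle of n states (toy chain)."""
    u = rng.random()
    if u < 0.5:
        return state              # stay put with probability 1/2
    if u < 0.75:
        return (state + 1) % n    # move clockwise with probability 1/4
    return (state - 1) % n        # move counter-clockwise with probability 1/4


def exact_tv(n, t, start=0):
    """Exact TV distance to the uniform distribution after t steps.

    Propagates the full distribution, which is only feasible when n is small.
    """
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(t):
        dist = [
            0.5 * dist[i] + 0.25 * dist[(i - 1) % n] + 0.25 * dist[(i + 1) % n]
            for i in range(n)
        ]
    return 0.5 * sum(abs(p - 1.0 / n) for p in dist)


def mc_tv(n, t, n_chains, start=0, seed=0):
    """Naive plug-in estimate: empirical t-step distribution vs. uniform."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_chains):
        s = start
        for _ in range(t):
            s = step(s, n, rng)
        counts[s] += 1
    return 0.5 * sum(abs(counts[i] / n_chains - 1.0 / n) for i in range(n))


if __name__ == "__main__":
    n, t = 8, 5
    print("exact TV:      ", exact_tv(n, t))
    print("Monte Carlo TV:", mc_tv(n, t, n_chains=20000))
```

This plug-in estimate is biased upward when the chain is close to stationarity, and it needs many more simulated chains than there are states, so it does not scale to spaces as large as the space of phylogenetic trees. The simulated chains are independent of one another, which is the kind of workload that parallel hardware such as a GPU handles well.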

Keywords: GPU computing; mixing time; phylogenetics; total variation distance


Corresponding author: Laura Kubatko, The Ohio State University – Statistics, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, USA, Phone: +1-614-247-8846, Fax: +1-614-292-2096


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 1, Pages 39–48, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2012-0023.

©2013 by Walter de Gruyter Berlin Boston.


