Published by De Gruyter, March 26, 2013

Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics

Radu Herbei and Laura Kubatko

Abstract

Markov chains are widely used for modeling in many areas of molecular biology and genetics. As the complexity of such models advances, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be efficiently implemented by taking advantage of GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.
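
For readers unfamiliar with the quantity being estimated: the total variation distance between two distributions p and q on a finite state space is half the sum of |p(x) - q(x)| over all states x. The abstract does not describe the proposed estimator, so the sketch below is not the authors' method. It is a naive plug-in Monte Carlo estimate on a toy chain (a lazy random walk on a cycle, chosen here purely for illustration), which is only workable when the state space is small enough to enumerate. All function names and parameters are hypothetical.

```python
import random


def step(state, n, rng):
    """One step of a lazy random walk on a cycle with n states (toy chain)."""
    u = rng.random()
    if u < 0.5:
        return state                 # stay put with probability 1/2
    if u < 0.75:
        return (state + 1) % n       # move clockwise with probability 1/4
    return (state - 1) % n           # move counter-clockwise with probability 1/4


def tv_distance(p, q):
    """Total variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def exact_distribution(n, t, start=0):
    """Exact distribution of the toy chain after t steps, by direct iteration."""
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(t):
        new = [0.0] * n
        for x, p in enumerate(dist):
            new[x] += 0.5 * p
            new[(x + 1) % n] += 0.25 * p
            new[(x - 1) % n] += 0.25 * p
        dist = new
    return dist


def estimate_tv(n, t, num_chains, seed=0, start=0):
    """Plug-in Monte Carlo estimate of the TV distance to the uniform
    stationary distribution: run independent chains for t steps and
    compare the empirical distribution of end states with uniform."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(num_chains):
        state = start
        for _ in range(t):
            state = step(state, n, rng)
        counts[state] += 1
    empirical = [c / num_chains for c in counts]
    uniform = [1.0 / n] * n
    return tv_distance(empirical, uniform)


if __name__ == "__main__":
    n, t = 8, 5
    exact = tv_distance(exact_distribution(n, t), [1.0 / n] * n)
    print("exact:   ", exact)
    print("estimate:", estimate_tv(n, t, num_chains=20000))
```

The plug-in estimate is biased upward for finite sample sizes, and its cost grows with the number of states, which is why it does not scale to spaces as large as the space of phylogenetic trees.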


Corresponding author: Laura Kubatko, The Ohio State University – Statistics, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, USA, Phone: +1-614-247-8846, Fax: +1-614-292-2096

We acknowledge computing support from the Ohio Supercomputer Center (http://www.osc.edu/).

Conflict of interest statement

Funding: The first author is supported in part by the National Science Foundation award DMS-1209142. The second author is supported in part by the National Science Foundation award DMS-1106706.

Published Online: 2013-03-26

©2013 by Walter de Gruyter Berlin Boston

Downloaded on 10.12.2022 from https://www.degruyter.com/document/doi/10.1515/sagmb-2012-0023/html