Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 12, Issue 1

Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics

Radu Herbei / Laura Kubatko
Published Online: 2013-03-26 | DOI: https://doi.org/10.1515/sagmb-2012-0023

Abstract

Markov chains are widely used for modeling in many areas of molecular biology and genetics. As the complexity of such models advances, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be efficiently implemented by taking advantage of GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.

Keywords: GPU computing; mixing time; phylogenetics; total variation distance
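For readers unfamiliar with the quantity discussed in the abstract: the total variation distance between the time-t distribution μ_t of a chain and its stationary distribution π is (1/2) Σ_x |μ_t(x) − π(x)|. The sketch below is not the authors' estimator, which the abstract does not specify. It is a naive plug-in Monte Carlo estimate on a toy chain (a lazy random walk on a small cycle, chosen here purely for illustration), compared against the exact value obtained by propagating the distribution. All function names are hypothetical.

```python
import random


def step(state, n, rng):
    """One step of a lazy random walk on a cycle with n states (toy chain)."""
    u = rng.random()
    if u < 0.5:
        return state              # stay put with probability 1/2
    if u < 0.75:
        return (state + 1) % n    # move clockwise with probability 1/4
    return (state - 1) % n        # move counter-clockwise with probability 1/4


def exact_tv(n, t):
    """Exact TV distance to the uniform stationary law after t steps from state 0."""
    dist = [0.0] * n
    dist[0] = 1.0
    for _ in range(t):
        new = [0.0] * n
        for x, p in enumerate(dist):
            new[x] += 0.5 * p
            new[(x + 1) % n] += 0.25 * p
            new[(x - 1) % n] += 0.25 * p
        dist = new
    return 0.5 * sum(abs(p - 1.0 / n) for p in dist)


def mc_tv(n, t, n_chains, seed=0):
    """Plug-in Monte Carlo estimate: run independent chains, compare the
    empirical time-t distribution with the uniform stationary law."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(n_chains):
        s = 0
        for _ in range(t):
            s = step(s, n, rng)
        counts[s] += 1
    return 0.5 * sum(abs(c / n_chains - 1.0 / n) for c in counts)


if __name__ == "__main__":
    for t in (0, 5, 10, 20, 40):
        print(t, round(exact_tv(8, t), 4), round(mc_tv(8, t, 20000), 4))
```

The plug-in estimate needs many samples per state, so it is only practical when the state space is small. That limitation is what motivates a dedicated estimator for large spaces such as phylogenetic tree space. Because the chains are simulated independently, the simulation loop is also the natural place to look for parallelism.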

About the article

Corresponding author: Laura Kubatko, The Ohio State University – Statistics, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, USA, Phone: +1-614-247-8846, Fax: +1-614-292-2096


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 1, Pages 39–48, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2012-0023.

©2013 by Walter de Gruyter Berlin Boston.
