Published by De Gruyter, March 30, 2011

A Variance-Components Model for Distance-Matrix Phylogenetic Reconstruction

  • Walter R. Gilks, Tom M.W. Nye and Pietro Lio

Phylogenetic trees describe evolutionary relationships between related organisms (taxa). One approach to estimating phylogenetic trees supposes that a matrix of estimated evolutionary distances between taxa is available. Agglomerative methods have been proposed in which closely related taxon-pairs are successively combined to form ancestral taxa. Several of these computationally efficient agglomerative algorithms involve steps to reduce the variance in estimated distances. We propose an agglomerative phylogenetic method which focuses on statistical modeling of variance components in distance estimates. We consider how these variance components evolve during the agglomerative process. Our method simultaneously produces two topologically identical rooted trees, one tree having branch lengths proportional to elapsed time, and the other having branch lengths proportional to underlying evolutionary divergence. The method models two major sources of variation which have been separately discussed in the literature: noise, reflecting inaccuracies in measuring divergences, and distortion, reflecting randomness in the amounts of divergence in different parts of the tree. The methodology is based on successive hierarchical generalized least-squares regressions. It involves only means, variances and covariances of distance estimates, thereby avoiding full distributional assumptions. Exploitation of the algebraic structure of the estimation leads to an algorithm with computational complexity comparable to the leading published agglomerative methods. A parametric bootstrap procedure allows full uncertainty in the phylogenetic reconstruction to be assessed. Software implementing the methodology may be freely downloaded from StatTree.
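The abstract's point that generalized least squares needs only means, variances and covariances can be illustrated with a small sketch. This is not the authors' algorithm. It shows only the elementary building block: two noisy, possibly correlated estimates of the same distance are merged with minimum-variance weights. The function name `combine_estimates` and the numbers in the usage lines are hypothetical, chosen purely for illustration.

```python
def combine_estimates(d_i, d_j, var_i, var_j, cov_ij=0.0):
    """Generalized least-squares estimate of a common mean from two estimates.

    Only second moments are used (no distributional assumption):
    the weight on d_i is (var_j - cov_ij) / (var_i + var_j - 2 * cov_ij),
    which is the first entry of V^{-1} 1 / (1' V^{-1} 1) for the 2 x 2
    covariance matrix V of the two estimates.
    Returns (combined_estimate, variance_of_combined_estimate).
    """
    denom = var_i + var_j - 2.0 * cov_ij
    if denom <= 0.0:
        # Degenerate case (for example, identical perfectly correlated
        # estimates): fall back to equal weights. This is an assumption
        # of the sketch, not part of the GLS formula.
        weight = 0.5
    else:
        weight = (var_j - cov_ij) / denom

    combined = weight * d_i + (1.0 - weight) * d_j
    variance = (
        weight ** 2 * var_i
        + (1.0 - weight) ** 2 * var_j
        + 2.0 * weight * (1.0 - weight) * cov_ij
    )
    return combined, variance


# Hypothetical example: the more precise estimate (variance 0.01)
# receives the larger weight.
estimate, variance = combine_estimates(0.30, 0.50, var_i=0.01, var_j=0.03)
print(estimate, variance)
```

With uncorrelated inputs, the combined variance is never larger than the smaller of the two input variances. This is the sense in which an agglomerative step can reduce the variance of estimated distances.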



Published Online: 2011-3-30

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
