Abstract
Ancestral sequence reconstruction (ASR) has become an indispensable tool for evolutionary studies and for protein engineering. The first step of every ASR protocol is the preparation of a representative sequence set containing at most a few hundred recent homologs; the composition of this set decisively determines the outcome of a reconstruction. A common approach to sequence selection consists of several rounds of manual recompilation, driven by embedded phylogenetic analyses of the varied sequence sets. For ASR of a geranylgeranylglyceryl phosphate synthase, we additionally utilized FitSS4ASR, which replaces this time-consuming protocol with an efficient and more rational approach. FitSS4ASR applies orthogonal filters to a set of homologs to eliminate outlier sequences and those bearing only a weak phylogenetic signal. To demonstrate the usefulness of FitSS4ASR, we experimentally determined the oligomerization state of eight predecessors; this state is a delicate and taxon-specific property. Corresponding ancestors deduced by the manual approach and by means of FitSS4ASR had the same dimeric or hexameric conformation; this concordance testifies to the efficiency of FitSS4ASR for sequence selection. FitSS4ASR-based results of two other ASR experiments are provided in the Supporting Information. The program and its documentation are available at https://gitlab.bioinf.ur.de/hek61586/FitSS4ASR.
Funding source: Deutsche Forschungsgemeinschaft
Award Identifier / Grant number: ME2259/2-1
Funding statement: This work was supported by the Deutsche Forschungsgemeinschaft (funder id: 10.13039/501100001659, ME2259/2-1) and calculations were facilitated through the use of advanced computational infrastructure provided for account pr48fu by the Leibniz Supercomputing Center of the Bavarian Academy of Sciences and Humanities (funder id: 10.13039/501100007306).
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/hsz-2018-0344).
©2019 Walter de Gruyter GmbH, Berlin/Boston