Published by De Gruyter, January 9, 2019

Sequence selection by FitSS4ASR alleviates ancestral sequence reconstruction as exemplified for geranylgeranylglyceryl phosphate synthase

  • Kristina Straub, Mona Linde, Cosimo Kropp, Samuel Blanquart, Patrick Babinger and Rainer Merkl
From the journal Biological Chemistry


For evolutionary studies, but also for protein engineering, ancestral sequence reconstruction (ASR) has become an indispensable tool. The first step of every ASR protocol is the preparation of a representative sequence set containing at most a few hundred recent homologs whose composition determines decisively the outcome of a reconstruction. A common approach for sequence selection consists of several rounds of manual recompilation that is driven by embedded phylogenetic analyses of the varied sequence sets. For ASR of a geranylgeranylglyceryl phosphate synthase, we additionally utilized FitSS4ASR, which replaces this time-consuming protocol with an efficient and more rational approach. FitSS4ASR applies orthogonal filters to a set of homologs to eliminate outlier sequences and those bearing only a weak phylogenetic signal. To demonstrate the usefulness of FitSS4ASR, we determined experimentally the oligomerization state of eight predecessors, which is a delicate and taxon-specific property. Corresponding ancestors deduced in a manual approach and by means of FitSS4ASR had the same dimeric or hexameric conformation; this concordance testifies to the efficiency of FitSS4ASR for sequence selection. FitSS4ASR-based results of two other ASR experiments were added to the Supporting Information. Program and documentation are available at

Award Identifier / Grant number: ME2259/2-1

Funding statement: This work was supported by the Deutsche Forschungsgemeinschaft (funder id: 10.13039/501100001659, ME2259/2-1) and calculations were facilitated through the use of advanced computational infrastructure provided for account pr48fu by the Leibniz Supercomputing Center of the Bavarian Academy of Sciences and Humanities (funder id: 10.13039/501100007306).



Supplementary Material

The online version of this article offers supplementary material (

Received: 2018-08-13
Accepted: 2018-12-07
Published Online: 2019-01-09
Published in Print: 2019-02-25

©2019 Walter de Gruyter GmbH, Berlin/Boston
