Bio-Algorithms and Med-Systems

Editor-in-Chief: Roterman-Konieczna, Irena

CiteScore 2018: 0.29

SCImago Journal Rank (SJR) 2018: 0.129
Source Normalized Impact per Paper (SNIP) 2018: 0.324

Opportunities and limitations in applying coevolution-derived contacts to protein structure prediction

Stuart Tetchner / Tomasz Kosciolek (ORCID iD: http://orcid.org/0000-0002-9915-7387) / David T. Jones
Published Online: 2014-11-27 | DOI: https://doi.org/10.1515/bams-2014-0013


The prospect of identifying contacts in protein structures purely from aligned protein sequences has lured researchers for a long time, but progress was modest until recently. Here, we reviewed the most successful methods for identifying structural contacts from sequence, described how these methods differ, and made an initial assessment of the overlap between contacts predicted by alternative approaches. We then discussed the limitations of these methods and the possibilities for future development, and highlighted recent applications of predicted contacts: tertiary structure prediction, identification of residues at the interfaces of protein-protein interactions, and disentangling alternative conformational states. Finally, we identified the current challenges in the field of contact prediction, concentrating on the limitations imposed by available data, dependencies on the sequence alignments, and possible future developments.

Keywords: contact prediction; correlated mutation analysis; protein structure prediction


About the article

Corresponding author: David T. Jones, Department of Computer Science, University College London, London WC1E 6BT, UK

Received: 2014-08-11

Accepted: 2014-10-15

Published Online: 2014-11-27

Published in Print: 2014-12-19

Citation Information: Bio-Algorithms and Med-Systems, Volume 10, Issue 4, Pages 243–254, ISSN (Online) 1896-530X, ISSN (Print) 1895-9091, DOI: https://doi.org/10.1515/bams-2014-0013.

©2014 by De Gruyter.
