Published by De Gruyter, November 27, 2014

Opportunities and limitations in applying coevolution-derived contacts to protein structure prediction

  • Stuart Tetchner, Tomasz Kosciolek and David T. Jones

Abstract

The prospect of identifying contacts in protein structures purely from aligned protein sequences has lured researchers for a long time, but progress was modest until recently. Here, we review the most successful methods for identifying structural contacts from sequence, describe how these methods differ, and make an initial assessment of the overlap between the contacts predicted by alternative approaches. We then discuss the limitations of these methods and possibilities for future development, and highlight recent applications of predicted contacts: tertiary structure prediction, identification of the residues at the interfaces of protein-protein interactions, and the disentangling of alternative conformational states. Finally, we identify the current challenges in the field of contact prediction, concentrating on the limitations imposed by available data, dependencies on the sequence alignments, and possible future developments.


Corresponding author: David T. Jones, Department of Computer Science, University College London, London WC1E 6BT, UK, E-mail:

Acknowledgments

The authors thank Domenico Cozzetto for useful discussions. ST and TK were supported by the Wellcome Trust (studentship numbers 096622/Z/11/Z and 096624/Z/11/Z, respectively).

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

Research funding: None declared.

Employment or leadership: None declared.

Honorarium: None declared.

Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Received: 2014-08-11
Accepted: 2014-10-15
Published Online: 2014-11-27
Published in Print: 2014-12-19

©2014 by De Gruyter

Downloaded on 26.3.2023 from https://www.degruyter.com/document/doi/10.1515/bams-2014-0013/html