Abstract
The prospect of identifying contacts in protein structures purely from aligned protein sequences has long attracted researchers, but progress was modest until recently. Here, we review the most successful methods for identifying structural contacts from sequence, describe how these methods differ, and make an initial assessment of how far the contacts predicted by alternative approaches overlap. We then discuss the limitations of these methods and highlight recent applications of predicted contacts: tertiary structure prediction, identification of residues at the interfaces of protein-protein interactions, and disentangling of alternative conformational states. Finally, we identify the current challenges in the field of contact prediction, concentrating on the limitations imposed by available data, the dependence on sequence alignments, and possible future developments.
Acknowledgments
The authors thank Domenico Cozzetto for useful discussions. ST and TK were supported by the Wellcome Trust (studentship numbers 096622/Z/11/Z and 096624/Z/11/Z, respectively).
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.
©2014 by De Gruyter