The Protein Data Bank (PDB) is expanding exponentially in the number of macromolecular structures deposited each year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this question, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased over the 9-year period between 2005 and 2014 on account of PDB growth alone. Nevertheless, this encouraging trend slowed noticeably around 2008 and has yielded only insignificant improvements since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches to fold recognition, and more accurate template-free structure prediction methods are urgently needed.
Portions of this research were conducted with high-performance computational resources provided by Louisiana State University (HPC@LSU; http://www.hpc.lsu.edu) and the Louisiana Optical Network Institute (LONI; http://www.loni.org). We thank Dr. Wei Feinstein, who read the manuscript and provided critical comments.
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was supported by the Louisiana Board of Regents through the Board of Regents Support Fund [contract LEQSF(2012-15)-RD-A-05].
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.
©2015 by De Gruyter