Published by De Gruyter, February 7, 2015

Is the growth rate of Protein Data Bank sufficient to solve the protein structure prediction problem using template-based modeling?

Michal Brylinski


The Protein Data Bank (PDB) undergoes an exponential expansion in terms of the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this problem, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased during a 9-year period between 2005 and 2014 on account of the PDB growth alone. Nevertheless, this encouraging trend slowed down noticeably around the year 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches in fold recognition, and more accurate template-free structure prediction methods are desperately needed.
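The retrospective coverage idea summarized above can be illustrated with a minimal sketch. It assumes that each target protein comes with a list of template hits, each carrying a release date and a confidence value, and it counts a target as covered in a given year if at least one sufficiently confident template had been released by the end of that year. The function name, the data layout, the 0.9 confidence threshold, and the toy values are all hypothetical and are not taken from the study; the actual criteria applied to COMPASS and HHpred hits are not stated in this abstract.

```python
from datetime import date


def coverage_by_year(targets, hits, years, min_confidence=0.9):
    """Return {year: fraction of targets covered by the end of that year}.

    targets        -- list of target protein identifiers
    hits           -- dict mapping target id -> list of (release_date, confidence)
    years          -- iterable of calendar years to evaluate
    min_confidence -- hypothetical threshold for a "confident" template hit
    """
    if not targets:
        raise ValueError("targets must not be empty")
    result = {}
    for year in years:
        cutoff = date(year, 12, 31)
        # A target is covered if any confident template was released by the cutoff.
        covered = sum(
            1
            for target in targets
            if any(
                released <= cutoff and confidence >= min_confidence
                for released, confidence in hits.get(target, [])
            )
        )
        result[year] = covered / len(targets)
    return result


# Toy, made-up input for illustration only (not data from the paper).
targets = ["t1", "t2", "t3", "t4"]
hits = {
    "t1": [(date(2004, 5, 1), 0.95)],
    "t2": [(date(2007, 3, 1), 0.99), (date(2006, 1, 1), 0.40)],
    "t3": [(date(2012, 8, 1), 0.92)],
}
coverage = coverage_by_year(targets, hits, [2005, 2008, 2014])
```

Evaluating the same fixed target set against successive yearly cutoffs isolates the effect of database growth, because the detection criterion stays constant while only the pool of available templates changes.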

Corresponding author: Michal Brylinski, Department of Biological Sciences, 202 Life Sciences Bldg., Louisiana State University, Baton Rouge, LA 70803, USA; and Center for Computation and Technology, 2054 Digital Media Center, Louisiana State University, Baton Rouge, LA 70803, USA.


Portions of this research were conducted with high-performance computational resources provided by Louisiana State University (HPC@LSU) and the Louisiana Optical Network Institute (LONI). We thank Dr. Wei Feinstein, who read the manuscript and provided critical comments.

  1. Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This work was supported by the Louisiana Board of Regents through the Board of Regents Support Fund [contract LEQSF(2012-15)-RD-A-05].

  3. Employment or leadership: None declared.

  4. Honorarium: None declared.

  5. Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.



Received: 2014-12-22
Accepted: 2015-1-8
Published Online: 2015-2-7
Published in Print: 2015-3-31

©2015 by De Gruyter