Classifying ordered-disordered proteins using linear and kernel support vector machines

Çağın Kandemir Çavaş 1  and Selen Yildirim 2
  • 1 Department of Computer Science, Dokuz Eylül University, Izmir, Turkey
  • 2 The Graduate School of Natural and Applied Sciences, Department of Statistics, Dokuz Eylül University, Izmir, Turkey

Abstract

Introduction:

Intrinsically disordered proteins arise when deformations occur in the tertiary structure of a protein. Disordered proteins play an important role in DNA/RNA/protein recognition, modulation of the specificity and affinity of protein binding, molecular threading, and activation by cleavage. The aim of this study is to identify ordered and disordered proteins, which is a very challenging problem in bioinformatics.

Methods:

In this paper, ordered and disordered proteins are classified using linear and kernel (nonlinear) support vector machines (SVMs).
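The contrast between a linear and a kernel SVM can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the kernel, the features, or the software used, so the RBF kernel, the parameters `C`, `gamma`, and `epochs`, and the XOR-style toy data are all assumptions chosen for illustration. The trainer is a simplified dual coordinate ascent in which the bias term is folded into the kernel by adding 1, which avoids the equality constraint of the standard SVM dual.

```python
import math

def linear_kernel(a, b):
    # Plain dot product: yields a linear decision boundary.
    return sum(x * y for x, y in zip(a, b))

def rbf_kernel(a, b, gamma=1.0):
    # Gaussian (RBF) kernel: one common nonlinear choice (an assumption here).
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def train_svm(X, y, kernel, C=10.0, epochs=200):
    # Soft-margin SVM dual solved by coordinate ascent with box constraint
    # 0 <= alpha_i <= C. The bias is absorbed by using kernel(...) + 1.
    n = len(X)
    K = [[kernel(X[i], X[j]) + 1.0 for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            f_i = sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            step = (1.0 - y[i] * f_i) / K[i][i]
            alpha[i] = min(C, max(0.0, alpha[i] + step))
    return alpha

def predict(X, y, alpha, kernel, x):
    # Sign of the kernel expansion over the training points.
    score = sum(a * t * (kernel(s, x) + 1.0) for a, t, s in zip(alpha, y, X))
    return 1 if score >= 0 else -1

def accuracy(X, y, alpha, kernel):
    hits = sum(predict(X, y, alpha, kernel, x) == t for x, t in zip(X, y))
    return hits / len(X)

# Hypothetical toy data (XOR pattern), not protein features:
# no straight line separates the two classes.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]

alpha_lin = train_svm(X, y, linear_kernel)
alpha_rbf = train_svm(X, y, rbf_kernel)

print("linear SVM training accuracy:", accuracy(X, y, alpha_lin, linear_kernel))
print("kernel SVM training accuracy:", accuracy(X, y, alpha_rbf, rbf_kernel))
```

Because the XOR pattern is not linearly separable, the linear kernel cannot classify all four points correctly, while a nonlinear kernel can. In practice one would use an established SVM library and evaluate on held-out data rather than on the training set.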

Results:

The overall accuracy rates of the linear SVM and the kernel SVM in identifying ordered-disordered proteins are 86.54% and 94.23%, respectively.
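For reference, overall accuracy is conventionally the fraction of proteins assigned to the correct class. The abstract does not give the formula or the confusion-matrix counts, so the symbols below are assumptions, with disordered proteins arbitrarily taken as the positive class: TP and TN are the correctly classified disordered and ordered proteins, and FP and FN are the misclassified ones.

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]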

Discussion and conclusion:

Since the kernel SVM gives the better discriminating scheme of the two, it can be inferred that it is a very satisfactory method for identifying the ordered-disordered structures of proteins.


Turkish Journal of Biochemistry (TJB), official journal of Turkish Biochemical Society, is issued electronically every 2 months. The main aim of the journal is to support the research and publishing culture by ensuring that every published manuscript has an added value and thus providing international acceptance of the “readability” of the manuscripts published in the journal.
