Unravelling peptidomes by in silico mining

Johannes Koehbach 1  and Kathryn A.V. Jackson 1
  • 1 School of Biomedical Sciences, The University of Queensland, 4072 St. Lucia QLD, Australia

Abstract

Peptides occur in great number and diversity in all domains of life and exhibit a range of pharmaceutically relevant bioactivities. The complexity of biological samples, including human cells or tissues, plant extracts and animal venom cocktails, often impedes the discovery of novel bioactive peptides by mass spectrometry-based peptidomics analysis. An increasing number of publicly available genome and transcriptome datasets, together with refined bioinformatics analysis, allows rapid identification of novel peptides that may previously have gone unrecognized. Moreover, combining information from in silico mining with data from mass spectrometry-based studies provides new impetus for future peptidome analyses, including the discovery of novel bioactive peptides that can serve as starting points for drug development.

