Mass spectrometry-based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of the proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data, and statistical methods for optimally processing proteomic data are a growing field of research. In this paper we present simple yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data collected in a case-control fashion, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of the peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures on which diagnostic rules for disease status allocation are based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of the disease.
Funding: Marie Curie Initial Training Network MEDIASRES (“Novel Statistical Methodology for Diagnostic Prognostic and Therapeutic Studies and Systematic Reviews”; Grant/Award Number: FP7/2011/290025); MIMOmics (“Methods for Integrated Analysis of Multiple Omics Datasets”; Grant/Award Number: FP7/Health/F5/2012/305280).
The online version of this article (DOI: 10.1515/sagmb-2016-0005) offers supplementary material, available to authorized users.
©2016 Walter de Gruyter GmbH, Berlin/Boston