Published by De Gruyter September 28, 2016

Accounting for isotopic clustering in Fourier transform mass spectrometry data analysis for clinical diagnostic studies

Alexia Kakourou, Werner Vach, Simone Nicolardi, Yuri van der Burgt and Bart Mertens


Mass spectrometry-based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data, and statistical methods for optimally processing proteomic data are a growing field of research. In this paper we present simple yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data collected in a case-control fashion, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures, on which diagnostic rules for disease status allocation are based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of the disease.
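The distinction drawn above between the intensity level and the shape of an isotopic cluster can be illustrated with a minimal Python sketch. It assumes a cluster is available as a plain list of non-negative peak intensities. The function name `summarize_cluster`, the use of a log-transformed total as the intensity summary, and the example values are hypothetical choices made for illustration. They are not taken from the paper, whose actual summary measures may differ.

```python
import math


def summarize_cluster(peak_intensities):
    """Split one isotopic cluster into an intensity summary and a shape summary.

    Illustrative assumption, not the paper's definition:
    - intensity level = natural log of the summed peak intensities
    - shape = each peak's share of the summed intensity (shares sum to 1)
    """
    total = sum(peak_intensities)
    if total <= 0:
        raise ValueError("cluster must have a positive total intensity")
    shape = [p / total for p in peak_intensities]
    return math.log(total), shape


# Hypothetical cluster of four isotopic peaks (arbitrary intensity units).
cluster = [100.0, 60.0, 30.0, 10.0]
level, shape = summarize_cluster(cluster)
```

Because the shape vector is normalized by the total, rescaling every peak in the cluster by the same factor changes the intensity summary but leaves the shape summary unchanged, which is why the two can carry separate information.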

Funding: Marie Curie Initial Training Network MEDIASRES (“Novel Statistical Methodology for Diagnostic Prognostic and Therapeutic Studies and Systematic Reviews”; Grant/Award Number: FP7/2011/290025); MIMOmics (“Methods for Integrated Analysis of Multiple Omics Datasets”; Grant/Award Number: FP7/Health/F5/2012/305280).



Supplemental Material:

The online version of this article (DOI: 10.1515/sagmb-2016-0005) offers supplementary material, available to authorized users.

Published Online: 2016-9-28
Published in Print: 2016-10-1

©2016 Walter de Gruyter GmbH, Berlin/Boston