Statistical Applications in Genetics and Molecular Biology


Two-Stage Model-Based Clustering for Liquid Chromatography Mass Spectrometry Data Analysis

Marta Łuksza (1) / Bogusław Kluge (2) / Jerzy Ostrowski (3) / Jakub Karczmarski (4) / Anna Gambin (5)

(1) Max Planck Institute for Molecular Genetics

(2) University of Warsaw

(3) Maria Sklodowska-Curie Memorial Institute of Oncology

(4) Maria Sklodowska-Curie Memorial Institute of Oncology

(5) University of Warsaw

Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 8, Issue 1, Pages 1–34, ISSN (Online) 1544-6115, DOI: 10.2202/1544-6115.1308, February 2009

Proteomic mass spectrometry plays an increasingly important role in diagnostics and in studies of protein complexes and biological systems. This experimental technology produces high-throughput data that are inherently noisy and may contain various errors. Mathematical processing can help remove them.

In this paper we focus on the peak alignment problem in LC-MS spectra. As an alternative to heuristic approaches, we propose a mathematically sound method based on model-based clustering. In this framework, experimental errors are modeled as deviations from the true values, and mass spectra are regarded as finite Gaussian mixtures. The advantage of this approach is that it provides convenient techniques for adjusting parameters and selecting the best-quality solutions. The method can be parameterized by imposing various constraints on the model, and we investigate and compare the resulting classes of models. We evaluate the results in terms of the statistically significant biomarkers that can be identified after the spectra are aligned. The study was conducted on a dataset of plasma samples from colorectal cancer patients and healthy donors.

Keywords: mass spectrometry; peak alignment; clustering; Gaussian mixtures
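
The general idea described in the abstract (peaks as draws from a finite Gaussian mixture, with constrained model classes compared by a quality criterion) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-dimensional data (retention time only), uses a plain EM fit, compares just two constraint classes (a variance shared by all components versus one variance per component), and uses BIC as the selection criterion. The function names, the variance floor, and the toy retention times are all hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixture(xs, k, shared_variance, n_iter=100, var_floor=1e-4):
    """Fit a k-component 1-D Gaussian mixture with EM.

    shared_variance=True constrains all components to one common variance.
    Returns (means, variances, weights, log_likelihood).
    """
    n = len(xs)
    ordered = sorted(xs)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    spread = (ordered[-1] - ordered[0]) / (2 * k) or 1.0
    variances = [spread ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j]
                 for j in range(k)]
        ss = [sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs))
              for j in range(k)]
        if shared_variance:
            variances = [max(sum(ss) / n, var_floor)] * k
        else:
            variances = [max(ss[j] / nk[j], var_floor) for j in range(k)]
    loglik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in xs)
    return means, variances, weights, loglik

def bic(loglik, k, shared_variance, n):
    """Bayesian information criterion; lower values are better."""
    n_params = (k - 1) + k + (1 if shared_variance else k)
    return -2 * loglik + n_params * math.log(n)

# Toy data (made up): retention times of three peaks observed in five runs each.
retention_times = [9.8, 9.9, 10.0, 10.1, 10.2,
                   19.7, 19.9, 20.0, 20.1, 20.3,
                   29.9, 29.95, 30.0, 30.05, 30.1]

# Score every combination of component count and variance constraint.
scores = {}
for k in range(1, 5):
    for shared in (True, False):
        _, _, _, loglik = fit_mixture(retention_times, k, shared)
        scores[(k, shared)] = bic(loglik, k, shared, len(retention_times))

best_k, best_shared = min(scores, key=scores.get)
print("selected components:", best_k, "shared variance:", best_shared)
```

In this sketch each fitted component plays the role of one aligned peak, and the observations assigned to it are the occurrences of that peak across runs. A realistic alignment would work jointly on mass-to-charge ratio and retention time and would need a more careful initialisation than quantiles.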
