Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 9, Issue 1


Including Probe-Level Measurement Error in Robust Mixture Clustering of Replicated Microarray Gene Expression

Xuejun Liu / Magnus Rattray
Published Online: 2010-12-09 | DOI: https://doi.org/10.2202/1544-6115.1600

Probabilistic mixture models are a popular approach to clustering noisy gene expression data for exploring gene function. Because gene expression data obtained from microarray experiments are often subject to significant technical and biological noise, replicated experiments are typically used to deal with data variability, and internal replication (e.g. from multiple probes per gene in an experiment) provides valuable information about technical sources of noise. However, current implementations of mixture models either do not consider the correlation between replicated measurements for the same experimental condition or ignore the probe-level measurement error, and thus overlook the rich information about technical noise. Moreover, most current methods use non-robust Gaussian components to describe the data and are therefore sensitive to non-Gaussian clusters and outliers. In many cases, this leads to over-estimation of the number of model components, as multiple Gaussian components are used to fit a single non-Gaussian cluster. We propose a robust Student's t-mixture model that explicitly handles replicated gene expression data, incorporates probe-level measurement error when available, and automatically selects the appropriate number of model components using a minimum message length criterion. We apply the model to gene expression data using probe-level measurements from an Affymetrix probe-level model, multi-mgMOS, which provides uncertainty estimates. The proposed Student's t-mixture model shows robust performance on synthetic data sets with realistic noise characteristics in comparison to a standard Gaussian mixture model and two other previously published methods. We also compare performance with these methods on two yeast time-course data sets and show that the new method obtains more biologically meaningful clusters in terms of enrichment statistics for GO categories and interactions between transcription factors and genes. Automatically selecting the number of components is more computationally efficient than using a model selection approach and allows the method to be applied to larger data sets.
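
The robustness argument above can be illustrated with a minimal sketch. This is not the authors' model (which is a multivariate mixture with replicate structure, probe-level error and minimum message length selection); it only shows, for a single univariate component with fixed degrees of freedom and fixed scale, how the standard EM update for a Student's t location downweights an outlier that would pull a Gaussian mean estimate away from the bulk of the data. The function name `t_location`, the parameter values and the toy data are assumptions made for illustration.

```python
def t_location(xs, nu=3.0, scale=1.0, iters=50):
    """EM estimate of the location of a univariate Student's t.

    Assumes fixed degrees of freedom `nu` and fixed `scale`
    (illustrative simplification, not the paper's full model).
    """
    mu = sum(xs) / len(xs)  # start from the ordinary (Gaussian) mean
    for _ in range(iters):
        # E-step: expected precision weight of each point; points far
        # from mu (large squared standardized distance) get small weight.
        weights = [(nu + 1.0) / (nu + ((x - mu) / scale) ** 2) for x in xs]
        # M-step: the location is the weighted mean of the data.
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu


# Hypothetical replicated expression values with one outlier.
replicates = [1.9, 2.0, 2.1, 2.0, 12.0]

gaussian_mean = sum(replicates) / len(replicates)  # pulled toward the outlier
robust_mean = t_location(replicates)               # stays near the bulk

print(f"Gaussian mean: {gaussian_mean:.3f}")
print(f"Student's t location: {robust_mean:.3f}")
```

In a mixture setting the same weights appear inside each component's update, which is one reason a single t component can absorb a heavy-tailed cluster that a Gaussian mixture might split into several components.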

Keywords: microarray data; gene expression clustering; mixture models

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 9, Issue 1, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1600.

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.
