
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

Online ISSN: 1544-6115
Volume 9, Issue 1 (Dec 2010)

Including Probe-Level Measurement Error in Robust Mixture Clustering of Replicated Microarray Gene Expression

Xuejun Liu (Nanjing University of Aeronautics and Astronautics) / Magnus Rattray (University of Sheffield)
Published Online: 2010-12-09 | DOI: https://doi.org/10.2202/1544-6115.1600

Probabilistic mixture models are a popular approach to clustering noisy gene expression data for exploring gene function. Because gene expression data from microarray experiments are often affected by significant technical and biological noise, replicated experiments are typically used to deal with data variability, and internal replication (e.g. multiple probes per gene in an experiment) provides valuable information about technical sources of noise. However, current implementations of mixture models either do not consider the correlation between replicated measurements for the same experimental condition or ignore the probe-level measurement error, and thus overlook the rich information about technical noise. Moreover, most current methods use non-robust Gaussian components to describe the data and are therefore sensitive to non-Gaussian clusters and outliers. In many cases this leads to over-estimation of the number of model components, because multiple Gaussian components are used to fit a single non-Gaussian cluster. We propose a robust Student's t-mixture model that explicitly handles replicated gene expression data, accounts for probe-level measurement error when available, and automatically selects the appropriate number of model components using a minimum message length criterion. We apply the model to gene expression data using probe-level measurements from an Affymetrix probe-level model, multi-mgMOS, which provides uncertainty estimates. The proposed Student's t-mixture model shows robust performance on synthetic data sets with realistic noise characteristics in comparison to a standard Gaussian mixture model and two other previously published methods. We also compare performance with these methods on two yeast time-course data sets and show that the new method obtains more biologically meaningful clusters in terms of enrichment statistics for GO categories and interactions between transcription factors and genes. Automatically selecting the number of components is more computationally efficient than using a model selection approach and allows the method to be applied to larger data sets.
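
The robustness argument in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it fits a single one-dimensional Student's t-distribution with a fixed number of degrees of freedom using the standard EM updates, and it ignores replication, probe-level measurement error and component selection. The function name `fit_student_t_location`, the choice `nu=3.0` and the toy expression values are all assumptions made for illustration. The point is only that each observation receives a weight that shrinks as it moves away from the centre, so a single outlier pulls the t-based estimate far less than it pulls a Gaussian (sample mean) estimate.

```python
from statistics import fmean


def fit_student_t_location(xs, nu=3.0, n_iter=200):
    """EM estimate of location and scale for a 1-D Student's t with fixed nu.

    Illustrative sketch only; nu is held fixed rather than estimated.
    Returns (mu, sigma2, weights), where weights come from the final E-step.
    """
    n = len(xs)
    # Initialise from the Gaussian (non-robust) estimates.
    mu = fmean(xs)
    sigma2 = max(sum((x - mu) ** 2 for x in xs) / n, 1e-12)
    weights = [1.0] * n
    for _ in range(n_iter):
        # E-step: expected precision weight of each point; points far from
        # mu (relative to sigma2) receive small weights.
        weights = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in xs]
        # M-step: weighted mean and weighted scale.
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        sigma2 = max(
            sum(w * (x - mu) ** 2 for w, x in zip(weights, xs)) / n, 1e-12
        )
    return mu, sigma2, weights


# Hypothetical log-expression values for one gene: six consistent
# measurements and one outlier (the last value).
expression = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 10.0]

gaussian_mean = fmean(expression)
t_mean, t_scale2, t_weights = fit_student_t_location(expression)

print("Gaussian (sample mean) estimate:", gaussian_mean)
print("Student's t estimate:", t_mean)
print("Weight given to the outlier:", t_weights[-1])
```

In a mixture setting the same down-weighting happens inside each component, which is why a t-mixture is less inclined than a Gaussian mixture to spend extra components on outliers or heavy-tailed clusters.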

Keywords: microarray data; gene expression clustering; mixture models

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1600.
