Published by De Gruyter, January 21, 2011

A Three Component Latent Class Model for Robust Semiparametric Gene Discovery

  • Marco Alfo', Alessio Farcomeni and Luca Tardella

We propose a robust model for discovering differentially expressed genes that directly incorporates biological significance, i.e., effect size. Using the so-called c-fold rule, we transform the expression data into a nominal observed random variable with three categories: below a fixed lower threshold, above a fixed upper threshold, or between the two thresholds. The observed categories are assumed to possibly originate from three different distributions, corresponding to underexpressed, non-differentially expressed, and overexpressed genes. This leads to a statistical model for a 3-component mixture of trinomial distributions with suitable constraints on the parameter space. To obtain the maximum likelihood estimates, we show how to implement a constrained EM algorithm with a latent label indicating the component to which each gene belongs. Different strategies for statistically significant gene discovery are discussed and compared. We illustrate the method with a small simulation study and a real dataset on multiple sclerosis.
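As a rough illustration of the pipeline summarized in the abstract, the following Python sketch discretizes expression values with a c-fold rule and fits a 3-component mixture of trinomials by EM. It rests on several assumptions that the abstract does not state: the input is a genes-by-replicates matrix of log2 expression ratios, the thresholds are symmetric at ±log2(c), and the EM shown here is the plain, unconstrained version. The parameter-space constraints used by the authors are not given in the abstract and are therefore omitted. The function names `c_fold_counts` and `em_trinomial_mixture` are hypothetical.

```python
import numpy as np


def c_fold_counts(log_ratios, c=2.0):
    """Per gene, count replicates below -log2(c), within, and above +log2(c).

    Assumption: log_ratios is a (genes x replicates) array of log2 ratios.
    """
    t = np.log2(c)
    below = (log_ratios < -t).sum(axis=1)
    above = (log_ratios > t).sum(axis=1)
    within = log_ratios.shape[1] - below - above
    return np.column_stack([below, within, above])


def em_trinomial_mixture(counts, n_iter=200, eps=1e-9):
    """Unconstrained EM for a 3-component mixture of trinomials.

    Components are meant to represent: 0 = underexpressed,
    1 = not differential, 2 = overexpressed. This ordering is only
    encouraged by the starting values, not enforced by constraints.
    """
    weights = np.array([0.1, 0.8, 0.1])
    probs = np.array([[0.6, 0.3, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.1, 0.3, 0.6]])
    for _ in range(n_iter):
        # E-step: posterior probability of each latent label per gene.
        # The multinomial coefficient is common to all components and cancels.
        log_post = np.log(weights + eps) + counts @ np.log(probs + eps).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and category probabilities.
        weights = post.mean(axis=0)
        probs = post.T @ counts + eps
        probs /= probs.sum(axis=1, keepdims=True)
    return weights, probs, post
```

The posterior matrix returned as `post` gives, for each gene, the estimated probability of belonging to each component; a gene could then be flagged when its posterior probability of being "not differential" is small, which is only one of several possible discovery rules.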

Published Online: 2011-1-21

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

Downloaded on 1.12.2023 from https://www.degruyter.com/document/doi/10.2202/1544-6115.1565/html