Correlation Between Gene Expression Levels and Limitations of the Empirical Bayes Methodology for Finding Differentially Expressed Genes

Xing Qiu / Lev Klebanov (Department of Probability and Statistics, Charles University, Institute of Informatics and Control of the National Academy of Sciences of the Czech Republic) / Andrei Yakovlev
Published Online: 2005-11-22 | DOI: https://doi.org/10.2202/1544-6115.1157

Stochastic dependence between gene expression levels in microarray data is of critical importance for methods of statistical inference that pool test statistics across genes. Typical examples are the empirical Bayes methodology, in both its nonparametric and parametric formulations, and closely related methods that employ a two-component mixture model. It is frequently assumed that dependence between gene expression levels (or the associated test statistics) is weak enough to justify applying such methods to select differentially expressed genes. By applying resampling techniques to simulated and real biological data sets, we have studied the potential impact of correlation between gene expression levels on statistical inference based on the empirical Bayes methodology. These analyses provide evidence that the impact may be quite strong, leading to a high variance in the number of genes declared differentially expressed. The study also pinpoints the specific components of the empirical Bayes method in which this effect manifests itself.
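The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a simple equicorrelated Gaussian null model and a fixed cutoff on a pooled-variance two-sample t statistic in place of the empirical Bayes rule. The function names, the cutoff, and the parameters (`rho`, `n_genes`, `n_per_group`) are all hypothetical choices made for illustration. It compares the replicate-to-replicate variance of the number of selected genes with and without correlation between genes.

```python
import numpy as np

def count_selected(rng, n_genes=500, n_per_group=10, rho=0.0, threshold=2.5):
    """Simulate one null two-group data set; count genes with |t| > threshold."""
    n = 2 * n_per_group
    # Assumption: a factor shared by all genes on an array induces
    # pairwise correlation rho between any two genes.
    shared = rng.standard_normal((1, n))
    own = rng.standard_normal((n_genes, n))
    x = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own
    a, b = x[:, :n_per_group], x[:, n_per_group:]
    # Pooled-variance two-sample t statistic, computed gene by gene.
    pooled = (a.var(axis=1, ddof=1) + b.var(axis=1, ddof=1)) / 2.0
    t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled * 2.0 / n_per_group)
    return int(np.sum(np.abs(t) > threshold))

def selection_variance(rho, n_rep=200, seed=0):
    """Mean and variance of the selected-gene count over n_rep simulated data sets."""
    rng = np.random.default_rng(seed)
    counts = np.array([count_selected(rng, rho=rho) for _ in range(n_rep)])
    return counts.mean(), counts.var(ddof=1)

if __name__ == "__main__":
    for rho in (0.0, 0.5):
        mean, var = selection_variance(rho)
        print(f"rho={rho}: mean count={mean:.1f}, variance={var:.1f}")
```

In this toy model every gene has the same marginal null distribution whatever the value of `rho`, so the expected count is roughly unchanged. Only the spread of the count across replicates grows with the correlation.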

This article offers supplementary material which is provided at the end of the article.

Keywords: microarray analysis; gene expression; two-sample tests; empirical Bayes method; correlated data; resampling techniques

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.2202/1544-6115.1157.
