Jump to ContentJump to Main Navigation
Show Summary Details
More options …

Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


IMPACT FACTOR 2018: 0.536
5-year IMPACT FACTOR: 0.764

CiteScore 2018: 0.49

SCImago Journal Rank (SJR) 2018: 0.316
Source Normalized Impact per Paper (SNIP) 2018: 0.342

Mathematical Citation Quotient (MCQ) 2017: 0.04

Online
ISSN
1544-6115
See all formats and pricing
More options …
Volume 9, Issue 1

Issues

Volume 10 (2011)

Volume 9 (2010)

Volume 6 (2007)

Volume 5 (2006)

Volume 4 (2005)

Volume 2 (2003)

Volume 1 (2002)

Shrinkage Estimation of Effect Sizes as an Alternative to Hypothesis Testing Followed by Estimation in High-Dimensional Biology: Applications to Differential Gene Expression

Zahra Montazeri / Corey M. Yanofsky / David R. Bickel
Published Online: 2010-06-08 | DOI: https://doi.org/10.2202/1544-6115.1504

Research on analyzing microarray data has focused on the problem of identifying differentially expressed genes to the neglect of the problem of how to integrate evidence that a gene is differentially expressed with information on the extent of its differential expression. Consequently, researchers currently prioritize genes for further study either on the basis of volcano plots or, more commonly, according to simple estimates of the fold change after filtering the genes with an arbitrary statistical significance threshold. While the subjective and informal nature of the former practice precludes quantification of its reliability, the latter practice is equivalent to using a hard-threshold estimator of the expression ratio that is not known to perform well in terms of mean-squared error, the sum of estimator variance and squared estimator bias. On the basis of two distinct simulation studies and data from different microarray studies, we systematically compared the performance of several estimators representing both current practice and shrinkage. We find that the threshold-based estimators usually perform worse than the maximum-likelihood estimator (MLE) and they often perform far worse as quantified by estimated mean-squared risk. By contrast, the shrinkage estimators tend to perform as well as or better than the MLE and never much worse than the MLE, as expected from what is known about shrinkage. However, a Bayesian measure of performance based on the prior information that few genes are differentially expressed indicates that hard-threshold estimators perform about as well as the local false discovery rate (FDR), the best of the shrinkage estimators studied. Based on the ability of the latter to leverage information across genes, we conclude that the use of the local-FDR estimator of the fold change instead of informal or threshold-based combinations of statistical tests and non-shrinkage estimators can be expected to substantially improve the reliability of gene prioritization at very little risk of doing so less reliably. Since the proposed replacement of post-selection estimates with shrunken estimates applies as well to other types of high-dimensional data, it could also improve the analysis of SNP data from genome-wide association studies.

Keywords: microarray data analysis; shrinkage estimation; hypothesis testing; simultaneous significance; multiple comparison procedures; empirical Bayes; Bayesian inference; high-dimensional biology; genome-wide association

About the article

Published Online: 2010-06-08


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 9, Issue 1, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1504.

Export Citation

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.Get Permission

Citing Articles

Here you can find all Crossref-listed publications in which this article is cited. If you would like to receive automatic email messages as soon as this article is cited in other publications, simply activate the “Citation Alert” on the top of this page.

[1]
Yufei Xiao, Tzu-Hung Hsiao, Uthra Suresh, Hung-I Harry Chen, Xiaowu Wu, Steven E. Wolf, and Yidong Chen
Bioinformatics, 2014, Volume 30, Number 6, Page 801
[2]
Zahra Montazeri, Corey M. Yanofsky, and David R. Bickel
Statistical Applications in Genetics and Molecular Biology, 2010, Volume 9, Number 1
[3]
David R. Bickel and Marta Padilla
Journal of Statistical Planning and Inference, 2014, Volume 145, Page 204
[4]
J. T. Gene Hwang and Zhigen Zhao
Journal of the American Statistical Association, 2013, Volume 108, Number 502, Page 607

Comments (0)

Please log in or register to comment.
Log in