A Comparison of Methods to Control Type I Errors in Microarray Studies

Jinsong Chen 1 , Mark J. van der Laan 2 , Martyn T. Smith 3  and Alan E. Hubbard 4
  • 1 Lawrence Berkeley National Laboratory
  • 2 University of California, Berkeley
  • 3 University of California, Berkeley
  • 4 University of California, Berkeley

Microarray studies often need to examine thousands of genes simultaneously to determine which are differentially expressed. A main challenge in such studies is finding multiple testing procedures that accurately control the error rate of interest while remaining as powerful as possible, that is, procedures that return the longest list of truly interesting genes among competitors. Many multiple testing methods have recently been developed for microarray data analysis, especially resampling-based methods such as permutation methods, the null-centered and scaled bootstrap (NCSB) method, and the quantile-transformed-bootstrap-distribution (QTBD) method. Each of these methods has its own merits and limitations. In theory, permutation methods can fail to control Type I errors accurately when the so-called subset pivotality condition is violated. The NCSB method does not suffer from that limitation, but an impractical number of bootstrap samples is often needed to obtain proper control of Type I errors. The newly developed QTBD method provides accurate control of Type I errors under few restrictions. However, the relative practical performance of these three types of multiple testing methods remains unresolved. This paper compares the three resampling-based methods with respect to control of the family-wise error rate (FWER), using simulated data. The results show that, among the three, the QTBD method provides relatively accurate and powerful control under more general circumstances.
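To make the permutation approach mentioned above concrete, the following is a minimal sketch of a generic single-step maxT permutation procedure (in the style of Westfall and Young) for FWER-adjusted p-values in a two-group comparison. It is an illustration only, not the implementation evaluated in the paper: the function names `welch_t` and `maxt_adjusted_pvalues`, the choice of the Welch t-statistic, and the use of random rather than exhaustive permutations are all assumptions made here.

```python
import numpy as np


def welch_t(x, labels):
    """Welch two-sample t-statistic per gene (rows = genes, columns = samples).

    Hypothetical helper: `labels` is a 0/1 array marking group membership.
    """
    a, b = x[:, labels == 0], x[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / se


def maxt_adjusted_pvalues(x, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values from random label permutations.

    For each gene, the adjusted p-value is the fraction of permutations
    whose maximum |t| over all genes is at least the observed |t|.
    """
    rng = np.random.default_rng(seed)
    t_obs = np.abs(welch_t(x, labels))
    # Start at 1 so the observed labelling counts as one permutation,
    # which keeps every adjusted p-value strictly positive.
    exceed = np.ones_like(t_obs)
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        max_t = np.abs(welch_t(x, perm)).max()
        exceed += max_t >= t_obs
    return exceed / (n_perm + 1)
```

Genes whose adjusted p-value falls at or below a level such as 0.05 would be declared differentially expressed. Because the null distribution is built by permuting group labels, the validity of this control rests on conditions such as the subset pivotality condition discussed above, which is the limitation that motivates the bootstrap-based alternatives.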

SAGMB publishes significant research on the application of statistical ideas to problems arising from computational biology. The range of topics includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies.