Variational Bayes Procedure for Effective Classification of Tumor Type with Microarray Gene Expression Data

Takeshi Hayashi 1
  • 1 National Agricultural Research Center


Microarrays, which measure the expression levels of thousands of genes simultaneously, have become a valuable tool for classifying tumors. Because the sample size in such studies is usually much smaller than the number of genes, accurate prediction of tumor type requires a properly sparse model that avoids over-fitting. Bayesian shrinkage estimation is well suited to producing such models: it shrinks the estimated effects of the many irrelevant genes toward zero while keeping those of a small number of relevant genes at significant magnitudes. However, Bayesian analysis usually relies on computationally intensive techniques such as MCMC iteration. This paper describes a computationally efficient method of Bayesian shrinkage regression (BSR), incorporating multiple hierarchical structures, for constructing a classification model of tumor types from microarray gene expression data. To reduce the computational burden of Bayesian estimation, we use a variational approximation that provides simple approximations to the posterior distributions of the parameters. The resulting BSR procedure yields a properly sparse model that classifies tumor samples accurately and rapidly. On simulated and actual gene expression data sets, its classification accuracy is shown to be at least equivalent to that of other methods such as support vector machines and partial least squares.
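
To make the idea of variational shrinkage concrete, the following is a minimal sketch and not the paper's BSR procedure. It assumes a generic linear model with a continuous response, one Gaussian prior precision per coefficient (an automatic-relevance-determination style hierarchy), Gamma hyperpriors, and a mean-field factorization of the posterior. The function names, the hyperparameters a0, b0, c0, d0, and the toy data generator are all illustrative assumptions; the paper's multiple hierarchical structures and its classification link are not reproduced here.

```python
import numpy as np


def vb_shrinkage_regression(X, y, n_iter=200, a0=1e-6, b0=1e-6, c0=1e-6, d0=1e-6):
    """Mean-field variational Bayes for a sparse linear model (illustrative).

    Assumed model (not taken from the paper):
        y ~ N(X w, 1/beta),  w_j ~ N(0, 1/alpha_j),
        alpha_j ~ Gamma(a0, b0),  beta ~ Gamma(c0, d0).
    Returns the posterior mean and covariance of w and the expected precisions.
    """
    n, p = X.shape
    e_alpha = np.ones(p)   # E[alpha_j]: expected prior precision of each coefficient
    e_beta = 1.0           # E[beta]: expected noise precision
    XtX = X.T @ X
    Xty = X.T @ y
    for _ in range(n_iter):
        # q(w) = N(m, S)
        S = np.linalg.inv(e_beta * XtX + np.diag(e_alpha))
        m = e_beta * (S @ Xty)
        # q(alpha_j) is Gamma; a large E[alpha_j] shrinks coefficient j toward zero
        e_alpha = (a0 + 0.5) / (b0 + 0.5 * (m ** 2 + np.diag(S)))
        # q(beta) is Gamma; uses the expected squared residual under q(w)
        resid = y - X @ m
        expected_sq_err = resid @ resid + np.sum((X @ S) * X)  # adds tr(X S X^T)
        e_beta = (c0 + 0.5 * n) / (d0 + 0.5 * expected_sq_err)
    return m, S, e_alpha, e_beta


def make_toy_data(n=40, p=200, n_relevant=3, seed=0):
    """Simulate far more predictors ("genes") than samples, with few relevant ones."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    w_true = np.zeros(p)
    w_true[:n_relevant] = 2.0
    y = X @ w_true + 0.3 * rng.standard_normal(n)
    return X, y, w_true


if __name__ == "__main__":
    X, y, w_true = make_toy_data()
    m, S, e_alpha, e_beta = vb_shrinkage_regression(X, y)
    print("largest |posterior mean| indices:", np.argsort(-np.abs(m))[:5])
```

Each iteration costs one p-by-p matrix inversion and involves no sampling, which is the general reason variational updates of this kind tend to be much faster than MCMC for the same model.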

SAGMB publishes significant research on the application of statistical ideas to problems arising from computational biology. The range of topics includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies.