Yanming Di, Sarah C. Emerson, Daniel W. Schafer, Jeffrey A. Kimbrel, Jeff H. Chang
March 26, 2013
RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments.