Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

Volume 12, Issue 1



Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data

Yanming Di / Sarah C. Emerson / Daniel W. Schafer / Jeffrey A. Kimbrel / Jeff H. Chang
Published Online: 2013-03-26 | DOI: https://doi.org/10.1515/sagmb-2012-0071


RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next-generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments.

Keywords: RNA-Seq; higher-order asymptotics; negative binomial; regression; overdispersion; extra-Poisson variation
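
As a rough illustration of the kind of large-sample test the abstract refers to, the following sketch computes an unadjusted likelihood ratio test for a two-group comparison of counts for a single gene under a negative binomial model. It is not the authors' implementation and does not include the HOA adjustment. It assumes a known dispersion `phi` (variance = mu + phi * mu^2) and equal library sizes, in which case the maximum likelihood estimate of each mean is the sample mean. The function names `nb_loglik` and `nb_lrt_two_group` and the example counts are hypothetical.

```python
import math


def nb_loglik(counts, mu, phi):
    """Negative binomial log-likelihood with mean mu and dispersion phi.

    Parameterization (an assumption of this sketch): Var(Y) = mu + phi * mu**2.
    """
    if mu == 0:
        # Degenerate case: all probability mass sits at zero.
        return 0.0 if all(y == 0 for y in counts) else float("-inf")
    k = 1.0 / phi  # NB "size" parameter
    total = 0.0
    for y in counts:
        total += (
            math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu))
            + y * math.log(mu / (k + mu))
        )
    return total


def nb_lrt_two_group(group_a, group_b, phi):
    """Unadjusted likelihood ratio test of equal means in two groups.

    Assumes a known dispersion phi and equal library sizes, so the MLE of
    each mean is the sample mean. Returns (statistic, p_value), where the
    p-value uses the large-sample chi-square approximation with 1 df.
    """
    pooled = list(group_a) + list(group_b)
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    mean_0 = sum(pooled) / len(pooled)

    ll_alt = nb_loglik(group_a, mean_a, phi) + nb_loglik(group_b, mean_b, phi)
    ll_null = nb_loglik(pooled, mean_0, phi)

    # Guard against tiny negative values caused by floating-point rounding.
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical read counts for one gene, three replicates per group.
    stat, p = nb_lrt_two_group([10, 12, 11], [25, 31, 22], phi=0.1)
    print(f"LRT statistic = {stat:.3f}, p-value = {p:.4f}")
```

With only three replicates per group, the chi-square approximation used for the p-value here is exactly the kind of large-sample result whose small-sample accuracy the article examines.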


About the article

Corresponding author: Yanming Di, Department of Statistics, Oregon State University, 44 Kidder Hall, Corvallis, OR 97330, USA

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 1, Pages 49–70, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2012-0071.

©2013 by Walter de Gruyter Berlin Boston.

Citing Articles

A Grant Schissler, Walter W Piegorsch, and Yves A Lussier. Statistical Methods in Medical Research, 2017, Page 096228021771227
Bin Zhuo, Sarah Emerson, Jeff H. Chang, and Yanming Di. PeerJ, 2016, Volume 4, Page e2791
