
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 16, Issue 5-6

Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes

Ekua Kotoka / Megan Orr
Published Online: 2017-10-27 | DOI: https://doi.org/10.1515/sagmb-2016-0037


RNA-Seq is a developing technology for generating gene expression data by directly sequencing mRNA molecules in a sample. RNA-Seq data consist of counts of reads assigned to each gene, and these counts are often used to identify differentially expressed (DE) genes. A common statistical method for analyzing RNA-Seq data is Significance Analysis of Microarrays with emphasis on RNA-Seq data (SAMseq). SAMseq is a nonparametric method that uses a resampling technique to account for differences in sequencing depth when identifying DE genes. We propose a modification of this method that accounts for asymmetry in the distribution of effect sizes by using the sign of the test statistics. Through simulation studies, we show that the proposed method, compared with the traditional SAMseq method and other existing methods, provides better power for identifying truly DE genes or controls the false discovery rate (FDR) more adequately in most settings where asymmetry is present. We illustrate the use of the proposed method by analyzing an RNA-Seq data set containing samples from the C57BL/6J (B6) and DBA/2J (D2) mouse strains.

Keywords: differentially expressed genes; false discovery rate; RNA-Seq; SAMseq
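To make the role of the sign concrete, the following is a minimal sketch, not the authors' procedure. It assumes that test statistics have already been computed for each gene, along with null statistics from permuted group labels, and it compares a plug-in FDR estimate that pools both tails with estimates computed separately for positive and negative statistics. The function name `tail_fdr` and the toy numbers are hypothetical, and the sketch omits SAMseq's resampling for sequencing depth and any estimate of the proportion of null genes.

```python
from typing import Callable, Sequence, Tuple


def tail_fdr(observed: Sequence[float],
             null_stats: Sequence[Sequence[float]],
             cutoff: float) -> Tuple[float, float, float]:
    """Plug-in FDR estimates at `cutoff`: (two_sided, upper_tail, lower_tail).

    observed   -- one test statistic per gene
    null_stats -- one row of per-gene statistics for each label permutation
    cutoff     -- positive threshold; a gene is called if its statistic is at
                  least this extreme in the tail being examined
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    n_perm = len(null_stats)
    if n_perm == 0:
        raise ValueError("at least one permutation is required")

    def estimate(is_called: Callable[[float], bool]) -> float:
        # Number of genes called in the observed data.
        called = sum(1 for t in observed if is_called(t))
        if called == 0:
            return 0.0
        # Average number of calls per permutation approximates false calls.
        expected_false = sum(
            1 for row in null_stats for t in row if is_called(t)
        ) / n_perm
        return min(1.0, expected_false / called)

    two_sided = estimate(lambda t: abs(t) >= cutoff)   # ignores the sign
    upper_tail = estimate(lambda t: t >= cutoff)       # positive statistics only
    lower_tail = estimate(lambda t: t <= -cutoff)      # negative statistics only
    return two_sided, upper_tail, lower_tail


if __name__ == "__main__":
    # Toy data (hypothetical): most large effects are positive.
    observed = [3.0, 2.7, 2.4, 2.2, -2.1, 0.5, -0.3, 0.1]
    null_stats = [
        [2.1, -2.2, 0.4, -0.6, 1.0, -1.2, 0.2, 0.0],
        [-2.5, 0.3, 0.9, -0.1, 2.3, -0.8, 1.1, -0.4],
    ]
    print(tail_fdr(observed, null_stats, cutoff=2.0))  # (0.4, 0.25, 1.0)
```

In this toy case the pooled estimate of 0.4 masks a clear difference between the tails: the positive tail has an estimated FDR of 0.25, while the single call in the negative tail is no more extreme than what the permutations produce. When effect sizes are asymmetric, a sign-aware estimate can expose this kind of difference.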



About the article

Published Online: 2017-10-27

Published in Print: 2017-11-27

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 16, Issue 5-6, Pages 333–347, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2016-0037.


©2017 Walter de Gruyter GmbH, Berlin/Boston.
