Abstract
The use of fold-change (FC) to prioritize differentially expressed genes (DEGs) for post-hoc characterization is a common technique in the analysis of RNA sequencing datasets. However, the use of FC can overlook certain populations of DEGs, such as high copy number transcripts that undergo metabolically expensive changes in expression yet fail to exceed the ratiometric FC cut-off, thereby missing potentially important biological information. Here we evaluate an alternative approach to prioritizing RNAseq data based on absolute changes in normalized transcript counts (ΔT) between control and treatment conditions. In five pairwise comparisons with a wide range of effect sizes, rank-ordering of DEGs based on the magnitude of ΔT produced a power curve-like distribution, in which 4.7–5.0% of transcripts were responsible for 36–50% of the cumulative change. Thus, differential gene expression is characterized by the high production-cost expression of a small number of genes (large ΔT genes), while the differential expression of the majority of genes involves a much smaller metabolic investment by the cell. To determine whether the large ΔT datasets are representative of coordinated changes in the transcriptional program, we evaluated large ΔT genes for enrichment of gene ontologies (GOs) and predicted protein interactions. In comparison to randomly selected DEGs, the large ΔT transcripts were significantly enriched for both GOs and predicted protein interactions. Furthermore, enrichments were consistent with the biological context of each comparison yet distinct from those produced using equal-sized populations of large FC genes, indicating that the large ΔT genes represent an orthogonal transcriptional response. Finally, the composition of the large ΔT gene sets was unique to each pairwise comparison, indicating that they represent coherent and context-specific responses to biological conditions rather than the non-specific upregulation of a family of genes.
These findings suggest that the large ΔT genes are not a product of random or stochastic phenomena, but rather represent biologically meaningful changes in the transcriptional program. They furthermore imply that high abundance transcripts are associated with particular cellular states, and that as cells change in response to internal or external conditions, the relative distribution of the abundant transcripts changes accordingly. Thus, prioritization of DEGs based on the concept of metabolic cost is a simple yet powerful method to identify biologically important transcriptional changes and provide novel insights into cellular behaviors.
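The ΔT prioritization summarized above can be illustrated with a minimal sketch. It assumes that counts have already been normalized by whatever method the analysis uses, and it is not the authors' pipeline: the function names (`delta_t`, `log2_fold_change`, `rank_genes`, `top_share`), the pseudocount of 1, and the toy gene names are all hypothetical choices made for illustration. The sketch computes ΔT as the absolute difference in normalized counts, ranks genes by ΔT or by FC, and reports the share of cumulative ΔT carried by the top fraction of genes.

```python
import math
from typing import Dict, List

# Hypothetical type alias: gene name -> normalized transcript count.
Counts = Dict[str, float]


def delta_t(control: Counts, treatment: Counts) -> Counts:
    """Absolute change in normalized counts for genes present in both conditions."""
    return {g: abs(treatment[g] - control[g]) for g in control if g in treatment}


def log2_fold_change(control: Counts, treatment: Counts,
                     pseudocount: float = 1.0) -> Counts:
    """Log2 ratio of treatment to control counts.

    The pseudocount is an assumption of this sketch (it avoids division by
    zero); it is not taken from the article.
    """
    return {
        g: math.log2((treatment[g] + pseudocount) / (control[g] + pseudocount))
        for g in control if g in treatment
    }


def rank_genes(scores: Counts) -> List[str]:
    """Gene names ordered from largest to smallest absolute score."""
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


def top_share(deltas: Counts, top_fraction: float = 0.05) -> float:
    """Fraction of the cumulative ΔT contributed by the top-ranked genes."""
    ranked = sorted(deltas.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Always include at least one gene, even for very small gene sets.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    control = {"GeneA": 10000.0, "GeneB": 2.0, "GeneC": 500.0, "GeneD": 50.0}
    treatment = {"GeneA": 14000.0, "GeneB": 20.0, "GeneC": 450.0, "GeneD": 80.0}

    print("Ranked by ΔT:", rank_genes(delta_t(control, treatment)))
    print("Ranked by FC:", rank_genes(log2_fold_change(control, treatment)))
    print("Top-gene share of ΔT:", top_share(delta_t(control, treatment), 0.25))
```

In the toy data, the abundant transcript GeneA changes by 4000 normalized counts at a modest ratio, so it ranks first by ΔT but only third by FC, while the rare transcript GeneB shows the reverse pattern. This is the kind of divergence between the two rankings that the abstract describes.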
Acknowledgments
This work was funded by the National Institutes of Health National Institute of Allergy and Infectious Diseases (IAA number AOD12058-0001-0000) and the Defense Threat Reduction Agency – Joint Science and Technology Office, Medical S&T Division (grant number CBM.THRTOX.01.10.RC.021). This research was performed while I.G. and P.B. held Defense Threat Reduction Agency-National Research Council Research Associateship Awards and K.H. held a National Research Council Research Associateship Award. We thank Angela Adkins, Megan Lyman, and Marian Nelson (USAMRICD, MD) for technical assistance and Cindy Kronman and Riannon Hazell (USAMRICD, MD) for editorial and administrative assistance. The views expressed in this article are those of the authors and do not reflect the official policy of the Department of Army, Department of Defense, or the U.S. Government.
Supplemental Material
The online version of this article (DOI: 10.1515/sagmb-2014-0018) offers supplementary material, available to authorized users.
©2015 by De Gruyter