Published by De Gruyter December 10, 2015

Empirical likelihood tests for nonparametric detection of differential expression from RNA-seq data

Thomas Thorne

Abstract

The availability of large quantities of transcriptomic data in the form of RNA-seq count data has necessitated the development of methods to identify genes differentially expressed between experimental conditions. Many existing approaches apply a parametric model of gene expression and so place strong assumptions on the distribution of the data. Here we explore an alternative nonparametric approach that applies an empirical likelihood framework, allowing us to define likelihoods without specifying a parametric model of the data. We demonstrate the performance of our method when applied to gold standard datasets and to existing experimental data. Our approach outperforms or closely matches the performance of existing methods in the literature, and requires modest computational resources. An R package, EmpDiff, implementing the methods described in the paper is available from: http://homepages.inf.ed.ac.uk/tthorne/software/packages/EmpDiff_0.99.tar.gz.


Corresponding author: Thomas Thorne, School of Informatics, University of Edinburgh, EH8 9AB, UK.

Acknowledgments

This work was supported by the University of Edinburgh Chancellor’s Fellowship to T.T.

Supplemental Material:

The online version of this article (DOI: 10.1515/sagmb-2015-0095) offers supplementary material, available to authorized users.


Published Online: 2015-12-10
Published in Print: 2015-12-01

©2015 by De Gruyter
