Abstract
DNA methylation plays an important role in human health and disease, and methods for identifying differentially methylated regions are of increasing interest. There is currently a lack of statistical methods that properly address multiple testing, i.e. control genome-wide significance for differentially methylated regions. We introduce a scan statistic (DMRScan), which overcomes these limitations. We benchmark DMRScan against two well-established methods (bumphunter, DMRcate), using a simulation study based on real methylation data. An implementation of DMRScan is available from Bioconductor. Our method has higher power than the alternative methods across different simulation scenarios, particularly for small effect sizes. DMRScan exhibits greater flexibility in statistical modeling and can be used with more complex designs than current methods. DMRScan is the first dynamic approach that properly addresses the multiple-testing challenges in identifying differentially methylated regions. DMRScan outperformed the alternative methods in terms of power, while keeping the false discovery rate controlled.
Funding source: University of Oslo
Award Identifier / Grant number: 531217/1231
Funding source: Folkhälsan Research Foundation; The Academy of Finland
Award Identifier / Grant number: 250704
Funding statement: This work was supported by the University of Oslo [Funder Id: 10.13039/501100005366, grant number 531217/1231]; Folkhälsan Research Foundation; The Academy of Finland [grant number 250704]; The Life and Health Medical Fund [grant number 1-23-28]; The Swedish Cultural Foundation in Finland [grant number 15/0897]; The Signe and Ane Gyllenberg Foundation [grant number 37-1977-43]; and The Yrjö Jahnsson Foundation [grant number 11486].
Acknowledgement
We acknowledge Folkhälsan Research Center and the Fin-HIT study group: Sabina Simola, Stephanie Von Kreamer, Jesper Skand, Catharina Sarkkola, Sajan Raju and Elisabete Weiderpass (Helsinki, Finland) for providing data for benchmarking the different models. The Institute for Molecular Medicine Finland (FIMM) provided computational infrastructure and performed the sequencing for this project. We thank Suzanne Campbell and Marissa LaBlanc for critical evaluation of this manuscript.
List of abbreviations
- AR(p): Autoregressive process of order p
- ChIP: Chromatin immunoprecipitation
- DMR: Differentially methylated region
- E_k: Expected number of significant windows of size k
- FDR: False discovery rate
- MCMC: Markov chain Monte Carlo
- OU-process: Ornstein-Uhlenbeck process
- t_k: Window threshold for sliding windows of size k
Declarations
Ethics: The Coordinating Ethics Committees of the Hospital Districts of Helsinki and Uusimaa approved the study. Informed consent was obtained from all participants as well as from one of their legal guardians.
Availability of data and materials: The R package is available from Bioconductor under the name DMRScan, along with the example data set used in this paper. The R code for comparing the methods can be found in the GitHub repository of the R package: https://github.com/christpa/DMRScan.
Conflict of interest statement: The authors declare that they have no competing interests.
Supplementary Material
The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2017-0050).
©2018 Walter de Gruyter GmbH, Berlin/Boston