DNA methylation plays an important role in human health and disease, and methods for identifying differentially methylated regions are of increasing interest. There is currently a lack of statistical methods that properly address multiple testing, i.e. control genome-wide significance for differentially methylated regions. We introduce a scan statistic (DMRScan) that overcomes these limitations. We benchmark DMRScan against two well-established methods (bumphunter, DMRcate) using a simulation study based on real methylation data. An implementation of DMRScan is available from Bioconductor. Our method has higher power than the alternative methods across different simulation scenarios, particularly for small effect sizes. DMRScan exhibits greater flexibility in statistical modeling and can be used with more complex designs than current methods. DMRScan is the first dynamic approach that properly addresses the multiple-testing challenges in identifying differentially methylated regions. DMRScan outperformed alternative methods in terms of power, while keeping the false discovery rate controlled.
Funding statement: This work was supported by the University of Oslo [Funder Id: 10.13039/501100005366, grant number 531217/1231]; Folkhälsan Research Foundation; The Academy of Finland [grant number 250704]; The Life and Health Medical Fund [grant number 1-23-28]; The Swedish Cultural Foundation in Finland [grant number 15/0897]; The Signe and Ane Gyllenberg Foundation [grant number 37-1977-43]; and The Yrjö Jahnsson Foundation [grant number 11486].
We acknowledge Folkhälsan Research Center and the Fin-HIT study group: Sabina Simola, Stephanie Von Kreamer, Jesper Skand, Catharina Sarkkola, Sajan Raju and Elisabete Weiderpass (Helsinki, Finland) for providing data for benchmarking the different models. The Institute for Molecular Medicine Finland (FIMM) provided computational infrastructure and performed the sequencing for this project. We thank Suzanne Campbell and Marissa LaBlanc for critical evaluation of this manuscript.
List of abbreviations
AR(p): Autoregressive process of order p
DMR: Differentially methylated region
Expected number of significant windows of size k
FDR: False discovery rate
MCMC: Markov Chain Monte Carlo
Window threshold for sliding windows of size k
Ethics: The Coordinating Ethics Committees of the Hospital Districts of Helsinki and Uusimaa approved the study. Informed consent was obtained from all participants as well as from one of their legal guardians.
Availability of data and materials: The R package is available from Bioconductor under the name DMRScan, along with the example data set used in this paper. The R code for comparing the methods can be found in the GitHub repository of the R package: https://github.com/christpa/DMRScan.
Conflict of interest statement: The authors declare that they have no competing interests.
The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2017-0050).
©2018 Walter de Gruyter GmbH, Berlin/Boston