Published by De Gruyter February 24, 2016

Comparing five statistical methods of differential methylation identification using bisulfite sequencing data

Xiaoqing Yu and Shuying Sun

Abstract

We present a comprehensive comparative analysis of five differential methylation (DM) identification methods developed for bisulfite sequencing (BS) data: methylKit, BSmooth, BiSeq, HMM-DM, and HMM-Fisher. We summarize the features of these methods from several analytical aspects and compare their performance on both simulated and real BS datasets. Our main findings are as follows. First, parameter settings can substantially affect the accuracy of DM identification: compared with the default settings, modified parameter settings yield higher sensitivity and/or lower false positive rates. Second, all five methods are more accurate when identifying simulated DM regions that are long and have small within-group variation, but their results show low concordance, probably because the methods take different approaches to DM identification. Third, HMM-DM and HMM-Fisher yield relatively higher sensitivity and lower false positive rates than the other methods, especially in DM regions with large variation. Finally, among the three methods that involve methylation estimation (methylKit, BSmooth, and BiSeq), BiSeq best represents the raw methylation signals. Based on these results, we suggest that users select a DM identification method according to the characteristics of their data and the advantages of each method.


Corresponding author: Shuying Sun, Department of Mathematics, Texas State University, San Marcos, TX 78666, USA

Acknowledgments

This work was supported by Dr. Shuying Sun’s start-up funds and the Research Enhancement Program provided by Texas State University. We are very grateful to the three anonymous reviewers for their comments and suggestions, which helped us greatly improve this manuscript.

Supplemental Material:

The online version of this article (DOI: 10.1515/sagmb-2015-0078) offers supplementary material, available to authorized users.

Published Online: 2016-2-24
Published in Print: 2016-4-1

©2016 by De Gruyter