
The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.

2 Issues per year



Online ISSN: 1557-4679

A Nearly Exhaustive Search for CpG Islands on Whole Chromosomes

Fushing Hsieh (University of California, Davis)
Shu-Chun Chen (Academia Sinica)
Katherine Pollard (University of California, San Francisco)
Published Online: 2009-05-07 | DOI: https://doi.org/10.2202/1557-4679.1158

CpG islands are genome subsequences with an unexpectedly high number of CG dinucleotides. They are typically identified using filtering criteria (e.g., G+C content, the observed-to-expected CpG ratio, and length) and computed using sliding window methods. Most such studies implicitly, and mistakenly, assume that these methods achieve an exhaustive search for CpG islands on the genome sequence of interest. We devise a Lexis diagram and explicitly show that filtering-criteria-based definitions of CpG islands are mathematically incomplete and non-operational. These facts imply that sliding window methods frequently fail to identify a large percentage of the subsequences that meet the filtering criteria. We also demonstrate that an exhaustive search is computationally expensive. We develop the Hierarchical Factor Segmentation (HFS) algorithm, a pattern recognition technique with an adaptive model selection device, to overcome the incompleteness and non-operational drawbacks and to compute CpG islands efficiently. The concept of a CpG island “core” is introduced and computed using the HFS algorithm, independently of any specific filtering criteria. Starting from such a core, a CpG island is then constructed using a Lexis diagram. This two-step computational approach provides a nearly exhaustive search for CpG islands that can be practically implemented on whole chromosomes. In a simulation study that realistically mimics CpG island dynamics through a Hidden Markov Model, we demonstrate that this approach retains very high sensitivity and specificity, that is, very low rates of false negatives and false positives. Finally, we apply the HFS algorithm to identify CpG island cores on human chromosome 21.
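The abstract's starting point, that fixed-width sliding windows can miss subsequences satisfying the filtering criteria while an exhaustive search over all subsequences grows quadratically with sequence length, can be illustrated with a small Python sketch. It is not the HFS algorithm or the authors' Lexis-diagram construction; it only contrasts the two baseline searches the abstract criticizes. The thresholds (length ≥ 200 bp, G+C fraction > 0.5, observed/expected CpG ratio > 0.6, with the ratio computed as count(CG)·L / (count(C)·count(G))) are the commonly quoted Gardiner-Garden and Frommer values and are an assumption here, as are the hypothetical helper names `CpGCounter`, `sliding_window`, and `exhaustive`.

```python
from typing import List, Tuple

# Assumed thresholds (widely quoted Gardiner-Garden & Frommer values);
# the abstract itself does not fix specific numbers.
MIN_LEN = 200   # minimum length in bp
MIN_GC = 0.5    # G+C fraction must exceed this
MIN_OE = 0.6    # observed/expected CpG ratio must exceed this

Interval = Tuple[int, int]  # half-open [start, end)


class CpGCounter:
    """Hypothetical helper: prefix sums make C, G and CG counts of any subsequence O(1)."""

    def __init__(self, seq: str) -> None:
        self.seq = seq.upper()
        n = len(self.seq)
        self.c = [0] * (n + 1)
        self.g = [0] * (n + 1)
        self.cg = [0] * (n + 1)  # cg[k] = number of CG dinucleotides starting before k
        for i, base in enumerate(self.seq):
            self.c[i + 1] = self.c[i] + (base == "C")
            self.g[i + 1] = self.g[i] + (base == "G")
            is_cg = base == "C" and i + 1 < n and self.seq[i + 1] == "G"
            self.cg[i + 1] = self.cg[i] + is_cg

    def passes(self, s: int, e: int) -> bool:
        """True if seq[s:e] meets all three (assumed) filtering criteria."""
        length = e - s
        if length < MIN_LEN:
            return False
        nc = self.c[e] - self.c[s]
        ng = self.g[e] - self.g[s]
        ncg = self.cg[e - 1] - self.cg[s]  # CG pairs lying fully inside [s, e)
        if nc == 0 or ng == 0:
            return False  # observed/expected ratio undefined; treat as failing
        gc = (nc + ng) / length
        oe = ncg * length / (nc * ng)
        return gc > MIN_GC and oe > MIN_OE


def sliding_window(seq: str, width: int = MIN_LEN, step: int = 1) -> List[Interval]:
    """Typical heuristic: test fixed-width windows, merge overlapping passing ones."""
    counter = CpGCounter(seq)
    regions: List[Interval] = []
    for s in range(0, len(seq) - width + 1, step):
        if counter.passes(s, s + width):
            if regions and s <= regions[-1][1]:
                regions[-1] = (regions[-1][0], s + width)  # extend current region
            else:
                regions.append((s, s + width))
    return regions


def exhaustive(seq: str) -> List[Interval]:
    """Brute force over every (start, end) pair: O(n^2) checks even with O(1) counts."""
    counter = CpGCounter(seq)
    n = len(seq)
    return [(s, e) for s in range(n + 1) for e in range(s + MIN_LEN, n + 1)
            if counter.passes(s, e)]


if __name__ == "__main__":
    # Toy sequence: a 100 bp A-run followed by a 300 bp alternating CG core.
    seq = "A" * 100 + "CG" * 150
    windows = sliding_window(seq)
    missed = [iv for iv in exhaustive(seq)
              if not any(a <= iv[0] and iv[1] <= b for a, b in windows)]
    print(windows)             # [(1, 400)]
    print((0, 260) in missed)  # True
```

On the toy sequence, the only 200 bp window covering position 0 sits at exactly 50% G+C and fails, so the merged sliding-window region starts at position 1, while the exhaustive scan still finds the qualifying subsequence [0, 260). The exhaustive scan, in turn, checks on the order of n² intervals, which is what makes it expensive on whole chromosomes.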

Keywords: AIC and BIC model selection criteria; non-parametric decoding; filtering criteria; hierarchical factor segmentation; human chromosome 21; mathematical incompleteness; methylation

About the article



Citation Information: The International Journal of Biostatistics, ISSN (Online) 1557-4679, DOI: https://doi.org/10.2202/1557-4679.1158.

