Confidently Estimating the Number of DNA Replication Origins

Anand Bhaskar¹ and Uri Keich²
  • ¹ Computer Science Division, University of California, Berkeley
  • ² School of Mathematics and Statistics, University of Sydney

We present a method for estimating the number of DNA replication origins in the genome of the yeast Kluyveromyces lactis, together with a confidence interval for that estimate. The method requires an initial set of verified sites from which a position-specific frequency matrix (PSFM) can be constructed. We further assume access to a sparingly used experimental procedure that can verify the functionality of a few, but not all, computationally predicted sites. While our motivation comes from estimating the number of autonomously replicating sequences (ARSs), the method can also be applied to estimating the genome-wide number of “functional” transcription factor binding sites, where functionality is determined by experimental verification of the transcription factor binding event using, for example, ChIP data. We demonstrate the reliability of the method by correctly predicting the known number of Saccharomyces cerevisiae ARSs as well as the number of S. cerevisiae probes that bind the transcription factor ABF1.
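To make the PSFM starting point concrete, the following is a minimal sketch of how such a matrix could be built from a set of aligned, equal-length sites. It is an illustration under stated assumptions, not the procedure used in this work: the function name `build_psfm`, the optional `pseudocount` parameter, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from collections import Counter

ALPHABET = "ACGT"


def build_psfm(sites, pseudocount=0.0):
    """Build a position-specific frequency matrix from aligned sites.

    Hypothetical helper for illustration only. `sites` is a list of
    equal-length DNA strings. Returns one dict per alignment column,
    mapping each base to its relative frequency in that column.
    """
    if not sites:
        raise ValueError("need at least one site")
    sites = [s.upper() for s in sites]
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    if any(base not in ALPHABET for s in sites for base in s):
        raise ValueError("sites may contain only A, C, G, T")

    # Denominator is the same for every column: one count per site,
    # plus one pseudocount per base in the alphabet.
    total = len(sites) + pseudocount * len(ALPHABET)
    psfm = []
    for pos in range(width):
        counts = Counter(s[pos] for s in sites)
        psfm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return psfm


# Made-up toy sites, not real ARS sequences.
toy_sites = ["ACGT", "ACGA", "TCGT", "ACTT"]
psfm = build_psfm(toy_sites)
for pos, column in enumerate(psfm):
    print(pos, column)
```

Each column of the resulting matrix sums to one. A nonzero `pseudocount` is one common way to avoid zero frequencies when the initial set of verified sites is small; whether and how smoothing is applied here is an assumption of this sketch.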

