Multi-color optical mapping is a new technique being developed to obtain detailed physical maps (indicating relative positions of various recognition sites) of DNA molecules. We consider a study design in which the data consist of noisy observations of multiple copies of a DNA molecule marked with colors at recognition sites. The primary goal is to estimate a physical map. A secondary goal is to estimate error rates associated with the experiment, which are potentially useful for analysis and refinement of the biochemical steps in the mapping procedure. We propose statistical models for various sources of error and use maximum likelihood estimation (MLE) to construct a physical map and estimate error rates. To overcome difficulties arising in the maximization process, a latent-variable Markov chain version of the model is proposed, and the EM algorithm is used for maximization. In addition, a simulated annealing procedure is applied to maximize the profile likelihood over the discrete space of sequences of colors. We apply the methods to simulated data on the bacteriophage lambda genome.
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston