Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 2, Issue 1



Supervised Detection of Regulatory Motifs in DNA Sequences

Sunduz Keles / Mark J. van der Laan / Sandrine Dudoit / Biao Xing / Michael B. Eisen
  • Department of Molecular and Cell Biology, University of California, Berkeley; Life Sciences Division, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley
Published Online: 2003-08-25 | DOI: https://doi.org/10.2202/1544-6115.1015

Identification of transcription factor binding sites (regulatory motifs) is of major interest in contemporary biology. We propose a new likelihood-based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g., MEME, Gibbs motif sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of the observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address the problem of supervising the search for DNA binding sites using information derived from structural characteristics of protein-DNA interactions. We extend the simple multinomial mixture model to a constrained multinomial mixture model by incorporating constraints on the information content profiles or on specific parameters of the motif PWMs. The parameters of this extended model are estimated by maximum likelihood using a nonlinear constraint optimization method. Likelihood-based cross-validation is used to select model parameters such as motif width and constraint type. The performance of COMODE is compared with existing motif detection methods on simulated data that incorporate real motif examples from Saccharomyces cerevisiae. The proposed method is especially effective when the motif of interest appears as a weak signal in the data. Some of the transcription factor binding data of Lee et al. (2002) were also analyzed using COMODE, and biologically verified sites were identified.

Keywords: DNA sequence; co-regulated genes; transcription factor; regulatory motif; mixture model; position weight matrix; structured motif; information content; entropy; nonlinear constraint maximization
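
The abstract refers to two quantities that are easy to illustrate: the information content profile of a PWM and the likelihood of sequence data under a multinomial mixture of motif and background. The following is a minimal sketch of both, not the COMODE implementation. The function names, the example PWM, the uniform background, and the mixing weight are all assumptions made for illustration, and the mixture is simplified to independent fixed-width windows.

```python
import math

BASES = "ACGT"  # column order assumed for every PWM row below


def information_content(pwm):
    """Per-position information content in bits, relative to a uniform background."""
    profile = []
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        profile.append(2.0 - entropy)  # 2 bits is the maximum for four bases
    return profile


def mixture_log_likelihood(windows, pwm, background, motif_weight):
    """Log-likelihood of fixed-width windows under a motif/background mixture.

    Each window is treated as an independent draw: with probability
    motif_weight it comes from the PWM (independent positions), otherwise
    from the background distribution.
    """
    total = 0.0
    for window in windows:
        motif_prob = 1.0
        background_prob = 1.0
        for position, base in enumerate(window):
            index = BASES.index(base)
            motif_prob *= pwm[position][index]
            background_prob *= background[index]
        total += math.log(
            motif_weight * motif_prob + (1.0 - motif_weight) * background_prob
        )
    return total


# Hypothetical width-3 motif: two conserved positions around an uninformative one.
pwm = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
    [0.01, 0.01, 0.97, 0.01],
]
background = [0.25, 0.25, 0.25, 0.25]

profile = information_content(pwm)
log_lik = mixture_log_likelihood(["ACG", "TTT"], pwm, background, motif_weight=0.1)
print(profile, log_lik)
```

In this toy motif the profile is high at the two outer positions and zero in the middle. A constraint on the information content profile, as described in the abstract, would restrict the PWM to shapes of this kind during likelihood maximization; the optimization itself is not shown here.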

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 2, Issue 1, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1015.

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.

Citing Articles

Chengpeng Bi
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009, Volume 6, Number 3, Page 370
