
Statistical Applications in Genetics and Molecular Biology


Online ISSN: 1544-6115
Volume 2, Issue 1


On the Power of Profiles for Transcription Factor Binding Site Detection

Sven Rahmann, Tobias Müller, and Martin Vingron

Sven Rahmann: Computational Molecular Biology, Max Planck Institute for Molecular Genetics, and Department of Mathematics and Computer Science, Freie Universität Berlin.

Published Online: 2003-11-29 | DOI: https://doi.org/10.2202/1544-6115.1032

Transcription factor binding site (TFBS) detection plays an important role in computational biology, with applications in gene finding and gene regulation. The sites are often modeled by gapless profiles, also known as position-weight matrices. Past research has focused on the significance of profile scores (the ability to avoid false positives), but this alone is not enough: the profile must also have the power to detect the true positive signals. Several complete genomes are now available and the search for TFBSs is moving to a large scale, so discriminating signal from noise becomes even more challenging.

Since TFBS profiles are usually estimated from only a few experimentally confirmed instances, careful regularization is important. We present a novel method that is well suited to this situation.

We further develop measures that help in judging profile quality, based on both the sensitivity and the selectivity of a profile. We show that these quality measures can be computed efficiently, and we propose statistically well-founded methods for choosing score thresholds.

We apply our findings to the TRANSFAC database of transcription factor binding sites. The results are disturbing: if we insist on a significance level of 5% in sequences of length 500, only 19% of the profiles detect a true signal instance with 95% success probability under varying background sequence compositions.
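The quantities discussed above (a log-odds profile, the significance of a score threshold under a background model, and the power to detect a true site) can be illustrated with a small sketch. Several choices here are assumptions, not the paper's method: additive pseudocounts stand in for the authors' regularization, which the abstract does not describe; scores are rounded to multiples of 0.01 bits so their exact distribution can be computed by dynamic programming over profile positions; the background is i.i.d. with fixed base frequencies (the paper additionally varies background composition); power is taken as the probability that a single site drawn from the profile scores at or above the threshold; and a per-sequence level is converted to a per-position level by treating the windows of one strand as independent, which overlapping windows are not. All function names and the example counts are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def regularized_profile(counts, pseudocount=0.5):
    """Turn per-position base counts into per-position probabilities.

    Assumption: simple additive pseudocounts, used only as a stand-in
    for the regularization method proposed in the paper.
    """
    profile = []
    for col in counts:
        total = sum(col.get(b, 0) for b in ALPHABET) + pseudocount * len(ALPHABET)
        profile.append({b: (col.get(b, 0) + pseudocount) / total for b in ALPHABET})
    return profile


def integer_log_odds(profile, background, resolution=0.01):
    """Log-odds scores log2(p/q), rounded to integer multiples of `resolution`."""
    return [{b: round(math.log2(col[b] / background[b]) / resolution) for b in ALPHABET}
            for col in profile]


def score_distribution(int_scores, column_probs):
    """Exact distribution of the total integer score when the base at
    position i is drawn independently from column_probs[i]."""
    dist = {0: 1.0}
    for scores, probs in zip(int_scores, column_probs):
        nxt = defaultdict(float)
        for s, pr in dist.items():
            for b in ALPHABET:
                nxt[s + scores[b]] += pr * probs[b]
        dist = dict(nxt)
    return dist


def tail(dist, threshold):
    """P(S >= threshold) under the given score distribution."""
    return sum(pr for s, pr in dist.items() if s >= threshold)


def choose_threshold(null_dist, alpha_pos):
    """Smallest attainable score whose null tail probability is <= alpha_pos,
    or None if even the maximal score is too frequent under the null."""
    best, acc = None, 0.0
    for s in sorted(null_dist, reverse=True):
        acc += null_dist[s]  # acc == P(S >= s) under the null
        if acc > alpha_pos:
            break
        best = s
    return best


def per_position_alpha(alpha_seq, seq_len, motif_len):
    """Per-position level so that P(at least one false hit) == alpha_seq,
    treating the seq_len - motif_len + 1 windows as independent (an approximation)."""
    n = seq_len - motif_len + 1
    return 1.0 - (1.0 - alpha_seq) ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical counts from 8 aligned sites of length 10.
    counts = [{"A": 7, "C": 1}, {"T": 8}, {"G": 6, "A": 2}, {"C": 5, "T": 3}, {"A": 8},
              {"C": 4, "G": 4}, {"T": 7, "G": 1}, {"G": 8}, {"A": 5, "T": 3}, {"C": 6, "A": 2}]
    background = {b: 0.25 for b in ALPHABET}
    profile = regularized_profile(counts)
    scores = integer_log_odds(profile, background)
    null = score_distribution(scores, [background] * len(profile))
    signal = score_distribution(scores, profile)
    alpha_pos = per_position_alpha(0.05, 500, len(profile))
    t = choose_threshold(null, alpha_pos)
    if t is None:
        print("no threshold reaches the requested significance level")
    else:
        print(f"threshold={t}, significance={tail(null, t):.2e}, power={tail(signal, t):.3f}")
```

Rounding the scores trades a little precision for a dictionary whose size is bounded by the range of attainable integer scores rather than by the 4^m possible sequences, which keeps the exact computation cheap for typical profile lengths. A profile whose power at the chosen threshold falls well below the desired detection probability is exactly the kind of weak profile the abstract warns about.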

Keywords: Transcription factor binding site (TFBS); Profile; Position specific score matrix (PSSM); Position-weight matrix (PWM); Log-odds score; Exact Test; Significance; Power; TRANSFAC


Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 2, Issue 1, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1032.


© 2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.
