Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido


Online ISSN: 1544-6115
Volume 4, Issue 1

The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix

Lee A Newberg / Lee Ann McCue / Charles E Lawrence
Published Online: 2005-06-01 | DOI: https://doi.org/10.2202/1544-6115.1135

Approaches based upon sequence weights for constructing a position weight matrix of nucleotides from aligned inputs are popular, but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequences related by a phylogenetic tree. Using these, we find that approaches based upon sequence weights can perform very poorly in comparison to approaches based upon a theoretically optimal maximum-likelihood method in the inference of the parameters of a position weight matrix. Specifically, we find that among a collection of primate sequences, even an optimal sequence-weights approach is only 51% as efficient as the maximum-likelihood approach in inferences of base frequency parameters. We also show how to employ the variance estimators to obtain a greedy ordering of species for sequencing. Application of this ordering for the weighted estimators to a primate collection yields a curve with a long plateau that is not observed with maximum-likelihood estimators. This plateau indicates that the use of weighted estimators on these data seriously limits the utility of obtaining the sequences of more than two or three additional species.

Keywords: Sequence Weights; Maximum Likelihood; Motifs; Phylogeny; Sequencing; Consensus Distribution; Position-Weight Matrices
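
The two ideas in the abstract, variance-minimizing sequence weights and a greedy ordering of species, can be illustrated with a minimal sketch. It is not the authors' derivation: it assumes that the per-sequence estimators share a common mean and that their covariance matrix is already known, and it applies the standard minimum-variance result for weights that sum to one, w = C⁻¹1 / (1ᵀC⁻¹1), with variance 1 / (1ᵀC⁻¹1). The function names and the toy covariance matrix are hypothetical; in the paper's setting the covariances would follow from the phylogenetic tree.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def optimal_weights(cov):
    """Weights summing to one that minimize w^T cov w, plus the resulting variance."""
    x = solve(cov, [1.0] * len(cov))
    total = sum(x)
    return [xi / total for xi in x], 1.0 / total


def greedy_order(cov):
    """Add one species at a time, each time choosing the one that most reduces
    the variance of the optimally weighted estimator (ties go to the lowest index)."""
    remaining = list(range(len(cov)))
    chosen, curve = [], []
    while remaining:
        def var_with(s):
            idx = chosen + [s]
            sub = [[cov[i][j] for j in idx] for i in idx]
            return optimal_weights(sub)[1]
        best = min(remaining, key=var_with)
        curve.append(var_with(best))
        chosen.append(best)
        remaining.remove(best)
    return chosen, curve


if __name__ == "__main__":
    # Hypothetical covariances: species 0 and 1 are close relatives, species 2 is distant.
    toy_cov = [[1.0, 0.9, 0.0],
               [0.9, 1.0, 0.0],
               [0.0, 0.0, 1.0]]
    print(optimal_weights(toy_cov))
    print(greedy_order(toy_cov))
```

On this toy input the greedy order is 0, 2, 1, and the variance falls from 1.0 to 0.5 and then only to about 0.487, a small-scale analogue of the plateau described in the abstract. Relative efficiency against a maximum-likelihood estimator would be the ratio of the two estimators' variances; the maximum-likelihood variance is not computed here.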

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 4, Issue 1, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.2202/1544-6115.1135.

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston.
