
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.




Alignment-free Sequence Comparison for Biologically Realistic Sequences of Moderate Length

Conrad J. Burden / Junmei Jing / Susan R. Wilson

Australian National University

Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 11, Issue 1, Pages 1–28, ISSN (Online) 1544-6115, DOI: 10.2202/1544-6115.1724, December 2011

Publication History

Published Online: 2011-12-09

The D2 statistic, defined as the number of matches of words of some pre-specified length k, is a computationally fast, alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in D2 may be dominated by terms that reflect only the noise within each individual sequence. We examine the extent of this problem and the effectiveness of overcoming it with two mean-centred variants of the statistic, D2* and D2c. We conclude that all three statistics are potentially useful measures of sequence similarity, for which reasonably accurate p-values can be estimated under a null hypothesis of sequences composed of independently and identically distributed letters. We show that D2 and D2c, and to a somewhat lesser extent D2*, perform well in tests classifying moderate-length query sequences as putative cis-regulatory modules.
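As a rough illustration of the three statistics, the Python sketch below counts overlapping k-words and computes D2 = Σ_w X_w Y_w, where X_w and Y_w are the counts of word w in two sequences of lengths n and m. For the mean-centred variants it makes assumptions that may not match the paper's exact definitions. It centres each count on its expectation under the i.i.d. null, X̃_w = X_w − (n − k + 1) p_w, where p_w is the product of the letter probabilities in w. It takes D2c = Σ_w X̃_w Ỹ_w and D2* = Σ_w X̃_w Ỹ_w / (sqrt((n − k + 1)(m − k + 1)) p_w). Letter probabilities are estimated from the pooled pair of sequences over the alphabet ACGT. The p-value helper is a swapped-in technique (plain Monte Carlo simulation under the i.i.d. null), not necessarily how the paper estimates p-values. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences contain only these four letters


def word_counts(seq, k):
    """Counts X_w of overlapping k-words in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 = sum_w X_w * Y_w, the number of matching k-word pairs."""
    x, y = word_counts(a, k), word_counts(b, k)
    return sum(cnt * y[w] for w, cnt in x.items())


def letter_probs(a, b, alphabet=ALPHABET):
    """Assumption: i.i.d. letter probabilities estimated from the pooled sequences."""
    pooled = Counter(a + b)
    total = sum(pooled[c] for c in alphabet)
    return {c: pooled[c] / total for c in alphabet}


def centred_stats(a, b, k, alphabet=ALPHABET):
    """Return (D2c, D2*) under the assumed definitions described above."""
    p = letter_probs(a, b, alphabet)
    x, y = word_counts(a, k), word_counts(b, k)
    nbar, mbar = len(a) - k + 1, len(b) - k + 1
    d2c = d2star = 0.0
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        pw = math.prod(p[c] for c in letters)  # P(word w) under the i.i.d. null
        xt = x[w] - nbar * pw  # centred count in sequence a
        yt = y[w] - mbar * pw  # centred count in sequence b
        d2c += xt * yt
        if pw > 0:  # zero-probability words occur in neither sequence; skip to avoid 0/0
            d2star += xt * yt / (math.sqrt(nbar * mbar) * pw)
    return d2c, d2star


def mc_pvalue(stat, a, b, k, n_sim=200, seed=0, alphabet=ALPHABET):
    """Monte Carlo upper-tail p-value of stat(a, b, k) under an i.i.d. null.

    Swapped-in technique: simulation, not necessarily the paper's method.
    """
    rng = random.Random(seed)
    p = letter_probs(a, b, alphabet)
    letters, weights = list(p), list(p.values())
    observed = stat(a, b, k)
    hits = 0
    for _ in range(n_sim):
        # simulate two independent i.i.d. sequences of the original lengths
        sa = "".join(rng.choices(letters, weights=weights, k=len(a)))
        sb = "".join(rng.choices(letters, weights=weights, k=len(b)))
        if stat(sa, sb, k) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

To test D2c instead of D2, pass `lambda s, t, k: centred_stats(s, t, k)[0]` as `stat`. Enumerating all 4^k words in `centred_stats` keeps the centring explicit but is only practical for small k.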
