Background: A precise experimental identification of transcription factor binding motifs (TFBMs), accurate to a single base pair, is time-consuming and difficult. For several databases, TFBM annotations are extracted from the literature and stored 5ʹ → 3ʹ relative to the target gene. Mixing the two possible orientations of a motif results in poor information content of subsequently computed position frequency matrices (PFMs) and sequence logos. Since these PFMs are used to predict further TFBMs, we address the question if the TFBMs underlying a PFM can be re-annotated automatically to improve both the information content of the PFM and subsequent classification performance.
Results: We present MoRAine, an algorithm that re-annotates transcription factor binding motifs. Each motif with experimental evidence underlying a PFM is compared against each other such motif. The goal is to re-annotate TFBMs by possibly switching their strands and shifting them a few positions in order to maximize the information content of the resulting adjusted PFM. We present two heuristic strategies to perform this optimization and subsequently show that MoRAine significantly improves the corresponding sequence logos. Furthermore, we justify the method by evaluating specificity, sensitivity, true positive, and false positive rates of PFM-based TFBM predictions for E. coli using the original database motifs and the MoRAine-adjusted motifs. The classification performance is considerably increased if MoRAine is used as a preprocessing step.
Conclusions: MoRAine is integrated into a publicly available web server and can be used online or downloaded as a stand-alone version from http://moraine.cebitec.uni-bielefeld.de.