Jump to ContentJump to Main Navigation
Show Summary Details
In This Section

Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

6 Issues per year


IMPACT FACTOR 2016: 0.646
5-year IMPACT FACTOR: 1.191

CiteScore 2016: 0.94

SCImago Journal Rank (SJR) 2015: 0.954
Source Normalized Impact per Paper (SNIP) 2015: 0.554

Mathematical Citation Quotient (MCQ) 2015: 0.06

Online
ISSN
1544-6115
See all formats and pricing
In This Section
Volume 16, Issue 2 (Apr 2017)

Issues

A time warping approach to multiple sequence alignment

Ana Arribas-Gil
  • Departamento de Estadística, Universidad Carlos III de Madrid, C/ Madrid, 126 - 28903 Getafe, Spain
/ Catherine Matias
  • Corresponding author
  • Sorbonne Universités, Université Pierre et Marie Curie, Université Paris Diderot, Centre National de la Recherche Scientifique, Laboratoire de Probabilités et Modèles Aléatoires, 4 place Jussieu, 75252 PARIS Cedex 05, France
  • Email:
Published Online: 2017-04-12 | DOI: https://doi.org/10.1515/sagmb-2016-0043

Abstract

We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include into refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with 2 widely used MSA softwares.

Keywords: Alignment; dynamic time warping; multiple sequence alignment; warping

References

  • Arribas-Gil, A. (2010): “Parameter estimation in multiple hidden i.i.d. models from biological multiple alignment,” Stat. Appl. Genet. Mol. Biol., 9, 10. [Web of Science]

  • Arribas-Gil, A. and H.-G. Müller (2014): “Pairwise dynamic time warping for event data,” Comput. Stat. Data Anal., 69, 255–268. [Crossref] [Web of Science]

  • Arribas-Gil, A., E. Gassiat, and C. Matias (2006): “Parameter estimation in pair-hidden Markov models,” Scand. J. Stat., 33, 651–671. [Crossref]

  • Arribas-Gil, A., D. Metzler, and J.-L. Plouhinec (2009): “Statistical alignment with a sequence evolution model allowing rate heterogeneity along the sequence,” IEEE/ACM Trans. Comput. Biol. Bioinf., 6, 281–295. [Crossref]

  • Do, C., M. Mahabhashyam, M. Brudno, and S. Batzoglou (2005): “ProbCons: probabilistic consistency-based multiple sequence alignment,” Genome Res., 15, 330–340. [Crossref]

  • Durbin, R., S. Eddy, A. Krogh, and G. Mitchison (1998): Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press, Cambridge.

  • Edgar, R. C. (2004): “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Res., 32, 1792. [Crossref]

  • Edgar, R. C. and S. Batzoglou (2006): “Multiple sequence alignment,” Curr. Opin. Struct. Biol., 16, 368–373. [Web of Science] [Crossref]

  • Floyd, R. W. and R. L. Rivest (1975): “Algorithm 489: the algorithm SELECT-for finding the ith smallest of n elements [M1],” Commun. ACM, 18, 173. [Crossref]

  • Gotoh, O. (1996): “Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments,” J. Mol. Biol., 264, 823–838. [Crossref]

  • Katoh, K. and D. M. Standley (2013): “MAFFT multiple sequence alignment software version 7: improvements in performance and usability,” Mol. Biol. Evol., 30, 772. [Crossref] [Web of Science]

  • Keogh, E. and C. A. Ratanamahatana (2005): “Exact indexing of dynamic time warping,” Knowl. Inf. Syst., 7, 358–386. [Crossref]

  • Kruskal, J. B. (1983): “An overview of sequence comparison: time warps, string edits, and macromolecules,” SIAM Rev., 25, 201–237. [Crossref]

  • Kumar, S. and A. Filipski (2007): “Multiple sequence alignment: in pursuit of homologous DNA positions,” Genome Res., 17, 127–135. [Web of Science] [Crossref]

  • Lipman, D. J., S. F. Altschul, and J. D. Kececioglu (1989): “A tool for multiple sequence alignment,” Proc. Natl. Acad. Sci. USA PNAS, 86, 4412–4415. [Crossref]

  • Liu, X. and H.-G. Müller (2004): “Functional convex averaging and synchronization for time-warped random curves,” J. Am. Stat. Assoc., 99, 687–699. [Crossref]

  • Needleman, S. and C. Wunsch (1970): “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” J. Mol. Biol., 48, 443–453. [Crossref]

  • Notredame, C. (2007): “Recent evolutions of multiple sequence alignment algorithms,” PLOS Comput. Biol., 3, 1–4. [Crossref] [Web of Science]

  • Notredame, C., D. G. Higgins, and J. Heringa (2000): “T-coffee: a novel method for fast and accurate multiple sequence alignment,” J. Mol. Biol., 302, 205–217. [Crossref]

  • Pages, H., P. Aboyoun, R. Gentleman, and S. DebRoy (2016): “Biostrings: string objects representing biological sequences, and matching algorithms,” R package version 2.28.0.

  • Pais, F. S.-M., P. d. C. Ruy, G. Oliveira, and R. S. Coimbra (2014): “Assessing the efficiency of multiple sequence alignment programs,” Algorithms Mol. Biol., 9, 4. [Web of Science] [Crossref]

  • Pei, J. and N. V. Grishin (2006): “MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information,” Nucleic Acids Res., 34, 4364–4374. [Crossref]

  • Smith, T. F., M. S. Waterman, and W. M. Fitch (1982): “Comparative biosequence metrics,” J. Mol. Evol., 18, 423–423. [Crossref]

  • Tang, R. and H.-G. Müller (2008): “Pairwise curve synchronization for functional data,” Biometrika, 95, 875. [Web of Science] [Crossref]

  • Thompson, J., D. G. Higgins, and T. J. Gibson (1994): “CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice,” Nucleic Acids Res., 22, 4673–4680. [Crossref]

  • Thompson, J. D., F. Plewniak, and O. Poch (1999): “A comprehensive comparison of multiple sequence alignment programs,” Nucleic Acids Res., 27, 2682. [Crossref]

  • Thompson, J. D., P. Koehl, R. Ripp, and O. Poch (2005): “Balibase 3.0: latest developments of the multiple sequence alignment benchmark,” Proteins Struct. Funct. Bioinf., 61, 127–136.

  • Thorne, J., H. Kishino, and J. Felsenstein (1991): “An evolutionary model for maximum likelihood alignment of DNA sequences.” J. Mol. Evol., 33, 114–124. [Crossref]

  • Wallace, I. M., G. Blackshields, and D. G. Higgins (2005): “Multiple sequence alignments,” Curr. Opin. Struct. Biol., 15, 261–266. [Crossref]

  • Yang, Z. (2006): Computational molecular evolution, Oxford series in ecology and evolution, Oxford University Press, Oxford.

About the article

Published Online: 2017-04-12

Published in Print: 2017-04-25



Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2016-0043. Export Citation

Comments (0)

Please log in or register to comment.
Log in