
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Sanguinetti, Guido

Volume 13, Issue 4



A sequential naïve Bayes classifier for DNA barcodes

Michael P. Anderson (corresponding author; Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA) / Suzanne R. Dubnicka
Published Online: 2014-07-03 | DOI: https://doi.org/10.1515/sagmb-2013-0025


DNA barcodes are short strands of 255–700 nucleotide bases taken from the cytochrome c oxidase subunit 1 (COI) region of mitochondrial DNA. It has been proposed that these barcodes may be used to differentiate between biological species. Current methods of species classification rely on distance measures that depend heavily on both evolutionary model assumptions and a clearly defined “gap” between intra- and interspecies variation. Such distance measures neither quantify classification uncertainty nor indicate how much of the barcode is necessary for classification. We propose a sequential naïve Bayes classifier for species classification to address these limitations. The proposed method is shown to provide accurate species-level classification on real and simulated data. It also quantifies the uncertainty of each classification and addresses how much of the barcode is necessary.
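
To make the idea of a sequential naïve Bayes classifier concrete, the following is a minimal sketch and not the authors' published algorithm. It assumes aligned barcodes of equal length, an independent categorical distribution over the four bases at each position, Laplace smoothing, and a simple stopping rule that halts once the leading species' posterior probability reaches a chosen threshold. The function names (train, classify_sequentially) and the parameters alpha and threshold are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

BASES = "ACGT"


def train(barcodes, alpha=1.0):
    """Estimate log priors and per-position log likelihoods.

    barcodes: list of (species, sequence) pairs; sequences are assumed
    to be aligned and of equal length (an assumption of this sketch).
    alpha: Laplace smoothing constant.
    """
    by_species = defaultdict(list)
    for species, seq in barcodes:
        by_species[species].append(seq)

    total = len(barcodes)
    log_prior = {s: math.log(len(seqs) / total) for s, seqs in by_species.items()}

    log_lik = {}
    for s, seqs in by_species.items():
        table = []
        for column in zip(*seqs):  # one column per aligned position
            counts = Counter(b for b in column if b in BASES)
            n = sum(counts.values())
            table.append({
                b: math.log((counts[b] + alpha) / (n + alpha * len(BASES)))
                for b in BASES
            })
        log_lik[s] = table
    return log_prior, log_lik


def normalize(log_scores):
    """Convert unnormalized log scores to probabilities (log-sum-exp)."""
    m = max(log_scores.values())
    weights = {s: math.exp(v - m) for s, v in log_scores.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def classify_sequentially(query, log_prior, log_lik, threshold=0.95):
    """Read the query base by base, updating the posterior each time.

    Returns (best species, its posterior probability, bases read).
    Stops early once the leading posterior reaches the threshold.
    """
    scores = dict(log_prior)
    posterior = normalize(scores)
    for pos, base in enumerate(query):
        if base not in BASES:
            continue  # skip ambiguous or missing bases
        for s in scores:
            scores[s] += log_lik[s][pos][base]
        posterior = normalize(scores)
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold:
            return best, posterior[best], pos + 1
    best = max(posterior, key=posterior.get)
    return best, posterior[best], len(query)


# Toy data, invented purely for illustration.
reference = [("A", "AAAA"), ("A", "AAAA"), ("B", "AACC"), ("B", "AACC")]
log_prior, log_lik = train(reference)
species, prob, used = classify_sequentially("AACC", log_prior, log_lik, threshold=0.7)
```

In this toy run the first two bases do not separate the two species, so the posterior stays at 0.5 each; the third base is the first informative one. The returned posterior is the kind of per-classification uncertainty measure the abstract refers to, and the count of bases read illustrates how a sequential rule can report how much of the barcode was needed.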

This article offers supplementary material which is provided at the end of the article.

Keywords: Naïve Bayes classifier; DNA barcoding; phylogenetic analysis; sequential analysis; species classification; species discovery



About the article

Corresponding author: Michael P. Anderson, Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA

Published Online: 2014-07-03

Published in Print: 2014-08-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 13, Issue 4, Pages 423–434, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0025.


© 2014 by De Gruyter.
