Statistical Applications in Genetics and Molecular Biology

Volume 14, Issue 1


A hidden Markov-model for gene mapping based on whole-genome next generation sequencing data

Jürgen Claesen
  • Corresponding author
  • Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Martelarenlaan 42, 3500 Hasselt, Belgium
/ Tomasz Burzykowski
Published Online: 2014-12-05 | DOI: https://doi.org/10.1515/sagmb-2014-0007


The analysis of polygenic phenotypic characteristics, such as quantitative traits or inheritable diseases, requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as performed in bulk segregant analysis, should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. We propose a hidden Markov-model to analyze the marker data obtained by bulk segregant next generation sequencing. The model includes several states, each associated with a different probability of observing the same or a different nucleotide in an offspring as compared to the parent. The transitions between the molecular markers imply transitions between the states of the model. After estimating the transition probabilities and the state-related probabilities of nucleotide (dis)similarity, the most probable state for each SNP is selected. The most probable states can then be used to indicate which genomic regions are likely to contain trait-related genes. The application of the model is illustrated on data from a study of ethanol tolerance in yeast. The software is written in R. R-functions, R-scripts and documentation are available on www.ibiostat.be/software/bioinformatics.

This article offers supplementary material which is provided at the end of the article.

Keywords: bulk segregant analysis; hidden Markov-models; next generation sequencing
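
The decoding idea described in the abstract can be illustrated with a small, hypothetical sketch. It is written in Python for self-containment and is not the authors' R software. Several things here are assumptions made for illustration only: the two states and their match probabilities (0.5 for an unlinked region, 0.95 for a trait-linked region), the binomial emission model for the number of reads matching the parent at each SNP, the hand-set transition matrix, and the toy read counts. The abstract states that the transition and state-related probabilities are estimated from the data; that estimation step is omitted here. The sketch uses Viterbi decoding, which returns the most probable joint state path and may differ from a per-SNP selection of the most probable state.

```python
import math


def log_binom_pmf(k, n, p):
    """Log-probability of k matching reads out of n under Binomial(n, p)."""
    log_coef = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coef + k * math.log(p) + (n - k) * math.log(1.0 - p)


def viterbi(counts, match_probs, trans, init):
    """Most probable state path for a sequence of (matches, coverage) pairs.

    counts      -- list of (k, n): reads matching the parent, total reads
    match_probs -- per-state probability that a read matches the parent
    trans       -- trans[r][s] = probability of moving from state r to s
    init        -- initial state probabilities
    """
    n_states = len(match_probs)
    k0, n0 = counts[0]
    # score[s] = best log-probability of any path that ends in state s
    score = [
        math.log(init[s]) + log_binom_pmf(k0, n0, match_probs[s])
        for s in range(n_states)
    ]
    back = []  # back[t][s] = best predecessor of state s at SNP t + 1
    for k, n in counts[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(
                range(n_states),
                key=lambda r: score[r] + math.log(trans[r][s]),
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev]
                + math.log(trans[best_prev][s])
                + log_binom_pmf(k, n, match_probs[s])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data (assumed): 10 SNPs, coverage 20 each; SNPs 4 to 7 are enriched.
counts = [(10, 20), (11, 20), (9, 20), (19, 20), (20, 20),
          (18, 20), (19, 20), (10, 20), (12, 20), (9, 20)]
match_probs = [0.5, 0.95]          # state 0: unlinked, state 1: linked
trans = [[0.9, 0.1], [0.1, 0.9]]   # hand-set, not estimated
init = [0.5, 0.5]

path = viterbi(counts, match_probs, trans, init)
print(path)  # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

In this toy run the run of SNPs assigned to state 1 marks the region that would be flagged as a candidate for containing a trait-related gene. Working in log space avoids numerical underflow when the number of SNPs is large.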



About the article

Corresponding author: Jürgen Claesen, Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, Martelarenlaan 42, 3500 Hasselt, Belgium

Published Online: 2014-12-05

Published in Print: 2015-02-01

Citation Information: Statistical Applications in Genetics and Molecular Biology, Volume 14, Issue 1, Pages 21–34, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2014-0007.

©2015 by De Gruyter.

Citing Articles

Ratemo Billy Omboki, Yan Zheng, Zhiwei Chen, Huazhong Guan, Weiqi Tang, Likun Huang, Xiaofang Xie, Weiren Wu, and Sang Nag Ahn
Plant Breeding, 2018
Weiqi Tang, Likun Huang, Suhong Bu, Xuzhang Zhang, Weiren Wu, and Oliver Stegle
Bioinformatics, 2018, Volume 34, Number 6, Page 978
Bruna Trindade de Carvalho, Sylvester Holt, Ben Souffriau, Rogelio Lopes Brandão, Maria R. Foulquié-Moreno, Johan M. Thevelein, and Fred M. Winston
mBio, 2017, Volume 8, Number 6, Page e01173-17
Fatemeh Zamanzad Ghavidel, Jürgen Claesen, and Tomasz Burzykowski
Journal of Computational Biology, 2015, Volume 22, Number 2, Page 178
