Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

Volume 11, Issue 1 (Jan 2012)


Improving Hidden Markov Models for Classification of Human Immunodeficiency Virus-1 Subtypes through Linear Classifier Learning

Ingo Bulla (University of Greifswald), Anne-Kathrin Schultz (University of Göttingen), Peter Meinicke (University of Göttingen)
Published Online: 2012-01-06 | DOI: https://doi.org/10.2202/1544-6115.1680

Profile Hidden Markov Models (pHMMs) are widely used to model nucleotide or protein sequence families. In many applications, a sequence family classified into several subfamilies is given and each subfamily is modeled separately by one pHMM. A major drawback of this approach is the difficulty of coping with subfamilies composed of very few sequences.

Correct subtyping of human immunodeficiency virus-1 (HIV-1) sequences is one of the most crucial bioinformatic tasks affected by this problem of small subfamilies, i.e., HIV-1 subtypes with a small number of known sequences. To deal with small samples for particular subfamilies of HIV-1, we employ a machine learning approach. More precisely, we make use of an existing HMM architecture and its associated inference engine, while replacing the unsupervised estimation of emission probabilities by a supervised method. For that purpose, we use regularized linear discriminant learning together with a balancing scheme to account for the widely varying sample size. After training the multiclass linear discriminants, the corresponding weights are transformed to valid probabilities using a softmax function.

We apply this modified algorithm to classify HIV-1 sequence data (in the form of partial-length HIV-1 sequences and semi-artificial recombinants) and show that the performance of pHMMs can be significantly improved by the proposed technique.
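The two ingredients named in the abstract, a balancing scheme for unequal subfamily sizes and a softmax that turns discriminant weights into emission probabilities, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the inverse-class-frequency weighting, the example weights, and the subtype labels are all assumptions made for illustration, and the regularized discriminant training itself is omitted.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # nucleotide alphabet emitted by a pHMM match state


def softmax(weights):
    """Map real-valued weights to a probability vector (numerically stable)."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_example_weights(labels):
    """Assumed balancing scheme: every class gets the same total weight,
    so a subtype with one sequence counts as much as one with many."""
    counts = Counter(labels)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[y]) for y in labels]


# Hypothetical discriminant weights for one position of one subtype model.
position_weights = [2.0, 0.5, -1.0, 0.5]
emissions = dict(zip(ALPHABET, softmax(position_weights)))

# Hypothetical training labels: a large subfamily "B" and a small one "J".
labels = ["B", "B", "B", "B", "J"]
example_weights = balanced_example_weights(labels)
```

Here `emissions` is a valid probability distribution over the four nucleotides that could stand in for an emission row of a pHMM state, and `example_weights` gives the single "J" sequence the same total influence as the four "B" sequences combined.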

Keywords: HMMs; linear classifier learning; HIV-1

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, DOI: https://doi.org/10.2202/1544-6115.1680.
