Published by De Gruyter, August 11, 2018

Tensor Train Spectral Method for Learning of Hidden Markov Models (HMM)

Maxim A. Kuznetsov and Ivan V. Oseledets


We propose a new algorithm for spectral learning of Hidden Markov Models (HMMs). In contrast to the standard approach, we do not estimate the parameters of the HMM directly, but construct an estimate of the joint probability distribution. The idea is to represent the joint probability distribution as an N-th-order tensor with low ranks in the tensor train (TT) format. Using the TT-format, we obtain an approximation by minimizing the Frobenius distance between the empirical joint probability distribution and tensors with low TT-ranks, subject to normalization constraints on the core tensors. We propose an algorithm for solving this optimization problem that is based on the alternating least squares (ALS) approach, and we develop a fast version of it for sparse tensors. The order of the tensor d is a parameter of our algorithm. We compared the performance of our algorithm with the existing algorithm of Hsu, Kakade and Zhang, proposed in 2009, and found that ours is much more robust when the number of hidden states is overestimated.
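The low-rank claim in the abstract can be illustrated with a small numerical sketch. It is not the authors' learning algorithm: it only checks that the joint distribution of d consecutive observations of an HMM with m hidden states admits a TT representation with ranks at most m. The function names (`hmm_tt_cores`, `tt_to_full`, `joint_bruteforce`) and the conventions (row-stochastic transition matrix `T`, emission matrix `O` indexed as state by symbol, d of at least 2) are assumptions made for this example.

```python
import itertools
import numpy as np


def hmm_tt_cores(pi, T, O, d):
    """Build TT cores of the joint distribution of d observations (d >= 2).

    Assumed conventions (hypothetical, for illustration only):
      pi[h]   = P(h_1 = h)
      T[i, j] = P(h_{t+1} = j | h_t = i)
      O[i, x] = P(x_t = x | h_t = i)
    """
    # First core: the initial state is summed out, shape (1, n, m).
    first = np.einsum("h,hx,hj->xj", pi, O, T)[None, :, :]
    # Middle cores: emit from state i, then move to state j, shape (m, n, m).
    middle = np.einsum("ix,ij->ixj", O, T)
    # Last core: emit from the final state, shape (m, n, 1).
    last = O[:, :, None]
    return [first] + [middle] * (d - 2) + [last]


def tt_to_full(cores):
    """Contract TT cores into the full d-way tensor (small sizes only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]  # drop the two boundary ranks of size 1


def joint_bruteforce(pi, T, O, d):
    """Reference joint distribution computed with the forward recursion."""
    n = O.shape[1]
    P = np.zeros((n,) * d)
    for xs in itertools.product(range(n), repeat=d):
        alpha = pi * O[:, xs[0]]
        for x in xs[1:]:
            alpha = (alpha @ T) * O[:, x]
        P[xs] = alpha.sum()
    return P


# Small random HMM: m = 2 hidden states, n = 3 symbols, d = 4 observations.
rng = np.random.default_rng(0)
m, n, d = 2, 3, 4
pi = rng.random(m)
pi /= pi.sum()
T = rng.random((m, m))
T /= T.sum(axis=1, keepdims=True)
O = rng.random((m, n))
O /= O.sum(axis=1, keepdims=True)

cores = hmm_tt_cores(pi, T, O, d)
print(np.allclose(tt_to_full(cores), joint_bruteforce(pi, T, O, d)))
```

The full tensor has n^d entries, while the cores store on the order of d·n·m² numbers, which is what makes fitting the cores to an empirical distribution attractive. The fitting step itself (ALS with normalization constraints) is not shown here.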

MSC 2010: 15A69; 65C40; 60J20

Funding statement: The authors gratefully acknowledge the financial support from the Ministry of Education and Science of the Russian Federation under grant 14.756.31.0001.


The authors express their deep gratitude to Professor Andrzej Cichocki for his helpful comments and assistance.


[1] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade and M. Telgarsky, Tensor decompositions for learning latent variable models, J. Mach. Learn. Res. 15 (2014), 2773–2832. doi:10.21236/ADA604494

[2] B. W. Bader and T. G. Kolda, Efficient MATLAB computations with sparse and factored tensors, SIAM J. Sci. Comput. 30 (2007/08), no. 1, 205–231. doi:10.2172/897641

[3] L. E. Baum, T. Petrie, G. Soules and N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Statist. 41 (1970), 164–171. doi:10.1214/aoms/1177697196

[4] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via n-way generalization of Eckart–Young decomposition, Psychometrika 35 (1970), 283–319. doi:10.1007/BF02310791

[5] V. de Silva and L.-H. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl. 30 (2008), no. 3, 1084–1127. doi:10.1137/06066518X

[6] L. Grasedyck, Hierarchical singular value decomposition of tensors, SIAM J. Matrix Anal. Appl. 31 (2009/10), no. 4, 2029–2054. doi:10.1137/090764189

[7] W. Hackbusch and S. Kühn, A new scheme for the tensor representation, J. Fourier Anal. Appl. 15 (2009), no. 5, 706–722. doi:10.1007/s00041-009-9094-9

[8] D. Hsu, S. M. Kakade and T. Zhang, A spectral algorithm for learning hidden Markov models, J. Comput. System Sci. 78 (2012), no. 5, 1460–1480. doi:10.1016/j.jcss.2011.12.025

[9] X. D. Huang, Y. Ariki and M. A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University, Edinburgh, 1990.

[10] H. Jaeger, Observable operator models for discrete stochastic time series, Neural Comput. 12 (2000), no. 6, 1371–1398. doi:10.1162/089976600300015411

[11] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500. doi:10.1137/07070111X

[12] A. Krogh, B. Larsson, G. Von Heijne and E. L. L. Sonnhammer, Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes, J. Mol. Biol. 305 (2001), no. 3, 567–580. doi:10.1006/jmbi.2000.4315

[13] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33 (2011), no. 5, 2295–2317. doi:10.1137/090752286

[14] I. Oseledets, M. Rakhuba and A. Uschmajew, Alternating least squares as moving subspace correction, preprint (2017). doi:10.1137/17M1148712

[15] I. V. Oseledets and E. E. Tyrtyshnikov, Breaking the curse of dimensionality, or how to use SVD in many dimensions, SIAM J. Sci. Comput. 31 (2009), no. 5, 3744–3759. doi:10.1137/090748330

[16] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (1989), no. 2, 257–286. doi:10.1016/B978-0-08-051584-7.50027-9

[17] T. Rohwedder and A. Uschmajew, On local convergence of alternating schemes for optimization of convex problems in the tensor train format, SIAM J. Numer. Anal. 51 (2013), no. 2, 1134–1162. doi:10.1137/110857520

[18] S. M. Siddiqi, B. Boots and G. J. Gordon, Reduced-rank hidden Markov models, International Conference on Artificial Intelligence and Statistics, PMLR (2010), 741–748.

[19] L. Song, M. Ishteva, A. Parikh, E. Xing and H. Park, Hierarchical tensor decomposition of latent tree graphical models, Proceedings of the 30th International Conference on Machine Learning (ICML-13), PMLR (2013), 334–342.

[20] L. Song, E. P. Xing and A. P. Parikh, A spectral algorithm for latent tree graphical models, Proceedings of the 28th International Conference on Machine Learning (ICML-11), PMLR (2011), 1065–1072.

[21] K. Stratos, M. Collins and D. Hsu, Unsupervised part-of-speech tagging with anchor hidden Markov models, Trans. Assoc. Comput. Linguist. 4 (2016), 245–257. doi:10.1162/tacl_a_00096

Received: 2017-10-15
Revised: 2018-02-26
Accepted: 2018-05-02
Published Online: 2018-08-11
Published in Print: 2019-01-01

© 2018 Walter de Gruyter GmbH, Berlin/Boston