The Prague Bulletin of Mathematical Linguistics

The Journal of Charles University

2 Issues per year

Open Access
Online ISSN: 1804-0462

Scalable Reordering Models for SMT based on Multiclass SVM

Abdullah Alrajeh
  • Corresponding author
  • School of Electronics and Computer Science, University of Southampton / Computer Research Institute, King Abdulaziz City for Science and Technology (KACST)

Mahesan Niranjan
Published Online: 2015-04-18 | DOI: https://doi.org/10.1515/pralin-2015-0004

Abstract

In state-of-the-art phrase-based statistical machine translation systems, modelling phrase reordering is important for the naturalness of the translated output, particularly when the grammatical structures of the two languages differ significantly. Posing phrase movements as a classification problem, we exploit recent developments in solving large-scale multiclass support vector machines. Using dual coordinate descent methods for learning, we provide a mechanism to shrink the amount of training data required in each iteration, which yields significant computational savings while preserving the accuracy of the models. Our approach is a couple of times faster than the maximum entropy approach and more memory-efficient (a 50% reduction). Experiments were carried out on an Arabic-English corpus of more than a quarter of a billion words. We achieve BLEU score improvements over a strong baseline system with sparse reordering features.
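To make the idea of dual coordinate descent with shrinking more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a binary L1-loss linear SVM for brevity, whereas the paper addresses the multiclass case, and it uses a shrinking heuristic of the kind commonly found in dual coordinate descent solvers for linear SVMs. The names `train_dcd_svm`, `predict`, `eps`, and `max_passes` are hypothetical and chosen for illustration only.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_dcd_svm(X, y, C=1.0, eps=1e-3, max_passes=1000, seed=0):
    """Binary L1-loss linear SVM trained by dual coordinate descent.

    Illustrative assumption: labels are +1/-1 and no bias term is learned
    (append a constant feature to each row of X if a bias is needed).
    """
    inf = float("inf")
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d                      # primal weights, kept in sync with alpha
    alpha = [0.0] * n                  # dual variables, each in [0, C]
    q_diag = [dot(x, x) for x in X]    # diagonal of the dual Hessian
    active = list(range(n))            # examples still visited in each pass
    pg_max_old, pg_min_old = inf, -inf

    for _ in range(max_passes):
        rng.shuffle(active)
        pg_max_new, pg_min_new = -inf, inf
        kept = []
        for i in active:
            g = y[i] * dot(w, X[i]) - 1.0      # gradient of the dual w.r.t. alpha_i
            pg = g                             # projected gradient
            if alpha[i] == 0.0:
                if g > pg_max_old:
                    continue                   # shrink: likely to stay at 0
                pg = min(g, 0.0)
            elif alpha[i] == C:
                if g < pg_min_old:
                    continue                   # shrink: likely to stay at C
                pg = max(g, 0.0)
            kept.append(i)
            pg_max_new = max(pg_max_new, pg)
            pg_min_new = min(pg_min_new, pg)
            if abs(pg) > 1e-12 and q_diag[i] > 0.0:
                # Exact minimisation along coordinate i, clipped to [0, C].
                new_alpha = min(max(alpha[i] - g / q_diag[i], 0.0), C)
                step = (new_alpha - alpha[i]) * y[i]
                w = [wj + step * xj for wj, xj in zip(w, X[i])]
                alpha[i] = new_alpha
        active = kept
        if not active or pg_max_new - pg_min_new <= eps:
            if len(active) == n:
                break                          # converged on the full training set
            # Converged on the shrunk set only: restore everything and re-check.
            active = list(range(n))
            pg_max_old, pg_min_old = inf, -inf
            continue
        pg_max_old = pg_max_new if pg_max_new > 0 else inf
        pg_min_old = pg_min_new if pg_min_new < 0 else -inf
    return w, alpha


def predict(w, x):
    """Sign of the linear score, returned as +1 or -1."""
    return 1 if dot(w, x) >= 0 else -1
```

The point of the sketch is that examples whose dual variable sits at a bound, and whose gradient suggests it will stay there, are skipped in later passes, so each pass touches less data. Convergence is only accepted after a final check over the full training set.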

About the article

Published in Print: 2015-04-01


Citation Information: The Prague Bulletin of Mathematical Linguistics, ISSN (Online) 1804-0462, DOI: https://doi.org/10.1515/pralin-2015-0004.

© by Abdullah Alrajeh. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).
