Statistical versus neural machine translation – a case study for a medium size domain-specific bilingual corpus

Krzysztof Jassem 1 and Tomasz Dwojak 1
  • 1 Adam Mickiewicz University, Poznań, Poland

Abstract

Neural Machine Translation (NMT) has recently achieved promising results for a number of language pairs. Although the method requires larger volumes of data and more computational power than Statistical Machine Translation (SMT), it is expected to become dominant in the near future. In this paper we evaluate SMT and NMT models trained on a domain-specific English-Polish corpus of moderate size (1,200,000 segments). The experiment shows that both systems significantly outperform a general-domain online translator. The SMT model achieves a slightly higher BLEU score than the NMT model, whereas decoding is noticeably faster with NMT. Human evaluation carried out on a sizeable sample of translations (2,000 pairs) reveals the superiority of the NMT approach, particularly with respect to output fluency.
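
As an illustration of the kind of automatic evaluation described above, the following is a minimal sketch (not the authors' actual pipeline) of how corpus-level BLEU scores for two systems could be computed with the sacrebleu Python library; the file names are hypothetical, and the files are assumed to contain one segment per line, aligned across both systems and the reference.

    # Minimal BLEU comparison sketch; illustrative only, not the paper's pipeline.
    # Assumes sacrebleu is installed (pip install sacrebleu) and that the
    # hypothetical files below hold one segment per line, mutually aligned.
    import sacrebleu

    def read_lines(path):
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]

    refs = read_lines("test.ref.pl")      # hypothetical Polish references
    smt_hyps = read_lines("test.smt.pl")  # hypothetical SMT output
    nmt_hyps = read_lines("test.nmt.pl")  # hypothetical NMT output

    # corpus_bleu expects a list of hypothesis strings and a list of
    # reference streams; here each segment has a single reference.
    smt_bleu = sacrebleu.corpus_bleu(smt_hyps, [refs])
    nmt_bleu = sacrebleu.corpus_bleu(nmt_hyps, [refs])
    print(f"SMT BLEU: {smt_bleu.score:.2f}")
    print(f"NMT BLEU: {nmt_bleu.score:.2f}")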

References

  • Artstein, R. and M. Poesio. 2008. “Inter-coder agreement for computational linguistics”. Computational Linguistics 34(4). 555–596. <https://doi.org/10.1162/coli.07-034-RS>

  • Bahdanau, D., K. Cho and Y. Bengio. 2014. “Neural Machine Translation by jointly learning to align and translate”. arXiv Preprint arXiv:1409.0473.

  • Chen, B. and C. Cherry. 2014. “A systematic comparison of smoothing techniques for sentence-level BLEU”. Proceedings of the Ninth Workshop on Statistical Machine Translation. 362–367.

  • Cherry, C. and G. Foster. 2012. “Batch tuning strategies for Statistical Machine Translation”. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT ’12). Stroudsburg, PA: Association for Computational Linguistics. 427–436. <http://dl.acm.org/citation.cfm?id=2382029.2382089>

  • Cho, K., B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk and Y. Bengio. 2014. “Learning phrase representations using RNN encoder–decoder for Statistical Machine Translation”. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics. 1724–1734. <http://www.aclweb.org/anthology/D14-1179>

  • Durrani, N., H. Schmid, A.M. Fraser, P. Koehn and H. Schütze. 2015. “The operation sequence model – combining n-gram-based and phrase-based Statistical Machine Translation”. Computational Linguistics 41. 185–214.

  • Dyer, C., V. Chahuneau and N.A. Smith. 2013. “A simple, fast and effective reparameterization of IBM Model 2”. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Atlanta, Georgia: Association for Computational Linguistics. 644–648. <http://www.aclweb.org/anthology/N13-1073>

  • Gehring, J., M. Auli, D. Grangier, D. Yarats and Y.N. Dauphin. 2017. “Convolutional sequence to sequence learning”. arXiv Preprint arXiv:1705.03122. <http://arxiv.org/abs/1705.03122>

  • Heafield, K. 2011. “KenLM: Faster and smaller language model queries”. Proceedings of the Sixth Workshop on Statistical Machine Translation (WMT ’11). Stroudsburg, PA: Association for Computational Linguistics. 187–197. <http://dl.acm.org/citation.cfm?id=2132960.2132986>

  • Hoang, H., T. Dwojak, R. Krislauks, D. Torregrosa and K. Heafield. 2018. “Fast Neural Machine Translation implementation”. Proceedings of the 2nd Workshop on Neural Machine Translation and Generation (WNMT 2018). Association for Computational Linguistics.

  • Hochreiter, S. and J. Schmidhuber. 1997. “Long short-term memory”. Neural Computation 9(8). 1735–1780.

  • Junczys-Dowmunt, M. 2012. “Phrasal rank-encoding: Exploiting phrase redundancy and translational relations for phrase table compression”. The Prague Bulletin of Mathematical Linguistics 98. 63–74.

  • Junczys-Dowmunt, M., T. Dwojak and H. Hoang. 2016. “Is Neural Machine Translation ready for deployment? A case study on 30 translation directions”. arXiv Preprint arXiv:1610.01108.

  • Junczys-Dowmunt, M., T. Dwojak and R. Sennrich. 2016. “The AMU-UEDIN submission to the WMT16 News Translation Task: Attention-based NMT models as feature functions in phrase-based SMT”. Proceedings of the First Conference on Machine Translation. Berlin, Germany: Association for Computational Linguistics. 319–325. <http://www.aclweb.org/anthology/W/W16/W16-2316>

  • Junczys-Dowmunt, M., R. Grundkiewicz, T. Dwojak, H. Hoang, K. Heafield, T. Neckermann, F. Seide, et al. 2018. “Marian: Fast Neural Machine Translation in C++”. arXiv Preprint arXiv:1804.00344. <https://arxiv.org/abs/1804.00344>

  • Kingma, D. and J. Ba. 2014. “Adam: A method for stochastic optimization”. International Conference on Learning Representations.

  • Klubicka, F., A. Toral and V.M. Sánchez-Cartagena. 2017. “Fine-grained human evaluation of neural versus phrase-based machine translation”. CoRR abs/1706.04389. <http://arxiv.org/abs/1706.04389>

  • Koehn, P. 2005. “Europarl: A parallel corpus for statistical machine translation”. Proceedings of MT Summit X. 79–86.

  • Koehn, P. 2009. Statistical Machine Translation. Cambridge: Cambridge University Press.

  • Koehn, P. 2017. “Neural Machine Translation”. CoRR abs/1709.07809. <http://arxiv.org/abs/1709.07809>

  • Koehn, P., H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, et al. 2007. “Moses: Open source toolkit for Statistical Machine Translation”. Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions (ACL ’07). Stroudsburg, PA: Association for Computational Linguistics. 177–180. <http://dl.acm.org/citation.cfm?id=1557769.1557821>

  • Koehn, P. and R. Knowles. 2017. “Six challenges for Neural Machine Translation”. Proceedings of the First Workshop on Neural Machine Translation. 28–39.

  • Lee, J., K. Cho and T. Hofmann. 2017. “Fully character-level Neural Machine Translation without explicit segmentation”. Transactions of the Association for Computational Linguistics 5. 365–378.

  • Mikolov, T., K. Chen, G. Corrado and J. Dean. 2013. “Efficient estimation of word representations in vector space”. CoRR abs/1301.3781. <http://dblp.uni-trier.de/db/journals/corr/corr1301.html#abs-1301-3781>

  • Papineni, K., S. Roukos, T. Ward and W.-J. Zhu. 2002. “BLEU: A method for automatic evaluation of Machine Translation”. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL ’02). Stroudsburg, PA: Association for Computational Linguistics. 311–318. <https://doi.org/10.3115/1073083.1073135>

  • Sennrich, R., A. Birch, A. Currey, U. Germann, B. Haddow, K. Heafield, A. Valerio Miceli Barone and P. Williams. 2017. “The University of Edinburgh’s Neural MT systems for WMT17”. Proceedings of the Second Conference on Machine Translation, Volume 2: Shared Task Papers. Copenhagen, Denmark: Association for Computational Linguistics. 389–399. <http://www.aclweb.org/anthology/W17-4739>

  • Sennrich, R., B. Haddow and A. Birch. 2016. “Neural Machine Translation of rare words with subword units”. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics. 1715–1725. <https://doi.org/10.18653/v1/P16-1162>

  • Sutskever, I., O. Vinyals and Q.V. Le. 2014. “Sequence to sequence learning with neural networks”. In: Ghahramani, Z., M. Welling, C. Cortes, N.D. Lawrence and K.Q. Weinberger (eds.), Advances in neural information processing systems 27. Curran Associates, Inc. 3104–3112. <http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf>

  • Świeczkowska, P. 2017. “Towards a direct Japanese-Polish machine translation system”. Proceedings of the 8th Language & Technology Conference. Poznań, Poland.

  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser and I. Polosukhin. 2017. “Attention is all you need”. In: Guyon, I., U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan and R. Garnett (eds.), Advances in neural information processing systems 30. Curran Associates, Inc. 5998–6008. <http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>

  • Wołk, K. and K. Marasek. 2015. “PJAIT systems for the IWSLT 2015 evaluation campaign enhanced by comparable corpora”. Proceedings of the International Workshop on Spoken Language Translation, December 3–4, 2015, Da Nang, Vietnam.

  • Wołk, K. and K. Marasek. 2016. “PJAIT Systems for the WMT 2016”. Proceedings of the First Conference on Machine Translation.

  • Wołk, K. and K. Marasek. 2017. “PJAIT’s Systems for WMT 2017 Conference”. Proceedings of the Second Conference on Machine Translation.
