
The Prague Bulletin of Mathematical Linguistics

The Journal of Charles University

2 Issues per year

Open Access

Improving Machine Translation through Linked Data

Ankit Srivastava
  • Corresponding author
  • German Research Center for Artificial Intelligence (DFKI), Language Technology Lab, Berlin, Germany
/ Georg Rehm
  • German Research Center for Artificial Intelligence (DFKI), Language Technology Lab, Berlin, Germany
/ Felix Sasaki
  • German Research Center for Artificial Intelligence (DFKI), Language Technology Lab, Berlin, Germany
Published Online: 2017-06-06 | DOI: https://doi.org/10.1515/pralin-2017-0033


With the ever-increasing availability of linked multilingual lexical resources, there is renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. Machine Translation (MT) systems in particular can potentially benefit from such resources, since unknown words and ambiguous translations are among their most common sources of error. In this paper, we attempt to minimise these types of errors by interfacing Statistical Machine Translation (SMT) models with Linked Open Data (LOD) resources such as DBpedia and BabelNet. We perform several experiments based on the SMT system Moses and evaluate multiple strategies for exploiting knowledge from multilingual linked data when automatically translating named entities. We conclude with an analysis of best practices for multilingual linked data sets in order to optimise their benefit to multilingual and cross-lingual applications.
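
As a rough illustration of one way named-entity knowledge could be handed to an SMT decoder, the following minimal sketch wraps known entity mentions in XML markup carrying a suggested translation, in the style accepted by the XML input mode of Moses (the `translation` attribute). The abstract does not state which integration strategies were actually evaluated, so this should be read as an assumption rather than the authors' method. The `ENTITY_LABELS` table, the `ne` tag name, and the `annotate` function are hypothetical; the table stands in for labels that would in practice be retrieved from a resource such as DBpedia or BabelNet.

```python
from typing import Mapping
from xml.sax.saxutils import escape, quoteattr

# Hypothetical stand-in for English -> German entity labels taken from a
# linked data resource. These pairs are illustrative, not real query output.
ENTITY_LABELS = {
    "United Nations": "Vereinte Nationen",
    "Charles University": "Karls-Universität",
}


def annotate(sentence: str, labels: Mapping[str, str]) -> str:
    """Wrap known entity mentions in XML markup with a suggested translation.

    Matching is done on whitespace-separated tokens, longest entity first,
    so a multi-word entity wins over any shorter entity it contains.
    """
    tokens = sentence.split()
    # Pre-split entity names into token tuples; skip empty names.
    entries = sorted(
        ((tuple(name.split()), target) for name, target in labels.items()
         if name.split()),
        key=lambda entry: -len(entry[0]),
    )
    out = []
    i = 0
    while i < len(tokens):
        for source, target in entries:
            n = len(source)
            if tuple(tokens[i:i + n]) == source:
                # quoteattr adds the surrounding quotes and escapes the value.
                out.append(
                    f"<ne translation={quoteattr(target)}>"
                    f"{escape(' '.join(source))}</ne>"
                )
                i += n
                break
        else:
            # No entity starts here: copy the token, escaping XML characters.
            out.append(escape(tokens[i]))
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(annotate("The United Nations met in Prague .", ENTITY_LABELS))
```

Whether the decoder treats such a suggestion as mandatory or as one more competing translation option depends on how its XML input mode is configured, which is a design choice with a direct effect on how much the linked data can override the phrase table.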



About the article


Published in Print: 2017-06-01

Citation Information: The Prague Bulletin of Mathematical Linguistics, ISSN (Online) 1804-0462, DOI: https://doi.org/10.1515/pralin-2017-0033.


© 2017 Ankit Srivastava et al., published by De Gruyter Open. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).
