The Prague Bulletin of Mathematical Linguistics

The Journal of Charles University

2 Issues per year

Open Access
RealText-lex: A Lexicalization Framework for RDF Triples

Rivindu Perera / Parma Nand / Gisela Klette
Published Online: 2016-10-15 | DOI: https://doi.org/10.1515/pralin-2016-0011


The online era has made almost cosmic amounts of information available in the public and semi-restricted domains, prompting the development of a corresponding host of technologies to organize and navigate this information. One of these developing technologies deals with encoding information from free-form natural language into a structured form as RDF triples. This representation enables machine processing of the data; however, the processed information cannot be directly converted back to human language. This has created a need to lexicalize machine-processed data existing as triples into natural language, so that there is a seamless transition between the machine representation of information and information meant for human consumption. This paper presents a framework to lexicalize RDF triples extracted from DBpedia, a central interlinking hub for the emerging Web of Data. The framework comprises four pattern mining modules which generate lexicalization patterns to transform triples into natural language sentences. Three of these modules are based on lexicons, and the fourth extracts relations from unstructured text to generate lexicalization patterns. A linguistic accuracy evaluation and a human evaluation on a sub-sample showed that the framework can produce patterns that are accurate and exhibit the qualities of human-generated text.



About the article

Published in Print: 2016-10-01

Citation Information: The Prague Bulletin of Mathematical Linguistics, Volume 106, Issue 1, Pages 45–68, ISSN (Online) 1804-0462, DOI: https://doi.org/10.1515/pralin-2016-0011.

© by Rivindu Perera. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
