Published by De Gruyter Mouton, September 2, 2017

Towards equivalence links between senses in plWordNet and Princeton WordNet

Ewa Rudnicka, Francis Bond, Łukasz Grabowski, Maciej Piasecki and Tadeusz Piotrowski

From the journal Lodz Papers in Pragmatics

Abstract

The paper focuses on creating equivalence links in bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units, i.e. lemma-sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicography and translation. Special attention is paid to cognitive and translational equivalents. A proposal for mapping lexical units is presented. Three types of links are defined: super-strong equivalence, strong equivalence and weak implied equivalence. Super-strong and strong equivalents share a common set of formal, semantic and usage features, with some of the feature values slightly relaxed for strong equivalence. The links will be introduced manually by trained lexicographers. The sense mapping will partly draw on the results of the existing synset mapping: the lexicographers will analyse lists of synset pairs linked by interlingual relations such as synonymy, partial synonymy, hyponymy and hypernymy. They will also consult bilingual dictionaries and check translation probabilities in a parallel corpus. The resulting mapping has potential applications in natural language processing, translation and language learning.
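The candidate-generation step described above can be pictured with a minimal Python sketch. It rests on assumptions that are not taken from plWordNet or Princeton WordNet: the `Sense` class, the dictionaries keyed by hypothetical synset identifiers, and a list of word-aligned (Polish lemma, English lemma) pairs standing in for a parallel corpus. Translation probabilities are estimated here by simple relative frequency, which is an illustrative choice rather than the authors' stated method. In line with the abstract, the sketch only expands synset links into sense pairs and ranks them for review; deciding between super-strong, strong and weak implied equivalence is left to lexicographers.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from itertools import product


class LinkType(Enum):
    """The three sense-level link types; in the proposal they are assigned manually."""
    SUPER_STRONG = "super-strong equivalence"
    STRONG = "strong equivalence"
    WEAK_IMPLIED = "weak implied equivalence"


@dataclass(frozen=True)
class Sense:
    """A lexical unit as a (lemma, sense number) pair; a hypothetical representation."""
    lemma: str
    sense: int


# Interlingual synset relations whose synset pairs feed the candidate lists.
CANDIDATE_RELATIONS = {"synonymy", "partial synonymy", "hyponymy", "hypernymy"}


def candidate_pairs(synset_links, pl_synsets, en_synsets):
    """Expand (pl_synset_id, relation, en_synset_id) links into sense-pair candidates."""
    for pl_id, relation, en_id in synset_links:
        if relation not in CANDIDATE_RELATIONS:
            continue  # other relations are ignored in this sketch
        for pl_sense, en_sense in product(pl_synsets[pl_id], en_synsets[en_id]):
            yield pl_sense, en_sense, relation


def translation_probs(aligned_pairs):
    """Relative-frequency estimate of P(English lemma | Polish lemma)."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(pl for pl, _ in aligned_pairs)
    return {(pl, en): n / source_counts[pl] for (pl, en), n in pair_counts.items()}


def ranked_candidates(synset_links, pl_synsets, en_synsets, aligned_pairs):
    """Sort candidates by corpus evidence; unseen lemma pairs score 0.0."""
    probs = translation_probs(aligned_pairs)
    scored = [
        (probs.get((pl.lemma, en.lemma), 0.0), pl, en, relation)
        for pl, en, relation in candidate_pairs(synset_links, pl_synsets, en_synsets)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy data (hypothetical identifiers and counts, not drawn from either wordnet).
pl_synsets = {"pl:1": [Sense("zamek", 1)]}
en_synsets = {"en:1": [Sense("castle", 1)], "en:2": [Sense("fortification", 1)]}
synset_links = [("pl:1", "synonymy", "en:1"), ("pl:1", "hyponymy", "en:2")]
aligned_pairs = [("zamek", "castle"), ("zamek", "castle"), ("zamek", "lock"), ("zamek", "zipper")]

for score, pl, en, relation in ranked_candidates(synset_links, pl_synsets, en_synsets, aligned_pairs):
    print(f"{score:.2f}  {pl.lemma}-{pl.sense} -> {en.lemma}-{en.sense}  ({relation})")
```

With the toy data, the synonymy-derived pair zamek-1 -> castle-1 is ranked first (score 0.50, since two of the four aligned occurrences of "zamek" are "castle"), while the hyponymy-derived pair with "fortification" has no corpus support and scores 0.00. Keeping the link type out of the automatic step mirrors the abstract's point that the final equivalence links are introduced manually.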


Ewa Rudnicka, Department of Computer Science and Management, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

About the authors

Ewa Rudnicka

Ewa Rudnicka is a Research Associate at the Department of Computer Science and Management, Wroclaw University of Technology, Poland. Her research interests include computational bilingual lexicography, comparative linguistics, formal semantics and translation studies. She coordinates the process of mapping plWordNet onto Princeton WordNet. She is a member of the G4.19 Language Technology and Computational Linguistic Research Group.

Francis Bond

Francis Bond is an Associate Professor at the Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore. He worked on machine translation and natural language understanding in Japan, first at Nippon Telegraph and Telephone Corporation and then at the National Institute of Information and Communications Technology, where his focus was on open source natural language processing. He is an active member of the Deep Linguistic Processing with HPSG Initiative (DELPH-IN) and the Global WordNet Association. His main research interest is natural language understanding. He has developed and released wordnets for Chinese, Japanese, Malay and Indonesian and coordinates the Open Multilingual Wordnet.

Łukasz Grabowski

Łukasz Grabowski is an Associate Professor at the Institute of English, University of Opole, Poland. His research interests include corpus linguistics, phraseology, formulaic language, translation studies and lexicography. He is also interested in computer-assisted methods of text analysis. He has published internationally in the International Journal of Corpus Linguistics and English for Specific Purposes, among other journals. He is also Managing Editor of the journal Explorations: A Journal of Language and Literature.

Maciej Piasecki

Maciej Piasecki is an Associate Professor at the Department of Computer Science and Management, Wroclaw University of Technology, Poland. He is the Polish National Coordinator of CLARIN ERIC (www.clarin.eu) and a member of the Global WordNet Association Board. He initiated and leads the plWordNet project (a large wordnet of Polish) and leads the G4.19 Language Technology and Computational Linguistic Research Group. His research interests cover natural language processing and engineering, computational lexicography, data extraction and information retrieval.

Tadeusz Piotrowski

Tadeusz Piotrowski is a Professor at the English Department, University of Wrocław, Poland. His research interests include the theory, practice and history of monolingual and bilingual lexicography and dictionaries, corpus linguistics and translation studies. He has participated in most major bilingual dictionary projects in Poland, working with such institutions as PWN, OUP, Pons-Klett, Langenscheidt, Prószyński, Wiedza Powszechna and the Kościuszko Foundation, and has written a number of dictionaries for Spotkania. He is also interested in computational lexicography and computer-assisted text analysis. He has published three books and about 200 papers.

Acknowledgements

This paper results from work carried out within a project funded by the National Science Centre (Narodowe Centrum Nauki), Poland, under grant agreement no. UMO-2015/18/M/HS2/00100.


Published Online: 2017-9-2
Published in Print: 2017-8-28

© 2017 Walter de Gruyter GmbH, Berlin/Boston

DOI: 10.1515/lpp-2017-0002 (https://www.degruyter.com/document/doi/10.1515/lpp-2017-0002/html)