Abstract
The paper addresses the creation of equivalence links in bilingual computational lexicography. The existing interlingual links between plWordNet and Princeton WordNet synsets (sets of synonymous lexical units, i.e. lemma and sense pairs) are re-analysed from the perspective of equivalence types as defined in traditional lexicography and translation. Special attention is paid to cognitive and translational equivalents. A proposal for mapping lexical units is presented. Three types of links are defined: super-strong equivalence, strong equivalence and weak implied equivalence. The two strong types share a common set of formal, semantic and usage features, with some feature values slightly loosened for strong equivalence. They will be introduced manually by trained lexicographers. The sense mapping will partly draw on the results of the existing synset mapping. The lexicographers will analyse lists of pairs of synsets linked by interlingual relations such as synonymy, partial synonymy, hyponymy and hypernymy. They will also consult bilingual dictionaries and check translation probabilities in a parallel corpus. The results of the proposed mapping have considerable application potential in natural language processing, translation and language learning.
About the authors
Ewa Rudnicka is a Research Associate at the Department of Computer Science and Management, Wroclaw University of Technology, Poland. Her research interests include computational bilingual lexicography, comparative linguistics, formal semantics and translation studies. She coordinates the mapping of plWordNet onto Princeton WordNet. She is a member of the G4.19 Language Technology and Computational Linguistic Research Group.
Francis Bond is an Associate Professor at the Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore. He worked on machine translation and natural language understanding in Japan, first at Nippon Telegraph and Telephone Corporation and then at the National Institute of Information and Communications Technology, where his focus was on open source natural language processing. He is an active member of the Deep Linguistic Processing with HPSG Initiative (DELPHIN) and the Global WordNet Association. His main research interest is in natural language understanding. Francis has developed and released wordnets for Chinese, Japanese, Malay and Indonesian and coordinates the Open Multilingual Wordnet.
Łukasz Grabowski is an Associate Professor at the Institute of English, University of Opole, Poland. His research interests include corpus linguistics, phraseology, formulaic language, translation studies and lexicography. He is also interested in computer-assisted methods of text analysis. He has published internationally in International Journal of Corpus Linguistics and English for Specific Purposes, among others. He is also Managing Editor of the journal Explorations: A Journal of Language and Literature.
Maciej Piasecki is an Associate Professor at the Department of Computer Science and Management, Wroclaw University of Technology, Poland. He is the Polish National Coordinator of CLARIN ERIC (www.clarin.eu) and a member of the Global WordNet Association Board. He initiated and leads the plWordNet project (a large wordnet of Polish) and also leads the G4.19 Language Technology and Computational Linguistic Research Group. His research interests cover different areas of natural language processing and engineering, computational lexicography, data extraction and information retrieval.
Tadeusz Piotrowski is a Professor at the English Department, University of Wrocław, Poland. His research interests include the theory, practice, and history of monolingual and bilingual lexicography and dictionaries, corpus linguistics, and translation studies. He has participated in most major bilingual dictionary projects in Poland, working with such companies as PWN, OUP, Pons-Klett, Langenscheidt, Prószyński, Wiedza Powszechna and the Kościuszko Foundation, and has written a number of dictionaries for Spotkania. He is also interested in computational lexicography and computer-assisted text analysis. He has published three books and about 200 papers.
Acknowledgements
This paper is the result of work carried out within a project funded by the National Science Centre (Narodowe Centrum Nauki), Poland, under grant agreement no. UMO-2015/18/M/HS2/00100.
© 2017 Walter de Gruyter GmbH, Berlin/Boston