Published by De Gruyter Mouton, August 17, 2019

A Weakly supervised word sense disambiguation for Polish using rich lexical resources

  • Arkadiusz Janz and Maciej Piasecki

Abstract

Automatic word sense disambiguation (WSD) has proven to be an important technique in many natural language processing tasks. For many years the problem of sense disambiguation has been approached with a wide range of methods; however, it remains a challenging problem, especially in the unsupervised setting. Knowledge-based methods, which leverage lexical knowledge resources such as wordnets, are among the well-known and successful approaches to WSD. As knowledge-based approaches mostly do not use any labelled training data, their performance relies strongly on the structure and quality of the knowledge sources used. However, a pure knowledge base such as a wordnet cannot reflect all the semantic knowledge necessary to correctly disambiguate word senses in text. In this paper we explore various expansions of plWordNet as knowledge bases for WSD. Semantic links extracted from a large valency lexicon (Walenty), glosses and usage examples, Wikipedia articles and the SUMO ontology are combined with plWordNet and tested in a PageRank-based WSD algorithm. In addition, we also analyse the influence of lexical semantics vector models extracted with distributional semantics methods. Several new Polish test data sets for WSD are also introduced. All the resources, methods and tools are available under open licences.
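
To make the idea of PageRank-based disambiguation over a knowledge graph more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the tiny graph and the sense labels (such as "zamek.lock") are hypothetical illustrations rather than plWordNet data, and it uses a generic personalised PageRank in which the random walk restarts at the senses of the context words. The candidate sense of the target word that collects the most rank is selected.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank that restarts only at the seed nodes.

    Assumes every seed appears in at least one edge, so that the
    restart distribution sums to 1.
    """
    # Build an undirected adjacency list from the edge pairs.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = sorted(neighbours)

    # Restart distribution: uniform over the seeds, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour shares its rank equally among its own links.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1 - damping) * restart[n] + damping * incoming
        rank = new_rank
    return rank


def disambiguate(candidates, edges, context_senses):
    """Return the candidate sense with the highest personalised rank."""
    rank = personalized_pagerank(edges, set(context_senses))
    return max(candidates, key=lambda sense: rank[sense])


# Hypothetical toy graph: "zamek" can mean a lock or a castle.
toy_edges = [
    ("zamek.lock", "klucz.key"),
    ("zamek.lock", "drzwi.door"),
    ("zamek.castle", "wieża.tower"),
    ("wieża.tower", "budynek.building"),
    ("drzwi.door", "budynek.building"),
]
# The context mentions a key and a door, so the walk restarts there.
toy_context = ["klucz.key", "drzwi.door"]

chosen = disambiguate(["zamek.lock", "zamek.castle"], toy_edges, toy_context)
print(chosen)
```

In this toy setting, adding extra links to the graph (the role played in the paper by the links extracted from Walenty, glosses, Wikipedia or SUMO) changes which candidate senses the restart nodes can reach, and hence the ranking. Each such expansion would simply contribute further pairs to the edge list.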

9 Acknowledgment

This research was partially funded by the Polish Ministry of Science and Higher Education within CLARIN-PL Research Infrastructure.

Published Online: 2019-08-17
Published in Print: 2019-06-26

© 2019 Faculty of English, Adam Mickiewicz University, Poznań, Poland

Downloaded on 28.3.2024 from https://www.degruyter.com/document/doi/10.1515/psicl-2019-0013/html