Abstract
Automatic word sense disambiguation (WSD) has proven to be an important technique in many natural language processing tasks. Sense disambiguation has been approached with a wide range of methods over many years, yet it remains a challenging problem, especially in the unsupervised setting. Knowledge-based methods, which leverage lexical knowledge resources such as wordnets, are among the best-known and most successful approaches to WSD. Because knowledge-based approaches mostly do not use any labelled training data, their performance relies strongly on the structure and quality of the knowledge sources used. However, a knowledge base such as a wordnet cannot, on its own, reflect all the semantic knowledge necessary to disambiguate word senses in text correctly. In this paper we explore various expansions of plWordNet as knowledge bases for WSD. Semantic links extracted from a large valency lexicon (Walenty), glosses and usage examples, Wikipedia articles and the SUMO ontology are combined with plWordNet and tested in a PageRank-based WSD algorithm. In addition, we analyse the influence of lexical semantic vector models extracted by means of distributional semantics methods. Several new Polish test data sets for WSD are also introduced. All the resources, methods and tools are available under open licences.
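To make the idea of PageRank-based WSD over a knowledge graph concrete, the following is a minimal sketch of Personalized PageRank on a toy sense graph. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the node labels, the edges, the damping factor of 0.85 and the helper names `personalized_pagerank` and `disambiguate` are all hypothetical and do not come from plWordNet or from the authors' tools.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected graph.

    edges: iterable of (node, node) pairs (hypothetical toy links).
    seeds: nodes for the context words; the random walk restarts there.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    nodes = sorted(neighbours)

    # Keep only seeds that occur in the graph, then spread the
    # restart (teleport) probability uniformly over them.
    seeds = [s for s in seeds if s in neighbours]
    if not seeds:
        raise ValueError("no seed node occurs in the graph")
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour passes on an equal share of its own rank.
            incoming = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new_rank[n] = (1.0 - damping) * teleport[n] + damping * incoming
        rank = new_rank
    return rank


def disambiguate(candidate_senses, edges, context_nodes):
    """Return the candidate sense with the highest personalized rank."""
    rank = personalized_pagerank(edges, context_nodes)
    return max(candidate_senses, key=lambda sense: rank.get(sense, 0.0))


# Toy graph (an assumption for illustration): Polish "zamek" can mean
# a castle or a lock; the context mentions a key and a door.
TOY_EDGES = [
    ("zamek#lock", "klucz#key"),
    ("zamek#lock", "drzwi#door"),
    ("zamek#lock", "mechanizm#mechanism"),
    ("zamek#castle", "budowla#building"),
    ("zamek#castle", "wieża#tower"),
    ("budowla#building", "drzwi#door"),
]
TOY_CONTEXT = ["klucz#key", "drzwi#door"]
TOY_CANDIDATES = ["zamek#lock", "zamek#castle"]

if __name__ == "__main__":
    print(disambiguate(TOY_CANDIDATES, TOY_EDGES, TOY_CONTEXT))
```

In this sketch, adding links from an external resource amounts to appending pairs to the edge list, which is one simple way to picture how extra semantic links can change the ranking of candidate senses.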
9 Acknowledgment
This research was partially funded by the Polish Ministry of Science and Higher Education within the CLARIN-PL Research Infrastructure.
© 2019 Faculty of English, Adam Mickiewicz University, Poznań, Poland