An Optimized Lesk-Based Algorithm for Word Sense Disambiguation

  • 1 Department of Mathematics & Computer Science, Elizade University, Ilara Mokin, Nigeria

Abstract

High computational complexity is a characteristic of almost all Lesk-based algorithms for word sense disambiguation (WSD). In this paper, we address this issue by developing a simple, optimized variant of the algorithm that uses topic composition in documents, based on the theory underlying topic models. The knowledge resource adopted is the English WordNet, enriched with linguistic knowledge from Wikipedia and the SemCor corpus. Besides the algorithm's efficiency, we also evaluate its effectiveness using two datasets: a general-domain dataset and a domain-specific dataset. The algorithm achieves superior performance on the general-domain dataset and, among knowledge-based techniques, superior performance on the domain-specific dataset.
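For readers unfamiliar with the Lesk family, the following is a minimal sketch of the simplified gloss-overlap baseline, not the optimized variant developed in this paper. It picks the sense whose gloss shares the most words with the context. The sense inventory `TOY_SENSES`, the sense identifiers, the stopword list, and the function names are all hypothetical and chosen only for illustration; a real system would draw glosses from WordNet.

```python
from typing import Dict, Set

# Hypothetical toy sense inventory (NOT taken from WordNet): sense id -> gloss.
TOY_SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank#finance": "an institution that accepts deposits of money and lends money",
        "bank#river": "sloping land beside a body of water such as a river",
    }
}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS: Set[str] = {
    "a", "an", "the", "of", "and", "that", "such", "as", "to", "in", "on", "by", "at",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (tok.strip(".,;:!?()\"'").lower() for tok in text.split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> str:
    """Return the sense id whose gloss overlaps most with the context.

    Ties are broken in favour of the sense listed first.
    """
    context_tokens = tokenize(context)
    best_sense, best_overlap = "", -1
    for sense_id, gloss in TOY_SENSES[word].items():
        overlap = len(context_tokens & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("bank", "She sat by the river and watched the water"))
    print(simplified_lesk("bank", "He went to the bank to deposit money"))
```

This baseline scores each sense of one target word independently, so its cost grows linearly with the number of senses. Variants that jointly compare the senses of every word in a context must consider combinations of senses, which is where the computational cost discussed above comes from.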




Open Computer Science is an open access, peer-reviewed journal. The journal publishes research results in the following fields: algorithms and complexity theory, artificial intelligence, bioinformatics, networking and security systems,
programming languages, system and software engineering, and theoretical foundations of computer science.
