Feature selection for spectral clustering: to help or not to help spectral clustering when performing sense discrimination for IR?

Adrian-Gabriel Chifu 1 and Florentina Hristea 2
  • 1 Aix Marseille Université, Université de Toulon, CNRS, LIS, Marseille, France
  • 2 University of Bucharest, Bucharest, Romania

Abstract

Whether word sense disambiguation (WSD) can improve information retrieval (IR) results has been intensely debated over the years, with many inconclusive or contradictory findings. Unsupervised WSD is the type most rarely used for this task, although it has been proven beneficial at a large scale. Our study builds on existing research and tries to improve the most recent unsupervised method, which is based on spectral clustering. It investigates the possible benefits of “helping” spectral clustering through feature selection when it performs sense discrimination for IR. Results obtained so far, involving large data collections, encourage us to point out the importance of feature selection even in the case of this advanced, state-of-the-art clustering technique, which is known for performing its own feature weighting. By suggesting an improvement to what we consider the most promising approach to using WSD in IR, and by commenting on its possible extensions, we state that WSD still holds promise for IR and hope to stimulate the continuation of this line of research, perhaps at an even more successful level.

Open Computer Science is an open access, peer-reviewed journal. The journal publishes research results in the following fields: algorithms and complexity theory, artificial intelligence, bioinformatics, networking and security systems, programming languages, system and software engineering, and theoretical foundations of computer science.
