
Open Computer Science

Editor-in-Chief: van den Broek, Egon

Covered by:
Web of Science - Emerging Sources Citation Index

CiteScore 2018: 0.63
Source Normalized Impact per Paper (SNIP) 2018: 0.604

ICV 2018: 97.86

Open Access

Feature selection for spectral clustering: to help or not to help spectral clustering when performing sense discrimination for IR?

Adrian-Gabriel Chifu / Florentina Hristea
Published Online: 2018-12-31 | DOI: https://doi.org/10.1515/comp-2018-0021


Whether word sense disambiguation (WSD) can improve information retrieval (IR) results has been intensely debated over the years, with many inconclusive or contradictory findings. Unsupervised WSD is the type most rarely used for this task, although it has been shown to be beneficial at a large scale. Our study builds on existing research and tries to improve the most recent unsupervised method, which is based on spectral clustering. It investigates the possible benefits of “helping” spectral clustering through feature selection when it performs sense discrimination for IR. Results obtained so far, involving large data collections, encourage us to point out the importance of feature selection even for this advanced, state-of-the-art clustering technique, which is known for performing its own feature weighting. By suggesting an improvement to what we consider the most promising approach to using WSD in IR, and by commenting on its possible extensions, we argue that WSD still holds promise for IR and hope to stimulate continuation of this line of research, perhaps at an even more successful level.

Keywords: word sense discrimination; information retrieval; query disambiguation; spectral clustering



About the article

Received: 2018-10-21

Accepted: 2018-12-30

Published Online: 2018-12-31

Published in Print: 2018-12-01

Citation Information: Open Computer Science, Volume 8, Issue 1, Pages 218–227, ISSN (Online) 2299-1093, DOI: https://doi.org/10.1515/comp-2018-0021.


© by Adrian-Gabriel Chifu, Florentina Hristea, published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. BY-NC-ND 4.0
