
Open Computer Science

Editor-in-Chief: van den Broek, Egon

1 Issue per year

Open Access
Online ISSN: 2299-1093

Evaluation of text document clustering approach based on particle swarm optimization

Stuti Karol / Veenu Mangat
Published Online: 2013-06-29 | DOI: https://doi.org/10.2478/s13537-013-0104-2

Abstract

Clustering, an important technique in data mining, is an automatic (unsupervised) learning technique that groups a set of objects into subsets, or clusters. The goal is to create clusters that are internally coherent but substantially different from each other. Text document clustering groups related text documents according to their content. It is a fundamental operation in unsupervised document organization, text data mining, automatic topic extraction, and information retrieval. Fast, high-quality document clustering algorithms play an important role in navigating, summarizing, and organizing information effectively. The documents to be clustered can be, for example, web news articles or abstracts of research papers. This paper proposes two techniques for efficient document clustering that use Particle Swarm Optimization (PSO), a soft computing method, in an intelligent hybrid approach: the partitioning algorithms Fuzzy C-Means and K-Means are each hybridized with PSO. The performance of these hybrid algorithms is evaluated against the traditional partitioning techniques (K-Means and Fuzzy C-Means).
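The abstract does not specify how the hybrids are built. One common way to combine PSO with K-Means for document clustering is to let PSO search over candidate sets of centroids and then hand the best set to K-Means for refinement. The sketch below illustrates that general scheme under stated assumptions: documents are term-frequency vectors compared with cosine distance, fitness is a simplified compactness score (mean distance of each document to its nearest centroid), and the PSO parameters (w = 0.72, c1 = c2 = 1.49) are commonly used values, not the authors' settings. All function names are hypothetical, and this is not the paper's implementation.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def assign(docs, centroids):
    """Index of the nearest centroid (by cosine distance) for each document."""
    return [min(range(len(centroids)), key=lambda j: cosine_distance(d, centroids[j]))
            for d in docs]


def fitness(docs, centroids):
    """Simplified compactness score: mean distance to the nearest centroid (lower is better)."""
    return sum(min(cosine_distance(d, c) for c in centroids) for d in docs) / len(docs)


def pso_kmeans(docs, k, n_particles=10, pso_iters=30, km_iters=10,
               w=0.72, c1=1.49, c2=1.49, seed=0):
    """Hypothetical PSO + K-Means hybrid: PSO searches centroid sets, K-Means refines gbest."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Each particle's position is a full set of k centroids, seeded from random documents.
    pos = [[list(map(float, d)) for d in rng.sample(docs, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [fitness(docs, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]

    # Phase 1: inertia-weight PSO velocity and position updates.
    for _ in range(pso_iters):
        for i in range(n_particles):
            for j in range(k):
                for t in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][t] = (w * vel[i][j][t]
                                    + c1 * r1 * (pbest[i][j][t] - pos[i][j][t])
                                    + c2 * r2 * (gbest[j][t] - pos[i][j][t]))
                    pos[i][j][t] += vel[i][j][t]
            f = fitness(docs, pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], f

    # Phase 2: K-Means iterations starting from the PSO global best.
    centroids = gbest
    for _ in range(km_iters):
        labels = assign(docs, centroids)
        new_centroids = []
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        centroids = new_centroids
    return assign(docs, centroids), centroids


# Toy term-frequency vectors over a hypothetical 4-term vocabulary (two topics).
docs = [[3, 2, 0, 0], [4, 1, 0, 0], [2, 3, 0, 1],
        [0, 0, 3, 2], [0, 1, 4, 3], [0, 0, 2, 4]]
labels, centroids = pso_kmeans(docs, k=2)
print(labels)
```

On these toy vectors, the first three documents (dominated by the first two terms) and the last three (dominated by the last two terms) should land in different clusters; which cluster receives label 0 depends on the seed. A Fuzzy C-Means hybrid along the same lines would presumably replace the K-Means phase with FCM's membership and centroid updates.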

Keywords: clustering analysis; optimization; swarm intelligence; K-means clustering; fuzzy C-means clustering; particle swarm optimization; text document clustering

About the article

Published Online: 2013-06-29

Published in Print: 2013-06-01


Citation Information: Open Computer Science, ISSN (Online) 2299-1093, DOI: https://doi.org/10.2478/s13537-013-0104-2.


© 2013 Versita Warsaw. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (BY-NC-ND 3.0).

Citing Articles


[1] Laith Mohammad Abualigah, Ahamad Tajudin Khader, Essam Said Hanandeh, and Amir H. Gandomi, Applied Soft Computing, 60, 423, 2017
