Journal of Intelligent Systems

Editor-in-Chief: Fleyeh, Hasan


CiteScore 2018: 1.03

SCImago Journal Rank (SJR) 2018: 0.188
Source Normalized Impact per Paper (SNIP) 2018: 0.533

Online ISSN: 2191-026X
Volume 26, Issue 1

Reducing the Feature Space Using Constraint-Governed Association Rule Mining

Doreswamy
  • Department of Computer Science, Mangalore University, Mangalagangothri, Mangalore, 574199, India
M. Umme Salma
  • Corresponding author
  • Department of Computer Science, Mangalore University, Mangalagangothri, Mangalore, 574199, India
Published Online: 2016-02-04 | DOI: https://doi.org/10.1515/jisys-2015-0059

Abstract

Recent advances in science, technology, and medicine have led to the accumulation of huge amounts of medical data in digital repositories, where they are stored for future use. Mining medical data is particularly challenging because the data are subject to many social and ethical concerns. Moreover, medical data are often of poor quality: they contain many missing and misleading values and may sometimes be faulty. Pre-processing is therefore of great importance in medical data mining, and feature selection is its main focus, because the quality of the input determines the quality of the resulting data mining process. This paper presents a feature selection process in which a data set is subjected to constraint-governed association rule mining and interestingness measures, yielding a small feature subset capable of producing better classification results. In the experimental study, applying syntax-governed and dimensionality-governed constraints reduced the feature set by more than 50%, and this yielded high-quality results. The approach achieved a classification accuracy of about 98% on the Breast Cancer Surveillance Consortium (BCSC) data set.

Keywords: Data mining; association rules; constraint-based ARM; interestingness measures; feature selection; SVM

MSC 2010: 68U35
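
To make the idea in the abstract concrete, the following is a minimal sketch of constraint-governed association rule mining used as a feature filter. It is not the authors' implementation: the function names, the toy records, the thresholds, and the choice of support, confidence, and lift as interestingness measures are all assumptions made for illustration. The syntax constraint is modelled by forcing every rule consequent to be a class item, and the dimensionality constraint by capping the number of items in the antecedent.

```python
from itertools import combinations


def mine_constrained_rules(rows, class_attr, max_antecedent=1,
                           min_support=0.3, min_confidence=0.8):
    """Return rules as (antecedent, class_value, support, confidence, lift).

    Syntax constraint (assumed form): the consequent is a single class item.
    Dimensionality constraint (assumed form): at most `max_antecedent`
    attribute=value items appear in the antecedent.
    """
    n = len(rows)
    class_values = sorted({row[class_attr] for row in rows})
    items = sorted({(attr, val) for row in rows
                    for attr, val in row.items() if attr != class_attr})
    rules = []
    for size in range(1, max_antecedent + 1):
        for antecedent in combinations(items, size):
            # An antecedent may not use the same attribute twice.
            if len({attr for attr, _ in antecedent}) < size:
                continue
            covered = [r for r in rows
                       if all(r.get(attr) == val for attr, val in antecedent)]
            if not covered:
                continue
            for label in class_values:
                both = sum(1 for r in covered if r[class_attr] == label)
                prior = sum(1 for r in rows if r[class_attr] == label) / n
                support = both / n
                confidence = both / len(covered)
                lift = confidence / prior
                # Keep only rules that pass every interestingness threshold.
                if (support >= min_support and confidence >= min_confidence
                        and lift > 1.0):
                    rules.append((antecedent, label, support, confidence, lift))
    return rules


def select_features(rules):
    """Collect the attributes that occur in at least one retained rule."""
    return sorted({attr for antecedent, *_ in rules for attr, _ in antecedent})


# Hypothetical toy records; attribute names are illustrative, not BCSC fields.
rows = [
    {"density": "high", "history": "yes", "age": "old",   "cancer": "yes"},
    {"density": "high", "history": "no",  "age": "young", "cancer": "yes"},
    {"density": "high", "history": "yes", "age": "old",   "cancer": "yes"},
    {"density": "low",  "history": "no",  "age": "old",   "cancer": "no"},
    {"density": "low",  "history": "yes", "age": "young", "cancer": "no"},
    {"density": "low",  "history": "no",  "age": "young", "cancer": "no"},
]

rules = mine_constrained_rules(rows, class_attr="cancer")
selected = select_features(rules)
print(selected)
```

On this toy data only the `density` attribute survives, because it is the only one whose rules reach the assumed confidence threshold. In a full pipeline the reduced feature set would then be passed to a classifier such as the SVM named in the keywords.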

About the article

Corresponding author: M. Umme Salma, Department of Computer Science, Mangalore University, Mangalagangothri, Mangalore, 574199, India


Received: 2014-12-31

Published Online: 2016-02-04

Published in Print: 2017-01-01


Funding: Maulana Azad National Fellowship for Minority Students, (Grant/Award Number: ‘F1-17/2013-14/MANF-2013-14-MUS-KAR-24350’).


Citation Information: Journal of Intelligent Systems, Volume 26, Issue 1, Pages 139–152, ISSN (Online) 2191-026X, ISSN (Print) 0334-1860, DOI: https://doi.org/10.1515/jisys-2015-0059.

©2017 Walter de Gruyter GmbH, Berlin/Boston.