Information Technology and Management Science

The Journal of Riga Technical University

1 Issue per year

Open Access
Online ISSN 2255-9094

The Use of BEXA Family Algorithms in Bioinformatics Data Classification

Madara Gasparoviča / Ludmila Aleksejeva / Valdis Gersons
Published Online: 2013-01-31 | DOI: https://doi.org/10.2478/v10313-012-0018-3

Abstract

This article studies the applicability of the BEXA family of classification algorithms (BEXA, FuzzyBexa and FuzzyBexa II) to data classification, especially the classification of bioinformatics data. Three types of data sets were used in the study: data sets frequently used in the literature, real-life data sets from the UCI data repository, and real bioinformatics data sets with a specific character, namely a large number of attributes and a small number of records. To compare the classification results, experiments were also carried out on all data sets with other classification algorithms. Based on the results, conclusions are drawn and recommendations are given on using each BEXA family algorithm to classify various real data, and the article answers the question of whether these algorithms are recommended for bioinformatics data.

This article investigates the potential of BEXA family algorithms for classifying real bioinformatics data. The BEXA family consists of three algorithms: BEXA, which works with crisp data, and FuzzyBexa and FuzzyBexa II, which work with fuzzy data. FuzzyBexa differs from FuzzyBexa II in that the latter does not consider each class individually but generates rules for all classes. BEXA family algorithms can be viewed as consisting of three parts: the covering procedure, the search for the best rule using an evaluation function, and the generation of specializations. Practical experiments were carried out on sixteen real data sets, which fall into three groups: data sets frequently used in the literature (Iris data set, Auto MPG and Ionosphere Data Set), real bioinformatics data sets from the UCI data repository (Nursery Data Set, Breast cancer Wisconsin, Parkinsons, SPECT heart, Molecular biology (Splice-junction gene sequences) and Yeast data set), and real bioinformatics data sets with a large number of attributes and a small number of records (GSE3726 (Breast & colon cancer), GSE2535 (CML treatment), GSE2685 (Gastric cancer), GSE1577 (Lymphoma & Leukaemia), GSE2191 (AML prognosis), GSE89 (Bladder cancer) and GSE1987 (Lung cancer)). To compare the classification results of the BEXA family algorithms, additional experiments were carried out on all data sets with other algorithms: BEXA was compared with JRip, PART and PRISMA on categorical data and with JRip and PART on numerical data, while FuzzyBexa and FuzzyBexa II were compared with FURIA, FLR and Slave C. Based on the classification results, conclusions were drawn about the influence of individual criteria on the classification result obtained. The results show that using this family of algorithms in bioinformatics is promising, and that further research is needed on ways to improve the weak points of the algorithms in order to increase their classification accuracy and the quality of the rules obtained.
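
The three components of BEXA family algorithms (a covering procedure, a search for the best rule using an evaluation function, and the generation of specializations) can be illustrated with a minimal Python sketch for crisp, categorical data. It is not the authors' implementation or the exact BEXA algorithm. The following choices are all illustrative assumptions: the Laplace estimate as the evaluation function, the beam width, modeling specialization as excluding a single attribute value from the set of values a rule allows, and first-match prediction. The function names are hypothetical. Each class is handled individually, and fuzzy memberships are not modeled.

```python
from collections import Counter


def covers(rule, example):
    """True if every attribute value of `example` is among the values the rule allows."""
    return all(example.get(attr) in allowed for attr, allowed in rule.items())


def laplace(rule, examples, labels, target, n_classes):
    """Assumed evaluation function: Laplace estimate (p + 1) / (covered + n_classes)."""
    covered = [y for x, y in zip(examples, labels) if covers(rule, x)]
    positives = sum(1 for y in covered if y == target)
    return (positives + 1) / (len(covered) + n_classes)


def specializations(rule):
    """Yield every rule obtained by excluding exactly one value of one attribute."""
    for attr, allowed in rule.items():
        if len(allowed) > 1:  # never exclude the last remaining value of an attribute
            for value in allowed:
                child = dict(rule)
                child[attr] = allowed - {value}
                yield child


def best_rule(examples, labels, target, domains, n_classes, beam_width=3, max_steps=10):
    """Beam search from the most general rule towards the best-scoring specialization."""
    start = {attr: frozenset(values) for attr, values in domains.items()}
    best, best_score = start, laplace(start, examples, labels, target, n_classes)
    beam = [start]
    for _ in range(max_steps):
        candidates = []
        for rule in beam:
            for child in specializations(rule):
                # keep only specializations that still cover at least one positive example
                if any(y == target and covers(child, x) for x, y in zip(examples, labels)):
                    score = laplace(child, examples, labels, target, n_classes)
                    candidates.append((score, child))
        if not candidates:
            break
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        beam = [child for _, child in candidates[:beam_width]]
        if candidates[0][0] > best_score:
            best_score, best = candidates[0]
    return best


def learn_class_rules(examples, labels, target, domains, n_classes, max_rules=20):
    """Covering procedure: learn a rule, remove the positives it covers, repeat."""
    remaining = list(zip(examples, labels))
    rules = []
    while len(rules) < max_rules and any(y == target for _, y in remaining):
        xs = [x for x, _ in remaining]
        ys = [y for _, y in remaining]
        rule = best_rule(xs, ys, target, domains, n_classes)
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining if not (y == target and covers(rule, x))]
    return rules


def fit(examples, labels):
    """Learn one rule list per class, treating each class individually."""
    domains = {attr: {x[attr] for x in examples} for attr in examples[0]}
    classes = sorted(set(labels))
    model = {c: learn_class_rules(examples, labels, c, domains, len(classes)) for c in classes}
    default = Counter(labels).most_common(1)[0][0]  # fallback when no rule fires
    return model, default


def predict(fitted, example):
    """Return the class of the first rule list that covers `example`, else the default."""
    model, default = fitted
    for cls, rules in model.items():
        if any(covers(rule, example) for rule in rules):
            return cls
    return default


if __name__ == "__main__":
    X = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "rain", "wind": "weak"},
         {"outlook": "sunny", "wind": "strong"}, {"outlook": "rain", "wind": "strong"}]
    y = ["yes", "yes", "no", "no"]
    fitted = fit(X, y)
    print(predict(fitted, {"outlook": "sunny", "wind": "weak"}))  # yes
```

On this toy data the search keeps both `outlook` values and excludes `strong` from `wind`, so the single rule learned for class `yes` reads "wind is weak". A real evaluation on the data sets listed above would additionally need discretization of numerical attributes and, for FuzzyBexa and FuzzyBexa II, membership functions in place of crisp value sets.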

Keywords: classification algorithms; bioinformatics data; BEXA; UCI data

About the article

Madara Gasparoviča

Madara Gasparoviča received her Mg. sc. ing. diploma in Information Technology from Riga Technical University in 2010. She is currently a doctoral student in the study programme “Information Technology” at Riga Technical University. Since 2008 she has worked as a Senior Laboratory Assistant at Riga Technical University, and since 2010 she has been a Researcher at the Department of Modelling and Simulation, Institute of Information Technology. Previous publications: Gasparoviča M., Novoselova N., Aleksejeva L., Using Fuzzy Logic to Solve Bioinformatics Tasks, Proceedings of Riga Technical University, Issue 5, Computer Science. Information Technology and Management Science, Vol. 44, 2010, pp. 99-105. Gasparoviča M., Aleksejeva L., Using Fuzzy Unordered Rule Induction Algorithm for Cancer Data Classification, Proceedings of the 17th International Conference on Soft Computing, MENDEL 2011, Brno, Czech Republic, June 15-17, 2011, pp. 141-147. Her interests include decision support systems, data mining tasks and modular rules. She is a member of IEEE. Address: 1 Kalku Street, LV-1658, Riga, Latvia.

Ludmila Aleksejeva

Ludmila Aleksejeva received her Dr. sc. ing. degree from Riga Technical University in 1998. She is an Associate Professor at the Department of Modelling and Simulation, Riga Technical University. Her research interests include decision-making techniques and decision support system design principles, as well as data mining methods and tasks, and especially the collaboration and cooperation of these techniques. Most important previous publications: Gasparoviča M., Novoselova N., Aleksejeva L., Using Fuzzy Logic to Solve Bioinformatics Tasks, Proceedings of Riga Technical University, Issue 5, Computer Science. Information Technology and Management Science, Vol. 44, 2010, pp. 99-105. Gasparoviča M., Aleksejeva L., Tuleiko I., Finding Membership Functions for Bioinformatics Data, Proceedings of the 17th International Conference on Soft Computing, MENDEL 2011, Brno, Czech Republic, June 15-17, 2011, pp. 133-140. Address: 1 Kalku Street, LV-1658, Riga, Latvia.

Valdis Gersons

Valdis Gersons received his Bachelor's degree in Information Technology from Riga Technical University in 2012. His Bachelor thesis dealt with inductive methods in bioinformatics data classification. Address: 1 Kalku Street, LV-1658, Riga, Latvia.

Published in Print: 2012-12-01


Citation Information: Information Technology and Management Science, Volume 15, Issue 1, Pages 120–126, ISSN (Online) 2255-9094, ISSN (Print) 2255-9086, DOI: https://doi.org/10.2478/v10313-012-0018-3.