
Journal of Integrative Bioinformatics

Editor-in-Chief: Schreiber, Falk / Hofestädt, Ralf

Managing Editor: Sommer, Björn

Ed. by Baumbach, Jan / Chen, Ming / Orlov, Yuriy / Allmer, Jens

Editorial Board: Giorgetti, Alejandro / Harrison, Andrew / Kochetov, Aleksey / Krüger, Jens / Ma, Qi / Matsuno, Hiroshi / Mitra, Chanchal K. / Pauling, Josch K. / Rawlings, Chris / Fdez-Riverola, Florentino / Romano, Paolo / Röttger, Richard / Shoshi, Alban / Soares, Siomar de Castro / Taubert, Jan / Tauch, Andreas / Yousef, Malik / Weise, Stephan

4 Issues per year


CiteScore 2017: 0.77

SCImago Journal Rank (SJR) 2017: 0.336

Open Access
Online ISSN: 1613-4516
Volume 8, Issue 3


Evaluating the effect of unbalanced data in biomedical document classification

Rosalía Laza (corresponding author) / Reyes Pavón / Miguel Reboiro-Jato / Florentino Fdez-Riverola

Affiliation (all authors): ESEI, Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain

Published Online: 2016-10-18 | DOI: https://doi.org/10.1515/jib-2011-177

Summary

Document classification has become an active research field, driven in part by the growing volume of biomedical information available in digital form, which must be catalogued and organized. In this context, machine learning techniques are usually applied to text classification through a general inductive process that automatically builds a text classifier from a set of pre-classified documents. More generally, imbalanced data is a well-known problem in many practical applications of knowledge discovery, and its effects on the performance of standard classifiers are considerable. In this paper, we investigate the application of a Bayesian Network (BN) model to the triage of documents, which are represented by the association of different MeSH terms. Our results show that BNs are adequate for describing conditional independencies between MeSH terms and that the MeSH ontology is a valuable resource for representing Medline documents at different levels of abstraction. Moreover, we perform an extensive experimental evaluation to investigate whether classifying Medline documents with a BN classifier poses additional challenges under class-imbalanced prediction. The evaluation involves two methods: under-sampling and cost-sensitive learning. We conclude that the BN classifier is sensitive to both balancing strategies and that existing techniques can improve its overall performance.
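The two balancing strategies named in the summary can be illustrated generically. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it shows random under-sampling of the larger class to a 1:1 ratio, and one common form of cost-sensitive learning (a minimum-expected-cost decision threshold applied to a classifier's posterior probability). The function names, the `relevant`/`irrelevant` labels, the toy corpus and the cost values are hypothetical; the paper's actual sampling ratios, cost matrix and BN classifier are not described on this page, and its cost-sensitive method may differ (for example, by reweighting training examples instead).

```python
import random
from collections import Counter


def random_undersample(docs, labels, seed=0):
    """Randomly discard examples from the larger classes until every class
    has as many examples as the smallest one (1:1 balance)."""
    rng = random.Random(seed)
    by_class = {}
    for doc, label in zip(docs, labels):
        by_class.setdefault(label, []).append(doc)
    n_min = min(len(items) for items in by_class.values())
    kept_docs, kept_labels = [], []
    for label, items in by_class.items():
        # Sample without replacement so no document is duplicated.
        for doc in rng.sample(items, n_min):
            kept_docs.append(doc)
            kept_labels.append(label)
    return kept_docs, kept_labels


def cost_sensitive_decision(p_relevant, cost_fn, cost_fp):
    """Minimum-expected-cost rule for a binary triage decision.

    Labelling a document 'irrelevant' costs p * cost_fn on average, and
    labelling it 'relevant' costs (1 - p) * cost_fp, so we predict
    'relevant' when p > cost_fp / (cost_fp + cost_fn).
    """
    threshold = cost_fp / (cost_fp + cost_fn)
    return p_relevant > threshold


if __name__ == "__main__":
    # Hypothetical toy corpus: 8 irrelevant documents, 2 relevant ones.
    docs = [f"doc{i}" for i in range(10)]
    labels = ["irrelevant"] * 8 + ["relevant"] * 2
    _, balanced_labels = random_undersample(docs, labels)
    print(Counter(balanced_labels))
    # With a missed relevant document 5x as costly as a false alarm,
    # the threshold drops from 0.5 to 1/6.
    print(cost_sensitive_decision(0.2, cost_fn=5, cost_fp=1))  # True
```

In this sketch, `p_relevant` stands in for the posterior probability that a BN classifier (or any probabilistic classifier) assigns to the minority class. Under-sampling changes the training data, while the cost-sensitive rule leaves the data intact and shifts the decision boundary instead.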

About the article


Published in Print: 2011-12-01


Citation Information: Journal of Integrative Bioinformatics, Volume 8, Issue 3, Pages 105–117, ISSN (Online) 1613-4516, DOI: https://doi.org/10.1515/jib-2011-177.


© 2011 The Author(s). Published by Journal of Integrative Bioinformatics. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
