Journal of Integrative Bioinformatics

Volume 8, Issue 3


Improving imbalanced scientific text classification using sampling strategies and dictionaries

L. Borrajo / R. Romero / E. L. Iglesias / C. M. Redondo Marey
Published Online: 2016-10-18 | DOI: https://doi.org/10.1515/jib-2011-176


Many real-world applications suffer from imbalanced class distributions, where one class is represented by very few cases compared with the others. Among the affected systems are those for retrieving and classifying scientific documents.

Sampling strategies such as Oversampling and Subsampling are popular ways of tackling class imbalance. In this work, we study their effects on three types of classifiers (kNN, SVM and Naive Bayes) when they are applied to searches of the PubMed scientific database.
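
As an illustration of the two balancing strategies named above, the following is a minimal sketch of random Oversampling and random Subsampling on a labelled document list. It is not the procedure used in the paper, which the abstract does not specify; the function names, the toy `docs` data and the fixed random seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def group_by_label(samples):
    """Group (item, label) pairs by their label."""
    groups = defaultdict(list)
    for item, label in samples:
        groups[label].append((item, label))
    return groups


def oversample(samples, seed=0):
    """Randomly duplicate cases until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw extra copies (with replacement) to reach the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def subsample(samples, seed=0):
    """Randomly discard cases until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Draw without replacement so no document is repeated.
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical toy corpus: 3 relevant documents (label 1), 12 others (label 0).
docs = [(f"relevant-{i}", 1) for i in range(3)] + [(f"other-{i}", 0) for i in range(12)]
over = oversample(docs)
under = subsample(docs)
```

With this toy corpus, `over` holds 12 documents per class and `under` holds 3 per class. Oversampling keeps all the original data at the cost of repeated minority cases, while Subsampling discards majority cases to obtain a smaller balanced set.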

Another purpose of this paper is to study the use of dictionaries in the classification of biomedical texts. Experiments are conducted with three different dictionaries (BioCreative, NLPBA, and an ad-hoc subset of the UniProt database named Protein), using the classifiers and sampling strategies mentioned above.
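
The abstract does not say how the dictionaries enter the classification pipeline. One plausible reading, assumed here purely for illustration, is that a dictionary restricts the vocabulary used to represent each document. The sketch below builds a term-count vector over a tiny hypothetical term set; `PROTEIN_TERMS`, `dictionary_features` and the sample text are invented for the example and are not taken from BioCreative, NLPBA or UniProt.

```python
import re
from collections import Counter

# Hypothetical stand-in for a biomedical dictionary (not real dictionary content).
PROTEIN_TERMS = {"kinase", "p53", "insulin", "hemoglobin"}


def dictionary_features(text, dictionary):
    """Count only the tokens that appear in the dictionary.

    Returns one count per dictionary term, in sorted term order, so that
    every document maps to a vector of the same length.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(token for token in tokens if token in dictionary)
    return [counts[term] for term in sorted(dictionary)]


sample_text = "Insulin signalling activates a kinase cascade; the kinase then targets p53."
vector = dictionary_features(sample_text, PROTEIN_TERMS)
# Term order: hemoglobin, insulin, kinase, p53 -> [0, 1, 2, 1]
```

Vectors of this kind could then be passed to any of the classifiers mentioned above; weighting schemes such as TF-IDF are omitted to keep the sketch short.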

The best results were obtained with the NLPBA and Protein dictionaries and the SVM classifier, using the Subsampling balancing technique. These results were compared with those obtained by other authors on the TREC Genomics 2005 public corpus.

About the article

Published in Print: 2011-12-01

Citation Information: Journal of Integrative Bioinformatics, Volume 8, Issue 3, Pages 90–104, ISSN (Online) 1613-4516, DOI: https://doi.org/10.1515/jib-2011-176.

© 2011 The Author(s). Published by Journal of Integrative Bioinformatics. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (BY-NC-ND 4.0).
