Publication Date: April 2009
ISSN: 1544-6115
DOI: 10.2202/1544-6115.1422



Balanced Gradient Boosting from Imbalanced Data for Clinical Outcome Prediction

Reiji Teramoto

Bio-IT Center, NEC Corporation

Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 8, Issue 1, Pages 1–19, ISSN (Online) 1544-6115, DOI: 10.2202/1544-6115.1422, April 2009

Publication History:
Published Online: 2009-04-07

In clinical outcome prediction, such as disease diagnosis and prognosis, it is often assumed that the classes, e.g., disease and control, are equally distributed. In practice, however, biological and clinical data often have a highly skewed class distribution. Because standard supervised learning algorithms aim to maximize overall prediction accuracy, a model trained on such imbalanced data tends to be strongly biased toward the majority class. The class distribution should therefore be incorporated appropriately when learning from imbalanced data. To address this practically important problem, we propose balanced gradient boosting (BalaBoost), which reformulates gradient boosting to use an equal class distribution instead of the empirical class distribution, so that it avoids overfitting to the majority class and remains sensitive to the minority class. We applied BalaBoost to three tasks with highly skewed class distributions: cancer tissue diagnosis based on miRNA expression data, premature death prediction for diabetes patients based on biochemical and clinical variables, and tumor grade prediction for renal cell carcinoma based on tumor marker expression. In these experiments, BalaBoost outperformed representative supervised learning algorithms, namely gradient boosting, Random Forests and Support Vector Machines. We conclude that BalaBoost is promising for clinical outcome prediction from imbalanced data.

Keywords: clinical outcome; diagnosis; cancer; diabetes; renal cell carcinoma; ensemble learning; boosting; cost-sensitive learning; imbalanced data
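
The abstract's central idea, replacing the empirical class distribution with an equal one inside gradient boosting, can be illustrated with a small sketch. This is not the BalaBoost algorithm itself, whose formulation is given only in the full paper. It is a generic class-weighted gradient boosting on log-loss with decision stumps, where each class receives the same total weight. All function names, the toy data, and the hyperparameters (`rounds`, `lr`) are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def balanced_weights(y):
    """Per-sample weights so that each class carries total weight 0.5."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return [0.5 / n_pos if yi == 1 else 0.5 / n_neg for yi in y]

def fit_stump(x, target, w):
    """Weighted least-squares stump on one feature (needs >= 2 distinct x values)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        sums = {True: [0.0, 0.0], False: [0.0, 0.0]}  # side -> [sum w*target, sum w]
        for xi, ri, wi in zip(x, target, w):
            side = xi <= t
            sums[side][0] += wi * ri
            sums[side][1] += wi
        left = sums[True][0] / sums[True][1]     # weighted mean target, x <= t
        right = sums[False][0] / sums[False][1]  # weighted mean target, x > t
        err = sum(wi * (ri - (left if xi <= t else right)) ** 2
                  for xi, ri, wi in zip(x, target, w))
        if best is None or err < best[0]:
            best = (err, t, left, right)
    return best[1:]

def fit_boost(x, y, w, rounds=100, lr=0.5):
    """Gradient boosting on weighted log-loss; w encodes the assumed class distribution."""
    p0 = sum(wi for wi, yi in zip(w, y) if yi == 1) / sum(w)
    f0 = math.log(p0 / (1.0 - p0))  # weighted prior log-odds (about 0 when balanced)
    scores = [f0] * len(x)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of log-loss with respect to the score.
        residual = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        t, left, right = fit_stump(x, residual, w)
        stumps.append((t, left, right))
        scores = [s + lr * (left if xi <= t else right)
                  for xi, s in zip(x, scores)]
    return f0, stumps, lr

def predict_proba(model, xi):
    f0, stumps, lr = model
    return sigmoid(f0 + sum(lr * (left if xi <= t else right)
                            for t, left, right in stumps))

# Toy imbalanced data (hypothetical): 18 controls, 3 cases.
# At x = 1 the classes overlap: 6 controls and 3 cases.
x = [0] * 12 + [1] * 9
y = [0] * 12 + [0] * 6 + [1] * 3

uniform = [1.0 / len(y)] * len(y)                 # empirical class distribution
model_emp = fit_boost(x, y, uniform)
model_bal = fit_boost(x, y, balanced_weights(y))  # equal class distribution

print(predict_proba(model_emp, 1), predict_proba(model_bal, 1))
```

On this toy data the empirically weighted model assigns a case probability below 0.5 in the overlapping region (it tracks the raw proportion 3/9), while the balanced model assigns a probability above 0.5 (it tracks the reweighted proportion 0.75). This is the majority-class bias the abstract describes, and one simple way of counteracting it.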
