Published by De Gruyter, April 7, 2009

Balanced Gradient Boosting from Imbalanced Data for Clinical Outcome Prediction

  • Reiji Teramoto

In clinical outcome prediction, such as disease diagnosis and prognosis, it is often assumed that the classes (e.g., disease and control) are equally distributed. In practice, however, we often encounter biological or clinical data whose class distribution is highly skewed. Because standard supervised learning algorithms aim to maximize overall prediction accuracy, a model trained on such imbalanced data tends to be strongly biased toward the majority class. The class distribution should therefore be taken into account appropriately when learning from imbalanced data. To address this practically important problem, we propose balanced gradient boosting (BalaBoost), which reformulates gradient boosting to use the equal class distribution instead of the empirical class distribution, so that it avoids overfitting to the majority class and remains sensitive to the minority class. We applied BalaBoost to three datasets with highly skewed class distributions: cancer tissue diagnosis based on miRNA expression data, premature death prediction for diabetes patients based on biochemical and clinical variables, and tumor grade prediction for renal cell carcinoma based on tumor marker expression. Experimental results showed that BalaBoost outperformed representative supervised learning algorithms, namely gradient boosting, Random Forests and Support Vector Machine. These results lead us to conclude that BalaBoost is promising for clinical outcome prediction from imbalanced data.
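The central idea, replacing the empirical class distribution with an equal one inside gradient boosting, can be illustrated with a small from-scratch sketch. This is not the author's implementation. It assumes (our assumption) that "equal class distribution" amounts to weighting each example's logistic loss by 0.5/n_k, where n_k is the size of its class, so that both classes carry the same total weight. The helper names (`class_weights`, `fit_stump`, `fit`, `predict_proba`), the decision-stump base learners and the Newton-step leaf values are illustrative choices, not details taken from the paper.

```python
# Hypothetical sketch: gradient boosting on a class-weighted logistic loss.
# The 0.5 / n_k weighting is an ASSUMED reading of "equal class distribution",
# not BalaBoost's published formulation.
import math


def class_weights(y, balanced=True):
    """Per-example loss weights for labels y in {0, 1} (both classes present).

    balanced=False: empirical weighting, 1/n for every example.
    balanced=True:  assumed balancing, 0.5/n_k for an example of class k,
                    so each class contributes total weight 0.5.
    """
    n = len(y)
    n_pos = sum(y)
    n_neg = n - n_pos
    if not balanced:
        return [1.0 / n] * n
    return [0.5 / n_pos if yi == 1 else 0.5 / n_neg for yi in y]


def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f))


def fit_stump(X, r, h, w):
    """Fit a one-split regression stump to pseudo-residuals r.

    The split minimizes the weighted squared error of r; each leaf value is a
    Newton step for the weighted logistic loss: sum(w*r) / sum(w*h).
    """
    best = None
    for j in range(len(X[0])):
        # Exclude the largest value so the right side is never empty.
        for t in sorted({x[j] for x in X})[:-1]:
            left = [i for i, x in enumerate(X) if x[j] <= t]
            right = [i for i, x in enumerate(X) if x[j] > t]
            sse = 0.0
            for idx in (left, right):
                sw = sum(w[i] for i in idx)
                mean = sum(w[i] * r[i] for i in idx) / sw
                sse += sum(w[i] * (r[i] - mean) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    _, j, t, left, right = best

    def newton(idx):
        return sum(w[i] * r[i] for i in idx) / sum(w[i] * h[i] for i in idx)

    return j, t, newton(left), newton(right)


def fit(X, y, n_rounds=50, learning_rate=0.1, balanced=True):
    """Gradient boosting on the (optionally class-balanced) logistic loss."""
    w = class_weights(y, balanced)
    w_pos = sum(wi for wi, yi in zip(w, y) if yi == 1)
    w_neg = sum(wi for wi, yi in zip(w, y) if yi == 0)
    # Initial score = weighted log-odds: log(n_pos/n_neg) when unbalanced,
    # log(0.5/0.5) = 0 when balanced, whatever the actual class skew.
    f0 = math.log(w_pos / w_neg)
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
        h = [pi * (1.0 - pi) for pi in p]      # its second derivative
        j, t, lv, rv = fit_stump(X, r, h, w)
        lv, rv = learning_rate * lv, learning_rate * rv
        stumps.append((j, t, lv, rv))
        F = [f + (lv if x[j] <= t else rv) for f, x in zip(F, X)]
    return f0, stumps


def predict_proba(model, x):
    """Probability of the positive (minority) class for one example x."""
    f0, stumps = model
    return sigmoid(f0 + sum(lv if x[j] <= t else rv for j, t, lv, rv in stumps))


if __name__ == "__main__":
    # Toy 1-D data: 18 negatives (x = 0..17) and 2 positives (x = 18, 19).
    X = [[float(i)] for i in range(20)]
    y = [0] * 18 + [1] * 2
    balanced = fit(X, y, balanced=True)
    plain = fit(X, y, balanced=False)
    for x in ([5.0], [18.0]):
        print(x, predict_proba(balanced, x), predict_proba(plain, x))
```

With `balanced=False` the initial score is the empirical log-odds log(n_pos/n_neg), so boosting starts from a prior that favors the majority class; with `balanced=True` it starts from log(0.5/0.5) = 0, and minority examples receive a larger per-example weight in the split criterion and leaf updates. On the toy data, the balanced model assigns a higher probability to the minority examples than the unweighted model after the same number of rounds.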

Published Online: 2009-4-7

©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston

https://www.degruyter.com/document/doi/10.2202/1544-6115.1422/html