HingeBoost: ROC-Based Boost for Classification and Variable Selection

Zhu Wang
  • Connecticut Children’s Medical Center and University of Connecticut School of Medicine

In disease classification, traditional techniques are the receiver operating characteristic (ROC) curve and the area under the curve (AUC). With high-dimensional data, ROC techniques are needed to conduct both classification and variable selection. Current ROC methods either do not explicitly incorporate unequal misclassification costs or lack a theoretical grounding for optimizing the AUC. Empirical studies in the literature have demonstrated that optimizing the hinge loss can approximately maximize the AUC. In theory, minimizing the hinge rank loss is equivalent to maximizing the AUC in the asymptotic limit. In this article, we propose a novel nonparametric method, HingeBoost, to optimize a weighted hinge loss that incorporates misclassification costs. HingeBoost can be used to construct linear and nonlinear classifiers. Estimation and variable selection for the hinge loss are addressed by a new boosting algorithm. Furthermore, the proposed twin HingeBoost can select a sparser set of predictors. Some properties of HingeBoost are studied as well. To compare HingeBoost with existing classification methods, we present empirical results using data from simulations and from a prostate cancer study with mass spectrometry-based proteomics.
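To make the two quantities in the abstract concrete, the following is a minimal sketch of a cost-weighted hinge loss and the empirical AUC. It is not the HingeBoost algorithm itself: the function names, the `cost_fn` parameter, and the particular weighting scheme (weight `cost_fn` on positives, `1 - cost_fn` on negatives) are assumptions made for illustration, and the article's own definition may differ.

```python
import numpy as np

def weighted_hinge_loss(y, f, cost_fn=0.5):
    """Mean cost-weighted hinge loss (illustrative, not the article's exact form).

    y       : labels coded as -1 / +1
    f       : real-valued classifier scores
    cost_fn : hypothetical cost of a false negative, in (0, 1);
              the false-positive cost is assumed to be 1 - cost_fn.
    """
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    # Positives carry the false-negative cost, negatives the false-positive cost.
    w = np.where(y == 1, cost_fn, 1.0 - cost_fn)
    return float(np.mean(w * np.maximum(0.0, 1.0 - y * f)))

def empirical_auc(y, f):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y = np.asarray(y)
    f = np.asarray(f, dtype=float)
    pos = f[y == 1]
    neg = f[y == -1]
    diff = pos[:, None] - neg[None, :]  # all pairwise score differences
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

y = [1, 1, -1, -1]
f = [2.0, 0.5, -1.0, 0.0]
loss_equal = weighted_hinge_loss(y, f)                # equal costs
loss_fn_heavy = weighted_hinge_loss(y, f, cost_fn=0.8)  # false negatives cost more
auc = empirical_auc(y, f)
```

In this toy example every positive outscores every negative, so the empirical AUC is 1 even though the hinge loss is nonzero: the hinge loss also penalizes correctly ranked observations that fall inside the margin.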

IJB publishes biostatistical models and methods, statistical theory, as well as original applications of statistical methods, for important practical problems arising from various sciences. It covers the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework, including advances in biostatistical computing.
