Nima Salehi-Moghaddami, Hadi Yazdi, Hanieh Poostchi
June 28, 2011
One of the most commonly used predictive models for classification is the decision tree (DT). The task of a DT is to map observations to target values. In a DT, each branch represents a rule: the rule’s consequent is the branch’s leaf, and its antecedent is a conjunction of conditions on the features. Most algorithms in this field use Information Entropy or the Gini Index as the splitting criterion when building a tree. In this paper, a new splitting criterion for building DTs is proposed. A splitting criterion specifies the tree’s best splitting variable at each node, as well as the threshold on that variable for further splitting. Borrowing from the classical Forward Selection method and its enhanced versions, the variable having the largest absolute correlation with the target value is chosen as the best splitting variable at each node. Then, the idea of maximizing the margin between classes in a support vector machine (SVM) is used to find the best classification threshold on the selected variable. This procedure is executed recursively at each node until the leaf nodes are reached. The resulting decision tree is shorter than those produced by previous methods, which effectively removes uninformative variables and reduces the time needed to classify future data. The proposed method also produces unclassified regions, which can be interpreted as either an advantage or a disadvantage depending on the application. Simulation results demonstrate an improvement in the generated decision tree compared to previous methods.
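The two core steps of the proposed criterion can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Pearson correlation for the variable-selection step, and it approximates the SVM margin idea in one dimension by placing the threshold at the midpoint of the widest gap between adjacent samples of different classes. The function names (`best_split_variable`, `max_margin_threshold`) are hypothetical.

```python
import numpy as np

def best_split_variable(X, y):
    """Forward-Selection-style choice: return the index of the feature
    with the largest absolute Pearson correlation with the target."""
    corrs = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return int(np.argmax(corrs))

def max_margin_threshold(x, y):
    """1-D analogue of the SVM maximal-margin split: place the threshold
    at the midpoint of the widest gap between consecutive samples that
    belong to different classes."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_gap, best_t = -1.0, None
    for i in range(len(xs) - 1):
        if ys[i] != ys[i + 1]:
            gap = xs[i + 1] - xs[i]
            if gap > best_gap:
                best_gap, best_t = gap, (xs[i] + xs[i + 1]) / 2.0
    return best_t

# Toy example: feature 0 is informative, feature 1 is noise.
X = np.array([[0.1, 5.0], [0.3, 1.0], [0.9, 4.0], [1.1, 2.0]])
y = np.array([0, 0, 1, 1])
j = best_split_variable(X, y)        # feature 0 is selected
t = max_margin_threshold(X[:, j], y) # threshold near the midpoint 0.6
```

In a full tree-building routine, these two steps would be applied at the root and then recursively on each resulting partition until the leaves are pure, as the abstract describes.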