Building Random Forests with Optimal Decision Trees


Abstract

Decision trees are usually built with a greedy heuristic: a series of locally optimal split decisions is assumed to yield a good final tree. Optimal decision trees drop this heuristic and instead search exhaustively, aided by many optimization techniques, for the best possible tree. Separately, training an ensemble of decision trees with some randomness has been shown to outperform a single decision tree; this technique is known as random forests. This research combines optimal decision trees and random forests to construct Forests of Optimal Trees. Random forest generation techniques fall into two main categories: tree-generation randomization and input-data randomization. The first is not directly applicable to optimal decision trees, since randomizing the tree construction would invalidate any guarantee of optimality, but the latter is compatible. In terms of accuracy, Forests of Optimal Trees outperform heuristic random forests on some datasets and are inferior on others; the difference is data-dependent. Their main disadvantage is that generating them can be a few orders of magnitude slower than generating heuristic forests. Nonetheless, in scenarios where a small gain in classification accuracy matters and the extra time and computation are worth it, Forests of Optimal Trees can be useful classifiers.
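The input-data randomization the abstract refers to can be sketched as bootstrap aggregation (bagging): each tree is trained on a resample of the data drawn with replacement, and the forest predicts by majority vote. The sketch below is illustrative only and not the paper's implementation; it uses a depth-1 tree (a stump) whose single split is found by exhaustive search, standing in for a full optimal-tree solver, and a synthetic 1-D dataset invented for the example.

```python
import random
from collections import Counter

random.seed(0)

# Synthetic 1-D dataset (illustration only): class 0 clusters near 0.3,
# class 1 near 0.7.
X = [random.gauss(0.3, 0.1) for _ in range(50)] + \
    [random.gauss(0.7, 0.1) for _ in range(50)]
y = [0] * 50 + [1] * 50

def fit_stump(xs, ys):
    """Exhaustively evaluate every candidate threshold: a depth-1 tree
    that is provably optimal for 1-D data, standing in for a full
    optimal-decision-tree solver."""
    best = None
    for t in sorted(set(xs)):
        left = [c for v, c in zip(xs, ys) if v <= t]
        right = [c for v, c in zip(xs, ys) if v > t]
        if not left or not right:
            continue
        lmaj = Counter(left).most_common(1)[0][0]
        rmaj = Counter(right).most_common(1)[0][0]
        correct = sum(c == lmaj for c in left) + sum(c == rmaj for c in right)
        if best is None or correct > best[0]:
            best = (correct, t, lmaj, rmaj)
    _, t, lmaj, rmaj = best
    return lambda v: lmaj if v <= t else rmaj

def fit_bagged_forest(xs, ys, n_trees=25):
    """Input-data randomization: each tree sees a bootstrap resample
    (sampling with replacement), so optimality per tree is preserved."""
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict(forest, v):
    # Majority vote over all trees in the forest.
    return Counter(tree(v) for tree in forest).most_common(1)[0][0]

forest = fit_bagged_forest(X, y)
acc = sum(predict(forest, v) == c for v, c in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

Because every tree is trained on a resample of the original data rather than with randomized construction steps, each individual tree retains its optimality guarantee on its own sample, which is why this category of randomization remains compatible with optimal trees.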