Devising Multivariate Splits for Optimal Decision Trees

More Info
expand_more

Abstract

Decision trees are often desirable for classification/regression tasks thanks to their human-friendly models. Unfortunately, the construction of decision trees is a hard problem which usually implies having to rely on imperfect heuristic methods. Advancements in algorithmics and hardware processing power have rendered globally optimal trees accessible, but even state-of-the-art techniques remain limited in certain aspects. One such limitation is the use of simple, univariate predicates within the decision tree. This paper explores the practicality of further expanding the already exponentially large search space navigated by existing optimal tree algorithms in order to add support for more complex, multivariate predicates. While such an approach comes with profound negative implications on the algorithm's time complexity, it has the potential to create decision trees whose boundaries are closer to ground truth. The results of this work support such a hypothesis, but they also make it clear that further optimisation techniques are critical for multivariate optimal decision tree algorithms, especially as the variety of considered multivariate splits increases.