Devising Multivariate Splits for Optimal Decision Trees

Bachelor Thesis (2021)
Author(s)

A.M. Mălan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

E. Demirović – Mentor (TU Delft - Algorithmics)

J.A. Pouwelse – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2021
Language
English
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Downloads counter
195
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Decision trees are often desirable for classification/regression tasks thanks to their human-friendly models. Unfortunately, the construction of decision trees is a hard problem which usually implies having to rely on imperfect heuristic methods. Advancements in algorithmics and hardware processing power have rendered globally optimal trees accessible, but even state-of-the-art techniques remain limited in certain aspects. One such limitation is the use of simple, univariate predicates within the decision tree. This paper explores the practicality of further expanding the already exponentially large search space navigated by existing optimal tree algorithms in order to add support for more complex, multivariate predicates. While such an approach comes with profound negative implications on the algorithm's time complexity, it has the potential to create decision trees whose boundaries are closer to ground truth. The results of this work support such a hypothesis, but they also make it clear that further optimisation techniques are critical for multivariate optimal decision tree algorithms, especially as the variety of considered multivariate splits increases.

Files

License info not available