Optimal Decision Trees for non-linear metrics

A geometric convex hull approach

Bachelor Thesis (2024)
Author(s)

B. Bancuta (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Emir Demirovic – Mentor (TU Delft - Algorithmics)

Jacobus G.M. van der Linden – Mentor (TU Delft - Algorithmics)

David M.J. Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
26-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In the pursuit of interpretable yet performant Machine Learning models, Decision Trees have become a staple in many industries while producing near-optimal results. With computational power becoming more accessible, there has been increasing progress in constructing Optimal Decision Trees, which guarantee optimal solutions on the training data with respect to a given metric and size limit, require fewer nodes, and are becoming more viable to compute on real-world data. However, non-linear metrics, which are particularly effective for evaluating trees on imbalanced datasets, still pose a challenge in terms of runtime performance and scalability. Previous approaches generate the Pareto Front of the set of possible solutions, an expensive operation when computing the optimal tree. To address this gap, we introduce a novel algorithm for merging two Pareto Fronts using convex hulls, which offers better pruning and improves scalability. Experiments show a runtime improvement of almost 10% on larger datasets and higher-depth trees using the F1-score metric, with the potential to be applied to other convex metrics.
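The core idea summarized above, merging the Pareto Fronts of two child subtrees and pruning the result with a convex hull, can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's actual implementation: it assumes fronts are represented as sets of (false positive, false negative) count pairs and that the metric's optimum over a front is attained at a vertex of its lower convex hull; all names and the data representation are assumptions made for illustration.

# Illustrative sketch only (not the thesis's exact algorithm): merging two
# Pareto Fronts of (false-positive, false-negative) count pairs and pruning
# the result to its lower convex hull.

def merge_fronts(left, right):
    # Combine every pair of points from the two child fronts (a Minkowski sum):
    # picking one solution per child adds their error counts.
    return {(fp_l + fp_r, fn_l + fn_r)
            for (fp_l, fn_l) in left
            for (fp_r, fn_r) in right}

def lower_convex_hull(points):
    # Keep only points on the lower convex hull (Andrew's monotone chain).
    # If the metric's optimum over a front is attained at a hull vertex,
    # every point above the hull can be pruned without losing the optimum.
    pts = sorted(points)
    hull = []
    for x, y in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Pop the last point if it lies on or above the segment to (x, y).
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull

# Example: fronts of a left and a right subtree, merged and pruned at the parent.
left_front = [(0, 5), (2, 2), (4, 1)]
right_front = [(1, 3), (3, 0)]
merged = merge_fronts(left_front, right_front)
print(lower_convex_hull(merged))   # [(1, 8), (5, 2), (7, 1)]

In this toy example the combined candidates collapse to three hull points, showing how the convex hull discards dominated and interior points before the search continues at the parent node.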
