Finding Robust Optimal Regression Trees using exhaustive search

And why this is not trivial

Bachelor Thesis (2025)
Author(s)

J.C. Vleeschdraager (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jacobus G.M. van der Linden – Mentor (TU Delft - Algorithmics)

Emir Demirovic – Graduation committee member (TU Delft - Algorithmics)

Jasmijn A. Baaijens – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Daniël Vos – Graduation committee member (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
21-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Decision trees are accurate and interpretable models that predict classes or values from the features of a data point, but they are vulnerable to small perturbations of those features that can greatly change their predictions. Previous work has produced methods that find robust optimal classification trees or robust regression trees, but not robust optimal regression trees. Computing optimal decision trees is already NP-hard, and continuous prediction values make robust regression trees more complex than robust classification trees. We analyze in further detail why this is a difficult problem to solve. We introduce ForTree, a method that uses exhaustive search with pruning to find robust optimal regression trees from the set of trees whose prediction values are restricted to a limited set. It evaluates every possible tree configuration under this restriction and keeps the tree with the highest adversarial accuracy. We also introduce the partial accuracy as a lower bound for pruning, which we use to reduce ForTree's runtime. Our experiments show that ForTree achieves up to 1.23 times the adversarial accuracy of state-of-the-art robust regression tree methods, but that its runtime is highly sensitive to the number of unique feature values in a dataset. Finally, we propose several approaches for future work to find robust regression trees with higher adversarial accuracy or lower runtime.
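To make the search-and-prune idea concrete, the following is a minimal Python sketch built on assumptions of our own, not the thesis's ForTree implementation: depth-1 trees over a single feature, an adversary that perturbs that feature by at most eps, and a tolerance tau deciding when a regression prediction counts as correct. All function names and parameters here are illustrative, and the early-exit bound is a generic branch-and-bound bound in the spirit of the partial-accuracy idea rather than the thesis's exact definition.

```python
import itertools

def adversarial_correct(threshold, left_val, right_val, x, y, eps, tau):
    # A point counts as adversarially correct when every feature value the
    # adversary can reach (within +/- eps of x) lands in a leaf whose
    # prediction is within tau of the true target y.
    reachable = []
    if x - eps <= threshold:   # some perturbation falls in the left leaf
        reachable.append(left_val)
    if x + eps > threshold:    # some perturbation falls in the right leaf
        reachable.append(right_val)
    return all(abs(v - y) <= tau for v in reachable)

def search(X, y, eps=0.1, tau=0.5, leaf_values=(0.0, 0.5, 1.0)):
    # Exhaustively enumerate every depth-1 tree: a split threshold combined
    # with one of the allowed prediction values per leaf; keep the best tree.
    n = len(X)
    best_tree, best_correct = None, -1
    for t, lv, rv in itertools.product(sorted(set(X)), leaf_values, leaf_values):
        correct = 0
        for i, (x, yy) in enumerate(zip(X, y)):
            # Bound-based pruning: even if every remaining point were
            # correct, this tree could not beat the incumbent, so stop.
            if correct + (n - i) <= best_correct:
                break
            correct += adversarial_correct(t, lv, rv, x, yy, eps, tau)
        if correct > best_correct:
            best_tree, best_correct = (t, lv, rv), correct
    return best_tree, best_correct / n

print(search([0.1, 0.4, 0.6, 0.9], [0.0, 0.0, 1.0, 1.0]))
```

Even this toy illustrates why a limited set of prediction values matters: restricting leaf predictions to {0.0, 0.5, 1.0} lets the search trade prediction precision for robustness, and once a strong incumbent is found the bound lets it skip most remaining configurations without full evaluation.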
