P-STreeD

A Multithreaded Approach for DP Optimal Decision Trees

Bachelor Thesis (2024)
Author(s)

A. Sandu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Emir Demirovic – Mentor (TU Delft - Algorithmics)

Jacobus G.M. van der Linden – Mentor (TU Delft - Algorithmics)

David M.J. Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2024
Language
English
Graduation Date
26-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Decision trees are valued for their ability to logically and transparently classify data. While heuristic methods to compute such trees are efficient, they often compromise on accuracy, prompting interest in Optimal Decision Trees (ODTs), which have the best misclassification score for a given tree size limit and training dataset. The dynamic programming (DP) approach for ODTs has shown improvements over alternatives such as mixed-integer or constraint programming. That being said, it requires further improvements to handle exponential runtime scaling with the depth of the tree and the number of features of the dataset. Leveraging modern hardware, such as multiple CPU cores, offers a promising solution to improve efficiency. This paper proposes a multithreading method for DP ODTs which we apply to STreeD specifically. We introduce a shared memory model and determine which components of the original program can be made local to the threads. We investigate whether it is more efficient to start multithreading at the root of the search tree than near its leaf nodes. We find the former to be superior, resulting in faster runtimes and less shared resource access. Empirical evaluations against the state of the art demonstrate better runtimes, particularly beneficial for large datasets. Finally, thread scaling analyses reveal substantial speed-ups, exceeding 2.5 times with four threads, highlighting our approach's effectiveness for computationally-intensive tasks.

Files

FINAL_PSTREED_SUBMISSION.pdf
(pdf | 0.648 Mb)
License info not available