P-STreeD

A Multithreaded Approach for DP Optimal Decision Trees

More Info
expand_more

Abstract

Decision trees are valued for their ability to logically and transparently classify data. While heuristic methods to compute such trees are efficient, they often compromise on accuracy, prompting interest in Optimal Decision Trees (ODTs), which have the best misclassification score for a given tree size limit and training dataset. The dynamic programming (DP) approach for ODTs has shown improvements over alternatives such as mixed-integer or constraint programming. That being said, it requires further improvements to handle exponential runtime scaling with the depth of the tree and the number of features of the dataset. Leveraging modern hardware, such as multiple CPU cores, offers a promising solution to improve efficiency. This paper proposes a multithreading method for DP ODTs which we apply to STreeD specifically. We introduce a shared memory model and determine which components of the original program can be made local to the threads. We investigate whether it is more efficient to start multithreading at the root of the search tree than near its leaf nodes. We find the former to be superior, resulting in faster runtimes and less shared resource access. Empirical evaluations against the state of the art demonstrate better runtimes, particularly beneficial for large datasets. Finally, thread scaling analyses reveal substantial speed-ups, exceeding 2.5 times with four threads, highlighting our approach's effectiveness for computationally-intensive tasks.