Exploiting multi-core parallelism on optimal decision trees
Ayush Patandin (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Emir Demirovic – Mentor (TU Delft - Algorithmics)
Johan Pouwelse – Graduation committee member (TU Delft - Data-Intensive Systems)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
This paper presents a study that discusses how multi-threading can be used to improve the runtime performance of constructing optimal classification trees. Decision trees are popular for solving classification or regression problems in machine learning. Heuristic methods are used to build decision tree algorithms that produce models of high accuracy within a short amount of time. An important limitation is that these heuristics locally optimize the decisions of the tree model. Consequently, in recent years, optimal classification tree algorithms have been introduced to strive for global optimality when learning decision trees. Unfortunately, the runtimes for constructing optimal decision trees are quite larger in comparison with the runtimes obtained from heuristic solutions. The study provides a mitigation for this by parallelizing the work of a recently invented optimal decision tree algorithm on multiple cores. There exist different parallel techniques to divide and schedule the work among processors. Our strategy follows the parallel approach that computes optimal decision trees using threads as processing elements in a shared memory space. In the end, we provide the experimental study to show that impressive runtime results of the optimal decision tree algorithm are successfully obtained with the help of the parallelization strategy.