Building Random Forests with Optimal Decision Trees


Abstract

Decision trees are usually built with a greedy heuristic: a series of locally optimal split decisions is assumed to yield a good final tree. Optimal decision trees drop this heuristic and instead search exhaustively, aided by many optimization techniques, for the best possible tree. Separately, training an ensemble of decision trees with some randomness has been shown to outperform a single decision tree; this technique is known as random forests. This research combines optimal decision trees and random forests to construct Forests of Optimal Trees. Random forest generation techniques fall into two main categories: tree-generation randomization and input-data randomization. The first is not directly applicable to optimal decision trees, since randomizing the tree construction would invalidate any guarantee of optimality, but the latter is compatible. In terms of accuracy, Forests of Optimal Trees outperform heuristic random forests on some datasets and are inferior on others; the difference is data-dependent. Their main disadvantage is that generating them can be a few orders of magnitude slower than generating heuristic forests. Nonetheless, in scenarios where a small gain in classification accuracy matters and the extra time and computation are worth it, Forests of Optimal Trees can be useful classifiers.
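The input-data randomization the abstract refers to can be sketched as bootstrap aggregation (bagging): each tree is trained on a resample of the data drawn with replacement, and the forest predicts by majority vote. The sketch below is illustrative only and not the paper's implementation; it uses a depth-1 tree (a stump) whose single split is found by exhaustive search, standing in for a full optimal-tree solver, and a synthetic 1-D dataset invented for the example.

```python
import random
from collections import Counter

random.seed(0)

# Synthetic 1-D dataset (illustration only): class 0 clusters near 0.3,
# class 1 near 0.7.
X = [random.gauss(0.3, 0.1) for _ in range(50)] + \
    [random.gauss(0.7, 0.1) for _ in range(50)]
y = [0] * 50 + [1] * 50

def fit_stump(xs, ys):
    """Exhaustively evaluate every candidate threshold: a depth-1 tree
    that is provably optimal for 1-D data, standing in for a full
    optimal-decision-tree solver."""
    best = None
    for t in sorted(set(xs)):
        left = [c for v, c in zip(xs, ys) if v <= t]
        right = [c for v, c in zip(xs, ys) if v > t]
        if not left or not right:
            continue
        lmaj = Counter(left).most_common(1)[0][0]
        rmaj = Counter(right).most_common(1)[0][0]
        correct = sum(c == lmaj for c in left) + sum(c == rmaj for c in right)
        if best is None or correct > best[0]:
            best = (correct, t, lmaj, rmaj)
    _, t, lmaj, rmaj = best
    return lambda v: lmaj if v <= t else rmaj

def fit_bagged_forest(xs, ys, n_trees=25):
    """Input-data randomization: each tree sees a bootstrap resample
    (sampling with replacement), so optimality per tree is preserved."""
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict(forest, v):
    # Majority vote over all trees in the forest.
    return Counter(tree(v) for tree in forest).most_common(1)[0][0]

forest = fit_bagged_forest(X, y)
acc = sum(predict(forest, v) == c for v, c in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

Because every tree is trained on a resample of the original data rather than with randomized construction steps, each individual tree retains its optimality guarantee on its own sample, which is why this category of randomization remains compatible with optimal trees.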