Improving Existing Optimal Decision Trees Algorithms by Redefining Their Binarisation Strategy

Bachelor Thesis (2021)
Author(s)

A.K. Wolska (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Emir Demirović – Mentor (TU Delft - Algorithmics)

JA Pouwelse – Graduation committee member (TU Delft - Data-Intensive Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Ola Wolska
More Info
Publication Year
2021
Language
English
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Optimal decision trees are not easily improved in terms of accuracy. However, better pre-processing of the underlying dataset can lead to more accurate decision trees. In this paper, multiple methods of binarising datasets are considered and the resulting decision trees are compared. Binarisation is divided into two stages, discretisation and encoding, and various algorithms are considered for each stage. Additionally, processing the data while the decision tree is being built, referred to as online processing, rather than beforehand, is considered. For smaller datasets, unsupervised discretisation was preferred, and extending one-hot encoding to also consider multiple categories at once as a target gave better accuracy for trees of lower depth. For bigger datasets, online processing proved beneficial.
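
The two binarisation stages named in the abstract can be illustrated with a minimal sketch. It assumes equal-width binning as the unsupervised discretisation step and plain one-hot encoding as the encoding step. Both choices, the function names, and the sample data are hypothetical and picked only for illustration; the abstract does not state which specific algorithms the thesis evaluates.

```python
def equal_width_bins(values, n_bins):
    """Stage 1 (discretisation): map each numeric value to an equal-width bin index.

    Equal-width binning is an assumed example of an unsupervised method,
    since it never looks at class labels.
    """
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when the feature is constant.
    width = (hi - lo) / n_bins or 1.0
    # The maximum value would land one past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(bin_indices, n_bins):
    """Stage 2 (encoding): turn each bin index into one binary feature per bin."""
    return [[1 if b == k else 0 for k in range(n_bins)] for b in bin_indices]


# Hypothetical numeric feature, not taken from the thesis.
ages = [18, 22, 35, 41, 60]
bins = equal_width_bins(ages, n_bins=3)
binary_features = one_hot(bins, n_bins=3)
```

Each row of `binary_features` is a fully binary representation of one value, which is the kind of input that decision tree algorithms working on binary features require.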

Files

Research_paper_2.pdf
(pdf | 0.56 Mb)
License info not available