Factors related to dataset that influence the shape of learning curves

Bachelor Thesis (2022)
Author(s)

N.T. Bui (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.J. Viering – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M Loog – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Georgios Smaragdakis – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 NAM THANG Bui
Publication Year
2022
Language
English
Graduation Date
23-05-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Although learning curves have many promising applications in machine learning, such as model selection, we still know very little about which factors influence their behaviour. This work studies how three inherent characteristics of a dataset affect the shape of the learning curve: noise, discretized input, and dimensionality. We trained two classifiers on a wide range of datasets to see how the learning curve behaves under different circumstances. Firstly, we found that the shape of the curve varied with the level of noise injected into the original datasets. Secondly, discretizing continuous features with equal-width interval binning did not make the classifiers learn exponentially but instead caused the learning curves to behave unpredictably; thus, it does not transform the continuous problem into the easier class of problems mentioned in [1]. Finally, the more dimensions we removed using PCA, the more strangely the learning curves behaved.
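For readers unfamiliar with equal-width interval binning, the following is a minimal sketch of the general technique, not the code used in the thesis. The function name `equal_width_bins`, the parameter `n_bins`, and the sample values are hypothetical and chosen only for illustration. Each continuous value is mapped to the index of the fixed-width interval it falls into.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to an integer bin index in [0, n_bins - 1].

    Illustrative sketch only: the range [min, max] of the feature is split
    into n_bins intervals of identical width.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread, so every value lands in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum sits on the right edge of the last bin, so clamp its index.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical feature column, discretized into two bins of width 0.5.
feature = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]
binned = equal_width_bins(feature, n_bins=2)
print(binned)
```

A learning-curve experiment of the kind described in the abstract would apply such a mapping to every continuous feature before training, and then compare the resulting curve with the one obtained on the original features.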

Files

Final_paper.pdf
(pdf | 3.13 Mb)
License info not available