Factors related to dataset that influence the shape of learning curves

Bachelor Thesis (2022)

Author(s)

N.T. Bui (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.J. Viering – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Marco Loog – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

G. Smaragdakis – Graduation committee member (TU Delft - Cyber Security)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Machine learning; Noise; Learning curve; Dimensionality; Discretization

To reference this document use:

https://resolver.tudelft.nl/uuid:7921d6fa-b7a3-4fb9-bfdd-cd768da72059

Publication Year

2022

Language

English

Graduation Date

23-05-2022

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Although learning curves have many promising applications in machine learning, such as model selection, we still know very little about the factors that influence their behaviour. This work studies how three inherent characteristics of datasets affect the shape of learning curves: noise, discretized input, and dimensionality. We trained two classifiers on a wide variety of datasets to investigate how the learning curve behaves under different circumstances. First, we found that the shape of the curves varied with the level of noise injected into the original datasets. Second, discretizing continuous features with the equal-width interval binning technique did not make the classifiers learn exponentially fast; instead, it caused the learning curves to behave unpredictably. Thus, this discretization does not transform the continuous problem into the easier class of problems mentioned in [1]. Finally, the more dimensions we removed using PCA, the stranger the behaviour of the learning curves became.
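As a rough illustration of the manipulations named in the abstract, the Python sketch below shows equal-width interval binning of a single continuous feature, one possible way of injecting noise, and a learning curve computed as test error against training-set size. The function names, the choice of label flipping as the noise model, and the 1-nearest-neighbour classifier are all assumptions made for illustration; they are not taken from the thesis, which does not specify them here. PCA is not covered.

```python
import random


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1], using
    n_bins intervals of equal width spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: every value falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would map to index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def flip_labels(labels, noise_rate, classes, seed=0):
    """Hypothetical noise model: with probability noise_rate, replace a label
    by a different class chosen uniformly at random."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def nearest_neighbour_predict(train_x, train_y, x):
    """1-NN on scalar features; ties go to the earliest training example."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def learning_curve(train_x, train_y, test_x, test_y, sizes):
    """Return (n, test error rate) for a 1-NN classifier trained on the
    first n training examples, for each n in sizes."""
    curve = []
    for n in sizes:
        errors = sum(
            nearest_neighbour_predict(train_x[:n], train_y[:n], x) != y
            for x, y in zip(test_x, test_y)
        )
        curve.append((n, errors / len(test_x)))
    return curve
```

In such a setup, one would compare the curve obtained on the original features with the curves obtained after binning the features or flipping a fraction of the training labels.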

Files

Final_paper.pdf (PDF, 3.13 MB)

License info not available