The influence of the dimensionality on the parameters of the learning curve model

Bachelor Thesis (2023)
Author(s)

A. Mereuta (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

J.H. Krijthe – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Zhengjun Yue – Graduation committee member (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Andrei Mereuta
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Learning curves in machine learning are graphical representations that depict the relationship between a model's performance and the amount of training data it has been exposed to. They play a fundamental role in understanding how knowledge and skills are acquired across a range of domains. Although a fair amount of research has already studied machine learning curves and explained their importance and practical applications, we still know very little about the factors that influence the parameters of the learning curve. The aim of this research is to give a better understanding of the factors that affect these parameters. Specifically, we are interested in how the dimensionality of a dataset influences the parameters of the learning curve. Since learning curves are useful and have several applications, such as estimating the time required to complete production runs, we would like to know whether dimensionality has any effect on their shape. To conduct the research, I applied principal component analysis (PCA) three times to each of several datasets, each time preserving a different amount of information, to reduce the number of dimensions, and analysed the changes in the parameters of the resulting learning curves. The results suggest that there may be a relation between dimensionality and the shape of the curve, but only for specific machine learning models. The number of experiments conducted is not sufficient to draw solid conclusions; it is advised to continue with the proposed experimental setup, but to train the machine learning models on a larger number of datasets.
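
To make "parameters of the learning curve" concrete, the sketch below fits a two-parameter power law, error(n) ≈ a · n^(-b), to a curve of error against training-set size. The abstract does not say which parametric model the thesis uses, so the power-law form, the function name `fit_power_law`, and the numbers are all assumptions for illustration; the data is synthetic and is not taken from the thesis. In a setup like the one described, a fit of this kind could be repeated once per PCA setting and the resulting parameters compared.

```python
import math

def fit_power_law(train_sizes, errors):
    """Fit error(n) ~ a * n**(-b) by least squares in log-log space.

    Hypothetical helper: the power-law form is an assumption, not
    necessarily the model used in the thesis.
    """
    xs = [math.log(n) for n in train_sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope and intercept of the regression line log(e) = intercept + slope * log(n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Convert back to the power-law parameters (a, b)
    return math.exp(intercept), -slope

# Synthetic, noise-free curve generated from known parameters (not thesis data)
sizes = [16, 32, 64, 128, 256, 512]
synthetic_errors = [2.0 * n ** -0.5 for n in sizes]

a, b = fit_power_law(sizes, synthetic_errors)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Here `a` scales the curve and `b` controls how quickly the error falls as more training data is added, so a change in either parameter between PCA settings would correspond to a change in the shape of the curve.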

Files

License info not available