The influence of the dimensionality on the parameters of the learning curve model

Abstract

Learning curves in machine learning are graphical representations of the relationship between a model's performance and the amount of training data it has been exposed to. They play a fundamental role in describing how knowledge and skills are acquired across a range of domains. Although a considerable body of research already studies learning curves and explains their importance and practical applications, we still know very little about the factors that influence the parameters of a learning curve. The aim of this research is to give a better understanding of these factors. Specifically, we are interested in how the dimensionality of a dataset influences the parameters of the learning curve. Since learning curves have several applications, such as estimating the time required to complete production runs, we would like to know whether dimensionality has any effect on their shape. To conduct the research, we applied principal component analysis (PCA) to several datasets three times, each time preserving a different amount of information, to reduce the number of dimensions, and analysed the changes in the parameters of the resulting learning curves. The results suggest that there may be a relation between dimensionality and the shape of the curve, but only for specific machine learning models. The number of experiments conducted is not sufficient to draw firm conclusions; we advise continuing with the proposed experimental setup while training machine learning models on a larger number of datasets.
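
The experimental setup described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the synthetic dataset, the nearest-centroid classifier, the three variance levels (0.50, 0.80, 0.95), and the power-law form err(n) ≈ a · n^(-b) used as the learning curve model. The helper names (`pca_reduce`, `learning_curve`, `fit_power_law`, `make_data`) are hypothetical.

```python
import numpy as np


def pca_reduce(X, variance_kept):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches `variance_kept` (0 < value <= 1)."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred data are proportional
    # to the variance captured by each principal component.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(s))
    return Xc @ Vt[:k].T


def learning_curve(X, y, sizes, n_test=200, repeats=20, seed=0):
    """Mean test error of a nearest-centroid classifier (an assumed,
    illustrative learner) for each training-set size in `sizes`."""
    rng = np.random.default_rng(seed)
    errors = np.zeros(len(sizes))
    for _ in range(repeats):
        perm = rng.permutation(len(X))
        test, pool = perm[:n_test], perm[n_test:]
        pool0, pool1 = pool[y[pool] == 0], pool[y[pool] == 1]
        for i, n in enumerate(sizes):
            half = n // 2  # balanced training set: n // 2 points per class
            c0 = X[pool0[:half]].mean(axis=0)
            c1 = X[pool1[:half]].mean(axis=0)
            d0 = ((X[test] - c0) ** 2).sum(axis=1)
            d1 = ((X[test] - c1) ** 2).sum(axis=1)
            pred = (d1 < d0).astype(int)
            errors[i] += np.mean(pred != y[test])
    return errors / repeats


def fit_power_law(sizes, errors):
    """Fit err(n) ~ a * n**(-b) by least squares in log-log space.
    Requires strictly positive errors. Returns (a, b)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)


def make_data(n=1200, d=20, seed=0):
    """Synthetic two-class data (hypothetical stand-in for real datasets)."""
    rng = np.random.default_rng(seed)
    y = np.arange(n) % 2
    X = rng.normal(size=(n, d)) * np.linspace(3.0, 0.3, d)
    X[y == 1, :3] += 1.0  # classes differ only in the first three features
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    sizes = [8, 16, 32, 64, 128, 256]
    for kept in (0.50, 0.80, 0.95):  # assumed variance levels
        Z = pca_reduce(X, kept)
        a, b = fit_power_law(sizes, learning_curve(Z, y, sizes))
        print(f"variance kept={kept:.2f} dims={Z.shape[1]} a={a:.3f} b={b:.3f}")
```

Comparing the fitted `a` and `b` across the three reduced versions of a dataset is one way to examine whether dimensionality changes the curve's parameters. A pure power law ignores any irreducible error, so a model with an additional asymptote term may fit real curves better.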