A Comparative Analysis of Learning Curve Models and their Applicability in Different Scenarios
Finding dataset patterns that lead to a certain parametric curve model
A.G. Kalandadze (TU Delft - Electrical Engineering, Mathematics and Computer Science)
T.J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
J.H. Krijthe – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Zhengjun Yue – Graduation committee member (TU Delft - Multimedia Computing)
Abstract
Learning curves predict a model’s performance as a function of training set size. They can help estimate how much data is required to reach a target error rate, and thus reduce the cost of data collection. However, our understanding of the various shapes of learning curves and their applicability is still insufficient: even a parametric model that is highly accurate on average can perform poorly in certain scenarios. The objective of this research is therefore to identify patterns in datasets that influence the selection of a particular parametric curve model. To accomplish this, I conduct experiments assessing the performance of different parametric learning curves, including the power law, the exponential, and the Morgan-Mercer-Flodin (MMF) curve, as a function of the number of features, classes, and outliers, and of the machine learning model used. I find that the MMF and exponential curves outperform the power law for all machine learning models. All curves work best with the Logistic Regression, Bernoulli Naive Bayes, and Multinomial Naive Bayes models. The exponential and MMF curves provide better results than the power law for a small number of classes. MMF also outperforms the power law for the majority of feature counts and outlier percentages.