Learning Curves

How do Data Imbalances affect the Learning Curves using Nearest Mean Model?

Bachelor Thesis (2024)
Author(s)

J.J. Feng (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

O.T. Turan – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Kevin Feng
Publication Year
2024
Language
English
Graduation Date
01-02-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research investigates the impact of data imbalance on learning curves using the nearest mean model. Learning curves show how a model's performance changes as the training set size increases. Imbalanced datasets are common in real-life scenarios and pose challenges for standard classifiers, degrading their performance. The research question is therefore: "How do data imbalances affect the learning curves using the nearest mean model?" To answer it, an experiment is conducted using data sampled from a multivariate Gaussian distribution at varying levels of imbalance. The imbalance ratios explored are 0.1, 0.2, 0.3, 0.4, and 0.5, where the ratio is the fraction of the dataset that belongs to the minority class. The findings indicate that as the data becomes more imbalanced, the learning curves reach their accuracy plateau later, that is, at larger training set sizes. An analysis of the curve parameters, with the curves modeled by a logistic function, suggests that imbalance affects both the maximum achievable accuracy and the rightward shift of the curves. However, the effect on the maximum achievable accuracy is not significant, and the shape of the curves remains similar. Additionally, false negatives have a significant impact on the learning curves.
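
The kind of experiment described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the dimensionality, the class separation `sep`, the identity covariance, the training sizes, and the number of repeats are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def sample_imbalanced(n, minority_frac, rng, dim=2, sep=2.0):
    """Draw n points from two Gaussian classes; label 1 is the minority class.

    Assumptions (not from the thesis): identity covariance, and class
    means that differ by `sep` along the first axis only.
    """
    y = (rng.random(n) < minority_frac).astype(int)
    means = np.zeros((2, dim))
    means[1, 0] = sep
    X = rng.standard_normal((n, dim)) + means[y]
    return X, y

def nearest_mean_fit(X, y):
    """Return the two class means, or None if only one class was sampled."""
    if len(np.unique(y)) < 2:
        return None
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def nearest_mean_predict(means, X):
    """Assign each point to the class whose mean is closest (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

def learning_curve(minority_frac, sizes, repeats=50, n_test=2000, seed=0):
    """Mean test accuracy per training size, averaged over repeated draws."""
    rng = np.random.default_rng(seed)
    X_test, y_test = sample_imbalanced(n_test, minority_frac, rng)
    curve = []
    for n in sizes:
        accs = []
        for _ in range(repeats):
            X, y = sample_imbalanced(n, minority_frac, rng)
            means = nearest_mean_fit(X, y)
            if means is None:
                # Only one class observed: predict that class everywhere.
                pred = np.full(n_test, y[0])
            else:
                pred = nearest_mean_predict(means, X_test)
            accs.append((pred == y_test).mean())
        curve.append(float(np.mean(accs)))
    return curve

if __name__ == "__main__":
    sizes = [2, 4, 8, 16, 32, 64, 128]
    for ratio in [0.1, 0.2, 0.3, 0.4, 0.5]:
        print(ratio, [round(a, 3) for a in learning_curve(ratio, sizes)])
```

Running the script prints one accuracy curve per imbalance ratio, which can then be compared across ratios or fitted with a logistic function. How small training sets that contain a single class are handled is a design choice made here for the sketch; the thesis may treat that case differently.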

Files

5293200_Research_Report.pdf
(pdf | 1.57 Mb)
License info not available