Learning Curves

How do Data Imbalances affect the Learning Curves using the Nearest Mean Model?


Abstract

This research investigates the impact of data imbalance on the learning curve of the nearest mean model. Learning curves represent a model's performance as the training set size increases. Imbalanced datasets are common in real-life scenarios and pose challenges to standard classifier models, degrading their performance. The research question is therefore: "How do data imbalances affect the learning curves using the nearest mean model?". To answer it, an experiment is conducted in which data is sampled from multivariate Gaussian distributions at varying levels of imbalance. The imbalance ratios explored are [0.1, 0.2, 0.3, 0.4, 0.5], where each ratio is the fraction of the dataset belonging to the minority class. The findings indicate that as the data becomes more imbalanced, the learning curves reach their accuracy plateau more slowly. Analysis of the parameters of a logistic function fitted to the curves suggests that imbalance affects both the maximum achievable accuracy and the rightward shift of the curves; however, the effect on the maximum achievable accuracy is not statistically significant, and the overall shape of the curves remains similar. Additionally, false negatives have a significant impact on the learning curves.
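The experimental setup described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual code: it assumes two unit-variance Gaussian classes with shifted means (the means, dimensionality, and training sizes are illustrative assumptions), samples training sets at each imbalance ratio, and evaluates a nearest mean classifier on a balanced test set to trace out a learning curve.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_predict(X_train, y_train, X_test):
    # Compute the per-class mean and assign each test point to the class
    # whose mean is closest in Euclidean distance (the nearest mean model).
    means = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

def sample_imbalanced(n, ratio, dim=2):
    # The minority class (label 1) makes up `ratio` of the dataset.
    # Both classes are unit-variance Gaussians; the mean shift of 2.0
    # is an illustrative assumption, not a value from the study.
    n_min = max(1, int(round(n * ratio)))
    n_maj = n - n_min
    X = np.vstack([rng.normal(0.0, 1.0, (n_maj, dim)),
                   rng.normal(2.0, 1.0, (n_min, dim))])
    y = np.concatenate([np.zeros(n_maj, int), np.ones(n_min, int)])
    return X, y

# Balanced test set so accuracy is comparable across imbalance ratios.
X_test, y_test = sample_imbalanced(2000, 0.5)

for ratio in [0.1, 0.2, 0.3, 0.4, 0.5]:
    accs = []
    for n in [20, 50, 100, 200, 500]:  # increasing training sizes
        X_tr, y_tr = sample_imbalanced(n, ratio)
        preds = nearest_mean_predict(X_tr, y_tr, X_test)
        accs.append((preds == y_test).mean())
    print(f"ratio={ratio}: {[round(a, 2) for a in accs]}")
```

Plotting accuracy against training size for each ratio yields the learning curves; a logistic function can then be fitted to each curve to compare plateau accuracy and horizontal shift across imbalance levels, as in the analysis above.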