The Impact of Imbalanced Training Data on Learning Curve Prior-Fitted Networks
B. Kostov (TU Delft - Electrical Engineering, Mathematics and Computer Science)
C. Yan – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
S. Mukherjee – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
T.J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Matthijs T.J. Spaan – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
Learning curves describe the relationship between the amount of training data and the error rate of a machine learning model. An important use case for learning curves is extrapolating them in order to predict how much data is needed to achieve a certain performance. One way to perform such extrapolations is with deep learning, using a Prior-Fitted Network (PFN). This paper explores how training the PFN on an imbalanced dataset, i.e., one containing learning curves from two or more machine learning models in skewed proportions, affects the performance of the network. Research into imbalanced learning has shown that machine learning models can favor the more prevalent classes or data, so it is worthwhile to explore whether similar trends occur for the neural networks we train for learning curve extrapolation. Our experiments focus on analyzing and comparing different imbalance scenarios. The results show that mixing learning curves from different learners can improve extrapolation performance in some cases, but the effect depends strongly on the learner characteristics and the training proportions.
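To make the notion of an imbalanced training set of learning curves concrete, the following is a minimal sketch (not the authors' code): it builds a synthetic mixture of curves from two hypothetical learners in a skewed 90/10 proportion. The power-law curve shapes, the parameter values, and the split are illustrative assumptions only.

```python
# Minimal sketch: an imbalanced training set of learning curves from two learners.
# All curve shapes, parameters, and the 90/10 split are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
train_sizes = np.array([8, 16, 32, 64, 128, 256, 512, 1024])

def sample_curve(a, b, noise=0.01):
    """Sample one synthetic learning curve: error decays as a power law a * n^(-b)."""
    errors = a * train_sizes.astype(float) ** (-b)
    return errors + rng.normal(0.0, noise, size=errors.shape)

def build_imbalanced_set(n_total=1000, fraction_learner_a=0.9):
    """Mix curves from two learners with a skewed proportion (e.g., 90% vs. 10%)."""
    n_a = int(n_total * fraction_learner_a)
    curves_a = [sample_curve(a=1.0, b=0.5) for _ in range(n_a)]           # "learner A"
    curves_b = [sample_curve(a=0.8, b=0.3) for _ in range(n_total - n_a)]  # "learner B"
    curves = np.stack(curves_a + curves_b)
    labels = np.array([0] * n_a + [1] * (n_total - n_a))  # which learner produced each curve
    return curves, labels

curves, labels = build_imbalanced_set()
print(curves.shape, np.bincount(labels))  # (1000, 8) [900 100]
```

A PFN trained on such a mixture sees far more curves from one learner than the other; the experiments in this work vary this proportion to study its effect on extrapolation performance.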