How does imbalanced data affect performance of regression CNNs?

Bachelor thesis (2021)

Authors

R.K. Thakoersingh Electrical Engineering, Mathematics and Computer Science

Contributors

T.J. Viering Computer Science & Engineering-Teaching Team - (mentor)

Y. Kato Pattern Recognition and Bioinformatics - (mentor)

M. Loog Pattern Recognition and Bioinformatics - (mentor)

D.M.J. Tax Pattern Recognition and Bioinformatics - (mentor)

K.A. Hildebrandt Computer Graphics and Visualisation - (coach)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:502ff06e-00df-4ffc-8ee5-c22b1cedf92d

Published Date

02-07-2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research provides an overview of how training Convolutional Neural Networks (CNNs) on imbalanced datasets affects their performance. Datasets can be imbalanced for several reasons. For example, there are naturally fewer samples of rare diseases. Since the network sees fewer of those instances during training, it may perform worse on them, even though identifying those cases correctly may be more crucial. Furthermore, it is non-trivial to check whether data generated in real time is balanced. The networks in this research are trained on three types of synthetic datasets: balanced datasets, datasets with missing targets, and datasets with normally distributed targets. The task of the network is to estimate the standard deviation of the pixel intensity of the input. The results show that it is best to train the network on balanced datasets; however, training on datasets with normally distributed targets does not cause a large loss in performance. Furthermore, in this setting the CNNs were still able to learn the task with decent performance when targets were missing from the training set.
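To make the three training distributions in the abstract concrete, here is a minimal NumPy sketch of how such synthetic data could be generated. It is an illustration under stated assumptions, not the thesis's actual pipeline. The function names `sample_targets` and `make_images`, the 32x32 image size, the target range [0, 1], the missing interval (0.4, 0.6), and the use of Gaussian noise images are all hypothetical choices made here.

```python
import numpy as np


def sample_targets(kind, n, rng, low=0.0, high=1.0, gap=(0.4, 0.6)):
    """Draw n target standard deviations under one of three assumed schemes."""
    if kind == "balanced":
        # Every target value in [low, high] is equally likely.
        return rng.uniform(low, high, size=n)
    if kind == "missing":
        # Uniform targets with the interval `gap` left out (rejection sampling).
        # Assumes `gap` does not cover the whole range [low, high].
        out = np.empty(0)
        while out.size < n:
            draw = rng.uniform(low, high, size=n)
            out = np.concatenate([out, draw[(draw < gap[0]) | (draw > gap[1])]])
        return out[:n]
    if kind == "normal":
        # Targets concentrated around the middle of the range;
        # rare draws outside [low, high] are clipped to the boundary.
        mid = (low + high) / 2.0
        draw = rng.normal(mid, (high - low) / 6.0, size=n)
        return np.clip(draw, low, high)
    raise ValueError(f"unknown kind: {kind!r}")


def make_images(targets, rng, size=32):
    """Build noise images whose pixel-intensity std equals each target exactly."""
    noise = rng.standard_normal((targets.size, size, size))
    # Standardise each image to zero mean and unit std, then rescale.
    noise -= noise.mean(axis=(1, 2), keepdims=True)
    noise /= noise.std(axis=(1, 2), keepdims=True)
    return targets[:, None, None] * noise


rng = np.random.default_rng(0)
datasets = {}
for kind in ("balanced", "missing", "normal"):
    y = sample_targets(kind, 1000, rng)
    datasets[kind] = (make_images(y, rng), y)
```

Each entry of `datasets` holds an `(images, targets)` pair that a regression CNN could be trained on. Evaluating all trained networks on a held-out balanced set would then expose how each training distribution affects performance across the full target range.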

Files

Research_Project_2020.pdf

(pdf | 0.433 Mb)