Assessing Machine Learning Robustness to Sample Selection Bias

Evaluating the effectiveness of semi-supervised learning techniques

Bachelor Thesis (2023)

Author(s)

V.A.A. Biharie (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Joana Goncalves – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Y.I. Tepeli – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:3df9ffa3-53c4-4036-8f23-491e32471627

Publication Year

2023

Language

English

Graduation Date

28-06-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper tackles the problem of sample selection bias in machine learning, where the common assumption that training and test sets are drawn from the same distribution is often violated. Existing domain adaptation solutions, such as semi-supervised learning techniques, aim to correct this bias, but how well they generalize to unseen test sets remains unexplored. To address this gap, two semi-supervised methods, self-training and co-training, are trained on biased training sets and evaluated on an unbiased test set drawn from the same underlying distribution. The results show that the semi-supervised methods consistently outperformed or matched the baseline models, with self-training showing the greater improvement of the two. This study thus presents a promising approach to mitigating sample selection bias in machine learning.
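The abstract does not state the datasets, base classifiers, or bias mechanism used in the thesis, so the following is only a hypothetical sketch of the general setup, not the author's code. It assumes a toy two-class Gaussian dataset, a labelled sample whose selection probability increases with one feature (one simple form of sample selection bias), a hand-written logistic regression as the base learner, and basic confidence-threshold versions of self-training and two-view co-training. Treating the unselected pool points as the unlabelled data, the 0.9 confidence threshold, and using single features as co-training views are all assumptions. Only NumPy is required.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical toy problem: class-conditional Gaussians in 2-D."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 1.0, -1.0)
    return X, y

def biased_mask(X, n_select, strength=3.0):
    """Pick n_select points with probability increasing in feature 0.
    Selection depends on x, so the labelled sample's P(x) differs from
    the population's: a simple form of sample selection bias."""
    w = 1.0 / (1.0 + np.exp(-strength * X[:, 0]))
    idx = rng.choice(len(X), size=n_select, replace=False, p=w / w.sum())
    mask = np.zeros(len(X), dtype=bool)
    mask[idx] = True
    return mask

def fit_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression with intercept, fitted by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def self_train(X_l, y_l, X_u, threshold=0.9, max_iter=10):
    """Self-training: add confidently pseudo-labelled points, then refit."""
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        p = predict_proba(fit_logreg(X_l, y_l), X_u)
        pick = np.maximum(p, 1 - p) >= threshold
        if not pick.any():
            break
        X_l = np.vstack([X_l, X_u[pick]])
        y_l = np.concatenate([y_l, (p[pick] >= 0.5).astype(int)])
        X_u = X_u[~pick]
    return fit_logreg(X_l, y_l)

def co_train(X_l, y_l, X_u, views=([0], [1]), threshold=0.9, max_iter=10):
    """Co-training: one classifier per feature view; each pseudo-labels the
    unlabelled points it is confident about, growing a shared labelled pool."""
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        pick = np.zeros(len(X_u), dtype=bool)
        labels = np.zeros(len(X_u), dtype=int)
        for v in views:
            p = predict_proba(fit_logreg(X_l[:, v], y_l), X_u[:, v])
            new = (np.maximum(p, 1 - p) >= threshold) & ~pick
            labels[new] = (p[new] >= 0.5).astype(int)
            pick |= new
        if not pick.any():
            break
        X_l = np.vstack([X_l, X_u[pick]])
        y_l = np.concatenate([y_l, labels[pick]])
        X_u = X_u[~pick]
    return [(fit_logreg(X_l[:, v], y_l), v) for v in views]

def co_predict_proba(model, X):
    """Average the view-specific classifiers' probabilities."""
    return np.mean([predict_proba(w, X[:, v]) for w, v in model], axis=0)

# Biased labelled sample from a pool (assumption: the rest of the pool is the
# unlabelled data); unbiased test set drawn from the same distribution.
X_pool, y_pool = make_data(2000)
X_test, y_test = make_data(1000)
lab = biased_mask(X_pool, 100)

def accuracy(proba):
    return float(np.mean((proba >= 0.5) == y_test))

baseline = fit_logreg(X_pool[lab], y_pool[lab])  # supervised, biased labels only
st_model = self_train(X_pool[lab], y_pool[lab], X_pool[~lab])
ct_model = co_train(X_pool[lab], y_pool[lab], X_pool[~lab])

print("baseline     :", accuracy(predict_proba(baseline, X_test)))
print("self-training:", accuracy(predict_proba(st_model, X_test)))
print("co-training  :", accuracy(co_predict_proba(ct_model, X_test)))
```

Comparing the three accuracies on the unbiased test set mirrors the evaluation described in the abstract. On this toy problem the numbers depend on the random seed and should not be read as reproducing the thesis results.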

Files

CSE3000_Final_Paper_Bigger_Fig... (PDF, 2.8 MB)

License info not available