Assessing Machine Learning Robustness to Sample Selection Bias

Evaluating the effectiveness of semi-supervised learning techniques

Bachelor Thesis (2023)
Author(s)

V.A.A. Biharie (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Joana Goncalves – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Y.I. Tepeli – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Viraj Biharie
Publication Year
2023
Language
English
Graduation Date
28-06-2023
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper tackles the problem of sample selection bias in machine learning, where the assumption that training and test sets are drawn from the same distribution is often violated. Existing solutions from domain adaptation, such as semi-supervised learning techniques, aim to correct this bias, but their ability to generalize to unseen test sets remains unexplored. To address this gap, two semi-supervised methods (self-training and co-training) are trained on biased training sets and evaluated on an unbiased test set drawn from the same underlying distribution. The results show that the semi-supervised methods consistently matched or outperformed the baseline models, with self-training exhibiting the greater improvement. The study thus presents a promising approach to mitigating sample selection bias in machine learning.
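The evaluation setup described in the abstract can be illustrated with a small, self-contained sketch. It is not the thesis's implementation: the synthetic two-class data, the nearest-centroid base model, the selection rule in `is_selected`, and the `margin_threshold` parameter are all assumptions chosen to keep the example short. It covers self-training only; co-training is not shown.

```python
import random

def make_data(n_per_class, rng):
    """Two 2-D Gaussian classes (unit variance) centred at (-2, 0) and (2, 0)."""
    X, y = [], []
    for label, cx in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)))
            y.append(label)
    return X, y

def is_selected(x, label):
    """Hypothetical bias: class 0 is labeled only if x[1] > 0.5, class 1 only if x[1] < -0.5."""
    return x[1] > 0.5 if label == 0 else x[1] < -0.5

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fit_centroids(X, y):
    """Nearest-centroid model: one mean vector per class."""
    model = {}
    for c in set(y):
        pts = [x for x, lab in zip(X, y) if lab == c]
        model[c] = tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))
    return model

def predict_with_margin(model, x):
    """Return (label, margin); margin is the gap between the two smallest squared distances."""
    dists = sorted((sq_dist(x, mu), c) for c, mu in model.items())
    return dists[0][1], dists[1][0] - dists[0][0]

def self_train(X_lab, y_lab, X_unlab, margin_threshold=4.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit the model."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_centroids(X_lab, y_lab)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            label, margin = predict_with_margin(model, x)
            if margin >= margin_threshold:
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough: stop early
            break
        model = fit_centroids(X_lab, y_lab)
    return model

def accuracy(model, X, y):
    return sum(predict_with_margin(model, x)[0] == lab for x, lab in zip(X, y)) / len(y)

rng = random.Random(0)
X_pool, y_pool = make_data(200, rng)
X_test, y_test = make_data(200, rng)  # unbiased test set from the same distribution

# Biased labeled set; every other pool point is treated as unlabeled.
X_lab = [x for x, lab in zip(X_pool, y_pool) if is_selected(x, lab)]
y_lab = [lab for x, lab in zip(X_pool, y_pool) if is_selected(x, lab)]
X_unlab = [x for x, lab in zip(X_pool, y_pool) if not is_selected(x, lab)]

base = fit_centroids(X_lab, y_lab)       # baseline: biased labels only
st = self_train(X_lab, y_lab, X_unlab)   # self-training: also uses unlabeled points
print("baseline accuracy:", accuracy(base, X_test, y_test))
print("self-training accuracy:", accuracy(st, X_test, y_test))
```

The sketch assumes the biased labeled set contains both classes. Whether self-training helps depends on the data, the bias mechanism, and the confidence threshold, so the printed accuracies should not be read as a reproduction of the thesis results.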
