Assessing Machine Learning Robustness to Sample Selection Bias

Evaluating the effectiveness of semi-supervised learning techniques


Abstract

This paper addresses sample selection bias in machine learning, where the common assumption that training and test sets are drawn from the same distribution is violated. Existing domain-adaptation solutions, including semi-supervised learning techniques, aim to correct this bias, but their ability to generalize to unseen test sets remains largely unexplored. To investigate this, two semi-supervised methods (self-training and co-training) are trained on biased training sets and evaluated on an unbiased test set drawn from the original distribution. The results show that the semi-supervised methods consistently matched or outperformed the baseline models, with self-training exhibiting the greater improvement. This study thus presents a promising approach to mitigating sample selection bias in machine learning.
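The experimental setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual protocol: the synthetic dataset, the feature-threshold bias rule, and the logistic-regression base learner are all assumptions made for the example; the paper's datasets, models, and co-training variant are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic data standing in for a real dataset (illustrative assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]  # unbiased held-out test set

# Simulate sample selection bias: only points whose first feature exceeds
# a threshold keep their labels, so the labelled training distribution
# differs from the test distribution (hypothetical bias rule).
selected = X_train[:, 0] > 0.0
X_biased, y_biased = X_train[selected], y_train[selected]

# Baseline: supervised model fit on the biased labelled subset only.
baseline = LogisticRegression(max_iter=1000).fit(X_biased, y_biased)

# Self-training: biased labelled points plus the excluded points treated
# as unlabelled (marked -1), pseudo-labelled iteratively by the learner.
y_semi = np.where(selected, y_train, -1)
self_trained = SelfTrainingClassifier(
    LogisticRegression(max_iter=1000)
).fit(X_train, y_semi)

print("baseline accuracy:     ", accuracy_score(y_test, baseline.predict(X_test)))
print("self-training accuracy:", accuracy_score(y_test, self_trained.predict(X_test)))
```

Both models are scored on the same unbiased test set, mirroring the paper's comparison of semi-supervised methods against supervised baselines under a biased training distribution.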