The effectiveness of subspace mapping techniques adapted to unlabeled samples from a global domain in mitigating sample selection bias

Abstract

Sample selection bias occurs when the samples selected into a subset of a data set follow a different distribution than the samples in the original data set. When the training set is biased in this way, the resulting classifier may predict samples from the testing set poorly. Domain adaptation techniques try to adapt classifiers to a possible bias in the training or testing set. Subspace mapping techniques do this by searching for a common subspace between the source and target domains, where the source domain contains all samples used for training and the target domain contains the samples whose labels must be predicted. This paper evaluates the effectiveness of two subspace mapping techniques in mitigating sample selection bias: subspace alignment (SA) and transfer component analysis (TCA). It assumes that no samples from the target domain are available, only unlabeled samples from an underlying global domain. The results show that subspace alignment is more effective on data sets with fewer features and where the source and target domains are farther apart. Transfer component analysis is more effective when more training samples are available, on data sets with fewer features, and when the distance between the source and target domains is not too large. The effectiveness of both methods also depends on the type and form of the data sets they are applied to.
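To make the notion of sample selection bias concrete, the following minimal sketch simulates it on a single feature. Everything in it is an illustrative assumption and is not taken from the experiments in this work: the global domain is a standard normal distribution, the selection mechanism is a logistic function of the feature value, and the names `draw_biased_subset` and `logistic_selection` are hypothetical.

```python
import math
import random
import statistics


def draw_biased_subset(population, selection_prob, rng):
    """Keep each sample x with probability selection_prob(x)."""
    return [x for x in population if rng.random() < selection_prob(x)]


def logistic_selection(x):
    """Assumed selection mechanism: larger values are selected more often."""
    return 1.0 / (1.0 + math.exp(-2.0 * x))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical global domain: one feature drawn from a standard normal.
global_samples = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

# Biased source domain: a subset whose inclusion depends on the feature value.
source_samples = draw_biased_subset(global_samples, logistic_selection, rng)

global_mean = statistics.fmean(global_samples)
source_mean = statistics.fmean(source_samples)

print(f"global samples: {len(global_samples)}, mean {global_mean:.3f}")
print(f"source samples: {len(source_samples)}, mean {source_mean:.3f}")
```

Under these assumptions the mean of the selected subset shifts away from the mean of the global samples, which is the kind of mismatch between source and global distributions that domain adaptation methods such as SA and TCA try to compensate for.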
