The effectiveness of subspace mapping techniques adapted to unlabeled samples from a global domain in mitigating sample selection bias

Bachelor Thesis (2023)

Author(s)

T.F.R. van Hoorn (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Joana Gonçalves – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Y.I. Tepeli – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J. Urbano Merino – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Sample Selection Bias, Domain Adaptation, Subspace Mapping

To reference this document use

https://resolver.tudelft.nl/uuid:d884af35-fc50-4a50-b0ad-3fe169ffdbe1

Publication Year

2023

Language

English

Graduation Date

28-06-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Sample selection bias occurs when the samples selected into a subset of a data set follow a different distribution than the samples in the original data set. When the training set is biased in this way, a classifier may be unable to predict samples from a testing data set optimally. Domain adaptation techniques try to adapt classifiers to a possible bias in the training or testing set. Subspace mapping techniques do this by finding common subspaces between the source and target domains, where the source domain contains all samples used for training and the target domain contains the samples that must be predicted. This project evaluates the effectiveness of two subspace mapping techniques in mitigating sample selection bias: subspace alignment (SA) and transfer component analysis (TCA). It assumes that no samples from the target domain are available, only unlabeled samples from an underlying global domain. The results show that SA is more effective on data sets with fewer features and where the source and target domains are further apart. TCA is more effective when more training samples are available, on data sets with fewer features, and where the distance between the source and target domains is not too large. The effectiveness of both methods also depends on the type and form of the data sets they are applied to.
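
To make the idea of subspace alignment concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes the standard formulation of subspace alignment: a PCA basis is computed for each domain, and the source basis is rotated toward the other basis with the matrix M = Ps^T Pg. Following the setting described in the abstract, unlabeled samples from the global domain stand in for target samples. The function names, the mean-centering step, the synthetic data, and the biased selection rule are all illustrative assumptions.

```python
import numpy as np


def pca_basis(X, d):
    """Return the top-d principal directions of X as an (n_features, d) matrix."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T


def subspace_alignment(source, global_unlabeled, d):
    """Align the source PCA subspace with the subspace of the unlabeled global samples.

    Returns the aligned source features and the projected global features,
    both with d columns. Centering each domain on its own mean is an assumption.
    """
    Ps = pca_basis(source, d)            # source subspace basis
    Pg = pca_basis(global_unlabeled, d)  # global-domain subspace basis
    M = Ps.T @ Pg                        # alignment matrix between the two bases
    source_aligned = (source - source.mean(axis=0)) @ Ps @ M
    global_projected = (global_unlabeled - global_unlabeled.mean(axis=0)) @ Pg
    return source_aligned, global_projected


# Hypothetical data: a global domain and a biased subset used as the source.
rng = np.random.default_rng(0)
global_samples = rng.normal(size=(500, 5))
source_samples = global_samples[global_samples[:, 0] > 0]  # illustrative selection bias

source_aligned, global_projected = subspace_alignment(source_samples, global_samples, d=2)
```

A classifier would then be trained on `source_aligned` together with the source labels and applied to samples projected with the global basis. When the source and global samples are identical, the alignment matrix reduces to the identity, so the two outputs coincide.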

Files

Final_Paper.pdf

(pdf | 0.612 Mb)

License info not available