Benchmarking the effectiveness of domain adaptation techniques in mitigating sample selection bias when leveraging the global domain
A.C. TOCIU (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Joana Gonçalves – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Y.I. Tepeli – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Sample selection bias is a widespread cause of distribution shift between training and test sets, which can significantly degrade the generalizability and performance of machine learning models. To mitigate distribution shift, numerous domain adaptation techniques have been developed that adapt the training set to the test set. However, adapting to a specific test set under sample selection bias may prevent the model from generalizing properly across the entire problem domain, and requires re-adaptation whenever the test data changes. We therefore propose a novel adaptation strategy, called global domain adaptation, in which we instead adapt to a larger (global) domain representative of the distribution from which both the training and test sets originate. We introduce a comprehensive benchmark, consisting of synthetic datasets and selection biases as well as complex bioinformatics datasets with intrinsic biases, to investigate the behavior and limitations of domain adaptation techniques when adapting to the global domain. Our benchmark reveals distinct performance patterns across categories of domain adaptation techniques: minimax estimators are very fragile in practice, while deep domain adaptation is less stable despite its increased architectural complexity. Lastly, we find that global domain adaptation is a viable approach for certain techniques, such as importance weighting, whereas semi-supervised techniques tend to perform best for conventional test set adaptation.
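To illustrate the importance-weighting idea mentioned in the abstract, the sketch below reweights a biased training sample toward a broader (global) distribution. It is a minimal toy example under assumed conditions: a one-dimensional setting with known Gaussian densities for both domains (the distributions, parameters, and closed-form density-ratio weights are illustrative assumptions, not the benchmark or method described in this work).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: the global domain is N(0, 1); sample selection bias
# shifts the training distribution to N(1, 1.5). The training sigma is
# chosen larger than the global sigma so the density ratio stays bounded.
global_mu, global_sigma = 0.0, 1.0
train_mu, train_sigma = 1.0, 1.5

x_train = rng.normal(train_mu, train_sigma, size=100_000)

def gauss_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights: ratio of the global density to the biased
# training density, evaluated at each training point.
w = gauss_pdf(x_train, global_mu, global_sigma) / gauss_pdf(x_train, train_mu, train_sigma)

# The unweighted mean reflects the biased training domain (near 1),
# while the weighted mean approximates the global mean (near 0).
unweighted_mean = x_train.mean()
weighted_mean = np.average(x_train, weights=w)
```

In a realistic setting the two densities are unknown, so the ratio is typically estimated from data, for example with a probabilistic classifier that discriminates training samples from global-domain samples; the analytic ratio here simply keeps the sketch self-contained.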