Impact of Dissimilarity Loss on Out of Distribution Generalization

An introduction of a novel approach for mitigating shortcut learning

Bachelor Thesis (2026)
Author(s)

A.C. Cazacu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.W. Böhmer – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

D.M.J. Tax – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
27-01-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep learning has made neural networks ubiquitous across a wide range of applications. During training, models extract features that are predictive of the labels and achieve high accuracy on in-distribution data. However, issues arise when these features, while predictive during training, do not capture the underlying causal structure of the data. This reliance on spurious correlations is known as "shortcut learning" and leads to a failure to generalize to unseen data. In this paper, we introduce a novel regularizer, dissimilarity loss, which penalizes excessive similarity between the representations of samples that share the same spurious predictors. This encourages the model to move beyond shortcut features and learn more robust, task-relevant representations. We show that this additional regularization significantly improves out-of-distribution accuracy compared to a baseline, and we discuss its drawbacks. Furthermore, we apply it without spurious feature labels, a regime in which dissimilarity loss remains effective under distribution shift, and we outline directions in which future work could improve on it.
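
The abstract does not give the exact form of dissimilarity loss, so the following is only an illustrative sketch of the general idea: add a penalty to the task loss that grows when samples sharing the same spurious predictor have similar representations. The use of cosine similarity, the averaging over pairs, the function names (`cosine_similarity`, `dissimilarity_penalty`, `regularized_loss`), and the `weight` parameter are all assumptions made for illustration and may differ from the formulation in the thesis.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; defined as 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dissimilarity_penalty(representations, spurious_groups):
    """Mean pairwise cosine similarity over pairs sharing a spurious group.

    Assumption: the choice of cosine similarity and of a plain mean is
    hypothetical; the thesis may define the penalty differently.
    """
    sims = [
        cosine_similarity(representations[i], representations[j])
        for i, j in combinations(range(len(representations)), 2)
        if spurious_groups[i] == spurious_groups[j]
    ]
    # No pair shares a spurious group: nothing to penalize.
    return sum(sims) / len(sims) if sims else 0.0


def regularized_loss(task_loss, representations, spurious_groups, weight=0.1):
    """Task loss plus the weighted penalty (weight is a hypothetical hyperparameter)."""
    return task_loss + weight * dissimilarity_penalty(representations, spurious_groups)
```

In this sketch, two samples with the same spurious group and identical representations contribute the maximum similarity of 1, while orthogonal representations contribute 0, so minimizing the combined loss pushes same-group representations apart. In the setting without spurious feature labels, `spurious_groups` would have to come from some estimate of the groups; how that is done is not specified in the abstract.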

Files

RP_Paper_Final.pdf
(pdf | 3.28 Mb)
License info not available