Understanding the Value of Depth: RGB-D Fusion and Pseudo-Depth for Robust Out-of-Distribution Generalisation

An Experimental Journey into How Depth Shapes Generalisation in Vision Models

Master Thesis (2026)
Author(s)

Alexandra Neagu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

C.E. Brandt – Graduation committee member (TU Delft - Software Engineering)

A.S. Gielisse – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2026
Language
English
Graduation Date
12-02-2026
Awarding Institution
Delft University of Technology
Programme
Computer Science | Artificial Intelligence
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Convolutional neural networks (CNNs) trained on RGB images (red, green, blue channels) often exhibit sharp performance degradation under distribution shifts, as they tend to rely on superficial appearance cues such as background or texture. While depth information is known to provide complementary geometric signals that can improve robustness, most existing approaches assume access to ground-truth depth or rely on complex RGB-D architectures, limiting their applicability in practice.

In this work, we investigate whether estimated depth, obtained from a monocular RGB image, can serve as a simple and effective auxiliary signal to improve out-of-distribution (OOD) generalisation in standard CNN classifiers. Using both controlled toy experiments and real-world evaluations on the NICO++ benchmark, we compare RGB-only models against RGB-D variants that incorporate a single predicted depth channel via minimal fusion. Our results show that pseudo-depth consistently reduces OOD performance gaps across multiple CNN backbones without degrading in-distribution accuracy. We further demonstrate that these gains persist under moderate corruption of the depth signal and disappear when geometric structure is entirely removed, indicating that the improvements stem from meaningful geometric information rather than the mere presence of an additional input channel. Finally, we analyse these effects through class-resolved confusion matrices and qualitative input-level examples, showing that depth specifically attenuates structured semantic confusions under domain shift.
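The abstract describes the fusion only as "minimal" and the controls only as "moderate corruption" and "removal of geometric structure". The sketch below illustrates one common reading of these ideas and is not the thesis's implementation. It assumes early (input-level) fusion, where a per-image min-max normalised depth map is stacked as a fourth channel next to RGB. It also assumes a mean-of-RGB initialisation for the extra filter slice of a pretrained first convolution, additive Gaussian noise as the corruption, and pixel shuffling as a control that keeps the depth value distribution but destroys its spatial layout. All function names (`normalise_depth`, `fuse_rgbd`, `expand_first_conv`, `corrupt_depth`, `remove_structure`) are hypothetical.

```python
import numpy as np


def normalise_depth(depth: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max scale a predicted depth map (H, W) to [0, 1] (assumed per-image scaling)."""
    d_min, d_max = float(depth.min()), float(depth.max())
    return (depth - d_min) / (d_max - d_min + eps)


def fuse_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Early fusion: stack RGB (H, W, 3) with normalised depth (H, W) into (H, W, 4)."""
    if rgb.shape[:2] != depth.shape:
        raise ValueError("RGB image and depth map must share the same spatial size")
    return np.concatenate([rgb, normalise_depth(depth)[..., None]], axis=-1)


def expand_first_conv(weight: np.ndarray) -> np.ndarray:
    """Turn a pretrained (out, 3, k, k) kernel into (out, 4, k, k).

    The new depth slice is the mean of the three RGB slices, a common
    heuristic assumed here for illustration only.
    """
    depth_slice = weight.mean(axis=1, keepdims=True)
    return np.concatenate([weight, depth_slice], axis=1)


def corrupt_depth(depth: np.ndarray, noise_std: float, rng: np.random.Generator) -> np.ndarray:
    """Assumed 'moderate corruption': additive Gaussian noise; spatial layout survives."""
    return depth + rng.normal(0.0, noise_std, size=depth.shape)


def remove_structure(depth: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Assumed 'structure removal': shuffle pixels, keeping values but destroying geometry."""
    flat = depth.ravel().copy()
    rng.shuffle(flat)
    return flat.reshape(depth.shape)


# Usage with random stand-ins for an RGB image and a monocular depth estimate.
rng = np.random.default_rng(0)
rgb = rng.random((32, 32, 3))
depth = rng.random((32, 32)) * 10.0
x = fuse_rgbd(rgb, depth)
print(x.shape)  # (32, 32, 4)

w = expand_first_conv(rng.standard_normal((64, 3, 7, 7)))
print(w.shape)  # (64, 4, 7, 7)

noisy = corrupt_depth(depth, noise_std=0.1, rng=rng)
shuffled = remove_structure(depth, rng)
```

Because the shuffled map has exactly the same values as the original depth map, a model fed `fuse_rgbd(rgb, shuffled)` still receives a fourth channel with realistic statistics. This is what lets such a control separate the contribution of geometric structure from the mere presence of an extra input channel.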

Taken together, our findings suggest that even imperfect, predicted depth can act as a lightweight geometric inductive bias, helping CNN classifiers move away from brittle appearance-based shortcuts and toward more robust representations under domain shift.

https://gitlab.ewi.tudelft.nl/in5000/janvangemert/alexandraioana

Files

Master_Thesis.pdf
(pdf | 31.3 MB)
License info not available