A.S. Gielisse | TU Delft Repository

AxFuncta - Fit instead of Predict for Accidental Scene Conditions

Master thesis (2026) - V.C.J.R. Rullens, J.C. van Gemert, A.S. Gielisse, R. Guerra Marroquim

In recent years, strong progress has been made in creating learnable affine-equivariant models for downstream tasks such as classification. However, these models require more data to represent all possible transformations because of the greater task complexity, and they have been shown to generalize poorly to out-of-distribution data. In this work, we introduce a test-time approach for generalizing to out-of-distribution data. By utilizing a network trained to reconstruct any image that is part of a standardized training distribution, our model can infer an affine transform that moves new samples in-distribution by minimizing their reconstruction loss. In this respect, the approach is closely related to Spatial Transformer Networks, which instead learn to transform data, and to inverted neural renderers for pose estimation. Through experiments, we show that this method exhibits a strong degree of out-of-distribution translation and scale invariance, as well as a small degree of rotation invariance. In particular, we show that it can handle significant transformations beyond those produced by commonly used benchmarks such as AffNIST. Building on this strength, we show that the method excels especially in low-data regimes, outperforming existing competitors. ...
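The fit-instead-of-predict idea in this abstract can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not the thesis's implementation: the signal is 1D, the transform is a circular integer shift rather than a full affine map, the trained reconstruction network is replaced by a hypothetical stand-in `reconstruct` that can only output one canonical template, and the search is exhaustive rather than gradient-based.

```python
# Toy sketch: pick the transform that minimizes reconstruction loss at test time.
# All names are hypothetical; a circular 1D shift stands in for an affine transform.

TEMPLATE = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]  # the "standardized" training pose


def shift(signal, k):
    """Circularly shift a list to the right by k positions (k may be negative)."""
    k %= len(signal)
    return list(signal) if k == 0 else signal[-k:] + signal[:-k]


def reconstruct(signal):
    """Stand-in for a network that can only reproduce in-distribution inputs."""
    return list(TEMPLATE)


def reconstruction_loss(signal):
    """Mean squared error between a signal and its reconstruction."""
    recon = reconstruct(signal)
    return sum((a - b) ** 2 for a, b in zip(signal, recon)) / len(signal)


def fit_shift(sample, candidates):
    """Return the candidate shift whose transformed sample reconstructs best."""
    return min(candidates, key=lambda k: reconstruction_loss(shift(sample, k)))


sample = shift(TEMPLATE, 3)             # out-of-distribution: moved 3 steps right
best = fit_shift(sample, range(-4, 4))  # search integer shifts from -4 to 3
aligned = shift(sample, best)           # sample moved back in-distribution
```

In this toy setting the selected shift undoes the displacement, so `aligned` equals `TEMPLATE`. No transform predictor is trained. The transform is found per sample by searching for the lowest reconstruction loss.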

Understanding the Value of Depth: RGB-D Fusion and Pseudo-Depth for Robust Out-of-Distribution Generalisation

An Experimental Journey into How Depth Shapes Generalisation in Vision Models

Master thesis (2026) - Alexandra Neagu, J.C. van Gemert, C.E. Brandt, A.S. Gielisse

Convolutional neural networks (CNNs) trained on RGB images (red, green, blue channels) often exhibit sharp performance degradation under distribution shifts, as they tend to rely on superficial appearance cues such as background or texture. While depth information is known to provide complementary geometric signals that can improve robustness, most existing approaches assume access to ground-truth depth or rely on complex RGB-D architectures, limiting their applicability in practice.

In this work, we investigate whether estimated depth, obtained from a monocular RGB image, can serve as a simple and effective auxiliary signal to improve out-of-distribution (OOD) generalisation in standard CNN classifiers. Using both controlled toy experiments and real-world evaluations on the NICO++ benchmark, we compare RGB-only models against RGB-D variants that incorporate a single predicted depth channel via minimal fusion. Our results show that pseudo-depth consistently reduces OOD performance gaps across multiple CNN backbones, without degrading in-distribution accuracy. We further demonstrate that these gains persist under moderate corruption of the depth signal and disappear when geometric structure is entirely removed, indicating that the improvements stem from meaningful geometric information rather than the mere presence of an additional input channel. Furthermore, we analyse these effects through class-resolved confusion matrices and qualitative input-level examples, showing that depth specifically attenuates structured semantic confusions under domain shift.

Taken together, our findings suggest that even imperfect, predicted depth can act as a lightweight geometric inductive bias, helping CNN classifiers move away from brittle appearance-based shortcuts and toward more robust representations under domain shift.

https://gitlab.ewi.tudelft.nl/in5000/janvangemert/alexandraioana ...
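As a rough illustration of what adding a single predicted depth channel to an RGB input could look like, the sketch below concatenates depth as a fourth channel and builds a control input in which the same depth values are spatially shuffled. Both choices are assumptions: the abstract does not specify the fusion operator or how geometric structure was removed, and the function names `fuse_rgbd` and `shuffle_depth` are hypothetical.

```python
import random


def fuse_rgbd(rgb, depth):
    """Append one depth value to every RGB pixel, giving a 4-channel image."""
    assert len(rgb) == len(depth) and all(len(r) == len(d) for r, d in zip(rgb, depth))
    return [[(*px, d) for px, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, depth)]


def shuffle_depth(depth, seed=0):
    """Control input: same depth values, but their spatial arrangement is randomized."""
    flat = [d for row in depth for d in row]
    random.Random(seed).shuffle(flat)
    width = len(depth[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# A 2x2 image: the top row is near the camera, the bottom row is far away.
rgb = [[(0.9, 0.1, 0.1), (0.8, 0.2, 0.1)],
       [(0.1, 0.7, 0.2), (0.1, 0.6, 0.3)]]
depth = [[0.2, 0.2],
         [0.9, 0.9]]

rgbd = fuse_rgbd(rgb, depth)                    # input for an RGB-D variant
control = fuse_rgbd(rgb, shuffle_depth(depth))  # extra channel without geometry
```

A classifier fed `control` still receives four channels, so comparing it with one fed `rgbd` separates the effect of geometric information from the effect of simply having an additional input channel.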

FlyingDutchman: An Optical Flow Analysis Tool

Master thesis (2025) - P.J.W. Reijalt, A.S. Gielisse, J.C. van Gemert

Much progress in optical flow research has been driven by benchmark datasets. However, these datasets provide only limited feedback on the underlying causes of architectural failures, typically restricted to metrics such as end-point error (EPE), occlusion statistics, and large-displacement ranges. This leads to imprecise claims about which areas successive models have improved upon. In this paper, we present an analysis tool that enables the generation of customisable datasets, allowing controlled variation in displacement size, camera corruptions, luminance, and other factors. We demonstrate the utility of this tool by analysing the behaviour of different architectures under varying displacement sizes and in low-light settings. ...

Bridging the Gap: A Real-World Dataset and Evaluation of Optical Flow Models in Large Displacement Scenarios

Bachelor thesis (2025) - M. Timmerije, J.C. van Gemert, A.S. Gielisse, A. Voulimeneas

Optical flow models excel on synthetic benchmarks but can struggle with real-world scenarios involving large displacements, which are critical for applications like autonomous navigation and augmented reality. To address this, we introduce a novel real-world dataset and evaluation framework, using a specialized annotation tool to capture ground truth optical flow in scenarios with fast movements and close-range objects. Our approach minimizes confounders, providing clear insights into model performance with large displacements. Findings show recent models outperform the previous state-of-the-art, RAFT, across all tested scenarios. Both the annotation tool and dataset are available to support further research. ...

Going Against The Flow

Evaluating Optical Flow Estimation Models on Real-World Non-Rigid Motion

Bachelor thesis (2025) - S. Dahal, A.S. Gielisse, J.C. van Gemert, A. Voulimeneas

Optical flow estimation models are currently trained and evaluated on synthetic datasets. However, the generalizability of these models to real-world applications remains unexplored. This study investigates how well two state-of-the-art optical flow estimation models perform on real-world Articulated, Homothetic, and Conformal non-rigid motion. To facilitate evaluation, a manually annotated dataset comprising twenty-four real-world image pairs and sparse vector fields was created. Both models demonstrated performance consistent with synthetic benchmarks on Homothetic and Conformal motion. However, results degraded on Articulated motion, revealing limitations for real-world applications such as controlled robotics and object tracking. ...

Performance of Optical Flow Models on Real-World Occluded Regions

Bachelor thesis (2025) - I.A. Petre, J.C. van Gemert, A.S. Gielisse, A. Voulimeneas

Occlusions are one of the main challenges in optical flow estimation, where parts of the scene are no longer visible between consecutive frames. Several models address this problem, either intrinsically or explicitly, using different strategies. However, most benchmarks rely on synthetic data, and even real-world ones evaluate only overall model performance, without isolating occlusions. This work investigates optical flow model performance under real-world occlusions by introducing a manually annotated, occlusion-focused dataset. We present an annotation method tailored to three occlusion types: out-of-frame, inter-object, and self-occlusion. We then evaluate two models, FlowFormer++ and CCMR, which handle occlusions using different mechanisms. Our findings show that while CCMR demonstrates stronger overall performance, both models struggle with occluded regions, particularly self-occlusions involving rotation and perspective transformations. These results highlight the need for improved occlusion reasoning in models and more diverse real-world benchmarks. ...

Real-world evaluation of Optical Flow on repetitive patterns

Bachelor thesis (2025) - J.B. Klijnsma, J.C. van Gemert, A.S. Gielisse, A. Voulimeneas

Real-World Evaluation of Optical Flow with Varying Lighting Conditions

Bachelor thesis (2025) - Z. Ge, J.C. van Gemert, A.S. Gielisse, A. Voulimeneas

Optical flow estimation is a core task in computer vision, yet many existing models struggle with lighting-induced appearance changes that are common in real-world scenarios. This work presents a focused evaluation of recent deep learning-based optical flow models under controlled lighting variations, using a custom dataset composed of indoor and outdoor scenes recorded with a static camera. Scenarios include glare, moving shadows, intensity shifts, and outdoor shadows, with ground truth flow defined as zero to isolate the effect of illumination changes. Four models—RAFT, GMFlow, SEA-RAFT, and FlowDiffuser—are benchmarked using standard metrics (EPE and F1-all). The results reveal that even in the absence of physical motion, several models produce significant flow estimates, particularly under shadow and intensity variation. SEA-RAFT and RAFT show relatively higher robustness, while GMFlow and FlowDiffuser are more sensitive to lighting artifacts. The findings highlight a critical gap in current model generalization and emphasize the need for lighting-aware architectures and training strategies. ...
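With ground-truth flow defined as zero, the end-point error reduces to the mean magnitude of the predicted flow, so any non-zero prediction counts directly as error. The sketch below shows this on a handful of made-up flow vectors. The outlier thresholds (3 px and 5% of the ground-truth magnitude) follow the common KITTI-style definition and are an assumption here, since the abstract does not state the exact metric definition used. The function names are hypothetical.

```python
import math


def endpoint_errors(pred, gt):
    """Per-pixel Euclidean distance between predicted and ground-truth flow."""
    return [math.hypot(pu - gu, pv - gv) for (pu, pv), (gu, gv) in zip(pred, gt)]


def mean_epe(pred, gt):
    """Average end-point error over all pixels."""
    errors = endpoint_errors(pred, gt)
    return sum(errors) / len(errors)


def outlier_rate(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Fraction of pixels whose error exceeds both thresholds (assumed KITTI-style)."""
    errors = endpoint_errors(pred, gt)
    magnitudes = [math.hypot(gu, gv) for gu, gv in gt]
    bad = sum(1 for e, m in zip(errors, magnitudes)
              if e > abs_thresh and e > rel_thresh * m)
    return bad / len(errors)


# Made-up predictions for four pixels of a static scene under changing light.
pred = [(3.0, 4.0), (0.0, 0.0), (0.6, 0.8), (0.0, 0.0)]
gt = [(0.0, 0.0)] * len(pred)  # static camera, no physical motion

epe = mean_epe(pred, gt)       # equals the mean predicted flow magnitude
rate = outlier_rate(pred, gt)  # relative threshold is 0 when ground truth is zero
```

Because the ground-truth magnitude is zero everywhere, the relative threshold never binds and the outlier rate depends only on the absolute threshold.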

Representing CNN Feature Maps with Implicit Neural Representations

A Proof-of-Concept Study Using SIRENs

Master thesis (2025) - B.Y. He, J.C. van Gemert, A.S. Gielisse, K.A. Hildebrandt

High-resolution image analysis using deep Convolutional Neural Networks (CNNs) faces significant memory constraints due to the quadratic growth of intermediate feature maps with input resolution. This paper investigates whether Implicit Neural Representations (INRs), specifically SIRENs, can effectively represent CNN feature maps to reduce memory footprint during training. We address the unique challenge that CNN feature maps are not static signals but evolve continuously as network weights are updated through gradient-based optimization. Through three experiments on a modified All-CNN architecture trained on MNIST, we validate that: (1) SIRENs can fit static feature maps from frozen CNNs with high fidelity (PSNR > 30 dB) regardless of weight initialization; (2) SIRENs can track evolving feature maps during training, though with reduced reconstruction quality compared to static targets; and (3) SIREN-assisted feedforward—where SIRENs predict missing activations in receptive fields—enables classification accuracy (20.97%) above random guessing (10%) but substantially below standard training (95%). While results demonstrate the feasibility of using SIRENs to represent dynamic feature maps, significant challenges remain in maintaining reconstruction fidelity when SIRENs are integrated into the training loop. This proof-of-concept study provides empirical insights into bridging continuous implicit representations with discrete deep learning pipelines and highlights promising directions for future research in memory-efficient high-resolution image analysis. ...
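The quadratic growth mentioned at the start of this abstract follows from simple arithmetic: a feature map stores height × width × channels values, so doubling the side length of the input quadruples the memory of every layer that preserves spatial size. The sketch below assumes 32-bit floats and a hypothetical layer with 96 channels. Neither figure is taken from the thesis.

```python
def feature_map_bytes(height, width, channels, bytes_per_value=4):
    """Memory needed to store one feature map, assuming dense float storage."""
    return height * width * channels * bytes_per_value


# Hypothetical layer with 96 channels that keeps the input's spatial size.
base = feature_map_bytes(28, 28, 96)     # MNIST-sized input
doubled = feature_map_bytes(56, 56, 96)  # side length doubled
ratio = doubled / base                   # growth factor from doubling resolution
```

During training this cost applies to every layer and every sample in a batch, because activations are kept for the backward pass.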

Unreached potentials of RGB-D segmentation

Master thesis (2024) - P. Benschop, J.C. van Gemert, A.S. Gielisse, E. Eisemann

It is commonly believed that image recognition based on RGB improves when using RGB-D, i.e., when depth information (distance from the camera) is added. Adding depth should make models more robust to appearance variations in colors and lighting, help them recognize shape and spatial relationships, and allow them to ignore irrelevant backgrounds. In this paper we investigate how robust current RGB-D models truly are to changes in appearance, depth, and background. We vary one modality (RGB or depth) at a time and compare RGB-D to RGB-only and depth-only in a semantic segmentation setting. Experiments show that all investigated RGB-D models have some robustness to variations in color, but can fail severely for unseen variations in lighting, spatial position and backgrounds. Our results show that we need new RGB-D models that can exploit the best of both modalities while remaining robust to changes in a single modality. ...

ARC: Anchored Representation Clouds for High-Resolution INR Classification

Master thesis (2024) - J.S. Luijmes, J.C. van Gemert, A.S. Gielisse

Implicit neural representations (INRs) exhibit exceptional compression and generalisation abilities that have enabled striking progress across a variety of applications. These properties have fuelled a growing interest in leveraging INRs for traditional classification tasks as a memory-efficient alternative representation of images, breaking the persistent link between image resolution and associated resource costs. Current INR classification methods face limitations such as a restriction to low-resolution data and sensitivity to image-space transformations. We attribute these issues to the employed INR architecture, which lacks mechanisms for local representation, thereby disregarding spatial structure within the data and limiting its ability to capture high-frequency details. In this work, we propose ARC: Anchored Representation Clouds, a novel INR architecture that explicitly anchors latent vectors in image-space. By introducing spatial structure to the latent vectors, ARC can capture local image data, which in our testing leads to state-of-the-art implicit image classification of both low- and high-resolution images and increased robustness against image-space translation. ...