Hesam Araghi | TU Delft Repository

The Role of Receptive Fields in Graph Neural Networks for Spatially-Aware Tasks

Master thesis (2026) - V. Srinidhi, J.C. van Gemert, Hesam Araghi, M. Khosla

In a Convolutional Neural Network (CNN) the receptive field is a region of the input image that a kernel aggregates features from. Successive layers in a CNN expand the receptive field of a kernel by a fixed amount, and this growth corresponds to a consistent metric region across the entire input. The receptive field in a Graph Neural Network (GNN) depends on the graph topology, which changes based on the method of construction. k-Nearest Neighbor (k-NN) construction produces edges whose physical length varies with local point density to keep node degree constant, leading to non-uniform growth of the receptive field. We hypothesize that this inconsistency degrades GNN accuracy on tasks where metric distances carry meaning, and that radius-based graph construction is preferable, since fixing a distance threshold makes the receptive field grow uniformly in metric space with each layer. We test both construction methods under the sparse, non-uniform sampling conditions typical of real-world point clouds. To enable these comparisons, we introduce PointMNIST, a 2D point cloud control dataset designed to isolate and empirically expose the fundamental differences between radius and k-NN graph construction. We further show that augmenting point clouds with regularly distributed background points, a form of spatial “padding”, partially compensates for the distortions of k-NN construction, and restores the connectivity that radius graphs lose on sparse point clouds. ...
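
The contrast between the two construction methods can be illustrated with a minimal sketch. It is not the thesis's code and does not use PointMNIST: the toy point cloud, the function names (`knn_edges`, `radius_edges`, `max_edge_length`) and the values of `k` and `r` are all assumptions chosen for illustration.

```python
import math


def knn_edges(points, k):
    """Directed edges (i, j) where j is one of the k nearest neighbours of i."""
    edges = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        edges.extend((i, j) for j in others[:k])
    return edges


def radius_edges(points, r):
    """Directed edges (i, j) for every pair closer than or equal to r."""
    n = len(points)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r
    ]


def max_edge_length(points, edges):
    """Longest metric distance spanned by any single edge."""
    return max(math.dist(points[i], points[j]) for i, j in edges)


# Hypothetical toy cloud: a dense cluster plus three sparsely sampled points.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(5.0, 5.0), (8.0, 5.0), (5.0, 9.0)]
points = dense + sparse

knn = knn_edges(points, k=2)         # constant degree, variable edge length
rad = radius_edges(points, r=0.5)    # bounded edge length, variable degree

print("k-NN longest edge:  ", max_edge_length(points, knn))
print("radius longest edge:", max_edge_length(points, rad))
print("radius-isolated nodes:", sorted(set(range(len(points))) - {i for i, _ in rad}))
```

In this toy example every node keeps two outgoing k-NN edges, but the edges in the sparse region are far longer than those in the dense cluster. The radius graph keeps every edge within `r`, and the sparse points end up with no neighbours at all, which is the lost connectivity that the abstract says background "padding" points restore.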

Learning Event Representations for Vision Foundation Models for Monocular Depth Estimation

Master thesis (2026) - M. Jiang, H. Araghi, N. Tömen, M. Weinmann

Event cameras are a novel sensing modality, but the lack of densely annotated datasets remains a major limitation for tasks such as monocular depth estimation. To address this, we investigate how Vision Foundation Models (VFMs), trained on large-scale RGB datasets, can be leveraged for event-based depth estimation. Previous work combines handcrafted event representations with fine-tuning of VFMs to adapt them to the event domain. In contrast, we learn an event representation while keeping the VFM frozen. We evaluate two representation learners, a U-Net and a Fully Convolutional (FullyConv) model, on DSEC and MVSEC. The results show that learned event representations are highly effective in-domain: both models outperform all baselines on DSEC, including Depth AnyEvent (DAE) and direct RGB input to Depth Anything V2 (DAv2), while the FullyConv model remains competitive on MVSEC. Cross-dataset experiments show that this improvement does not consistently transfer under domain shift. These findings indicate that learning the input representation is a strong strategy for in-domain event-based depth estimation, but that representation learning alone is not sufficient to guarantee robust cross-dataset generalization. ...

A Survey on Event Camera Simulators and Datasets for Optical Flow Estimation

Bachelor thesis (2024) - O. Hageman, N. Tömen, Hesam Araghi, G. Lan

Computer vision tasks have been shown to benefit greatly from both developments in deep learning networks and the emergence of event cameras. Deep networks can require a large amount of training data, which is not readily available for event cameras, specifically for optical flow estimation. Simulating this data in a realistic, physics-driven manner is therefore crucial. This paper compares state-of-the-art event camera simulators on several criteria, including event timestamp modeling, performance under low illumination, bandwidth simulation, computation speed, and various types of noise simulation. We also summarize the shortcomings of some commonly used optical flow event datasets. For generating high-quality, realistic events, the V2E and DVS-Voltmeter simulators have been shown to produce the most accurate data. ...

E-GMFlow: Time granularity for transformer architectures in event-based optical flow

Bachelor thesis (2024) - A. Badiu, Hesam Araghi, N. Tömen, G. Lan

Event cameras are bio-inspired sensors with high dynamic range, high temporal resolution, and low power consumption. These features enable precise motion detection even in challenging lighting conditions and fast-changing scenes, rendering them well-suited for optical flow estimation. However, event camera output is sparse and unstructured, making it challenging to process. Transformer architectures have been shown to be effective in capturing long-term temporal dependencies and processing sparse input; hence, they might be better suited to processing this output by leveraging the fine time granularity inherent to event camera data.
We introduce E-GMFlow, an approach for event-based optical flow inspired by the recent success, in terms of accuracy, of transformer-based models for frame-based optical flow. We explore the effect of temporal detail on the accuracy of this transformer architecture by changing the number of temporal bins into which events are discretized. We observe that increasing the number of temporal bins generally yields higher accuracy, and we comment on the limitations of this study. ...
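
The abstract does not specify how events are discretized, so the following is only a minimal sketch of one common scheme, not E-GMFlow's actual representation. It assumes events are `(x, y, t, polarity)` tuples with polarity in {-1, +1} and sums polarities per pixel within equal-width time bins; the function name `bin_events` and the sample events are hypothetical.

```python
def bin_events(events, num_bins, height, width):
    """Accumulate event polarities into a [num_bins][height][width] grid.

    events: iterable of (x, y, t, polarity) tuples, polarity in {-1, +1}.
    The time window spanned by the events is split into equal-width bins.
    """
    grid = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    events = list(events)
    if not events:
        return grid
    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = (t_end - t_start) or 1.0  # avoid division by zero for a single timestamp
    for x, y, t, polarity in events:
        # Clamp so that the latest event falls into the last bin.
        b = min(int((t - t_start) / span * num_bins), num_bins - 1)
        grid[b][y][x] += polarity
    return grid


# Hypothetical events on a 2x2 sensor.
events = [(0, 0, 0.0, 1), (1, 0, 0.4, -1), (0, 1, 0.6, 1), (1, 1, 1.0, 1)]

coarse = bin_events(events, num_bins=2, height=2, width=2)
fine = bin_events(events, num_bins=4, height=2, width=2)
```

With two bins, the events at t = 0.0 and t = 0.4 share a bin. With four bins they are separated, which is the kind of finer temporal detail whose effect on accuracy the study varies.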

A Comparative Study of Model-based and Learning-based Optical Flow Estimation methods with Event Cameras

Bachelor thesis (2024) - D. Dinucu-Jianu, Hesam Araghi, Nergis Tömen, Guohao Lan

Optical flow estimation with event cameras encompasses two primary algorithm classes: model-based and learning-based methods. Model-based approaches do not require any training data, while learning-based approaches use datasets of events to train neural networks. To apply these algorithms effectively, it is essential to understand their respective strengths and weaknesses.
This study compares model-based and learning-based optical flow estimation methods using event cameras, aiming to provide guidance for real-world applications. We evaluated these methods on the MVSEC and DSEC datasets, focusing on their accuracy and runtime. Our findings indicate that model-based methods excel on the MVSEC dataset, characterized by small motions, while learning-based approaches perform better on the more dynamic DSEC dataset. To investigate potential overfitting of learning-based methods to DSEC, we retrained the IDNet and TMA models on the BlinkFlow dataset. The retrained models demonstrated competitive accuracy, surpassing model-based methods, which indicates that learning-based models perform better on datasets like DSEC even when they are not able to overfit. Finally, our runtime analysis showed that model-based methods achieve real-time performance on CPUs, whereas learning-based methods require a GPU to run in real time. ...

Optical Flow Estimation Using Event-Based Cameras

Improving Optical Flow Estimation Accuracy Using Space-Aware De-Flickering

Bachelor thesis (2024) - P.M. Skullerud, N. Tömen, H. Araghi, G. Lan

Event cameras are novel sensors whose high temporal resolution and bandwidth motivate their use for the optical flow estimation problem. However, the properties of event cameras also introduce a vulnerability to flickering. Flickering hurts the perceptibility of motion by overwhelming event data with unrelated information. The single existing event de-flicker method (EFR) is built for scenarios where the relative position of the camera and the flickering object is constant, which is uncommon in motion-heavy optical flow estimation scenarios. Our contribution is a new de-flickering method that incorporates spatial awareness of nearby pixels. We hypothesize that this feature increases robustness to movement and thus yields a larger improvement in optical flow accuracy. Compared to EFR, our method falters at filtering intensely flickering surfaces, but better preserves the spatial coherence of edges. However, we observe that both de-flickering methods remove much geometric information, especially given slow motion or weak ambient illumination. Our benchmarking shows that neither our method nor EFR significantly affects optical flow estimation accuracy, despite reducing event counts by 50-65%. Overall, we conclude that the niche benefits of spatial filtering are nullified by the result that filtering hardly affects optical flow estimation. ...

Unsupervised optical flow estimation of event cameras

The influence of training sets on model performance

Bachelor thesis (2024) - M. van den Berg, Hesam Araghi, N. Tömen, G. Lan

Event cameras capture events asynchronously based on changes in lighting. They offer multiple benefits, but pose challenges in computer vision because of their asynchronous nature and the difficulty of capturing ground-truth values to compare against. This paper examines the effect of training a state-of-the-art unsupervised learning algorithm, Taming Contrast Maximisation, for optical flow prediction on a new dataset, BlinkFlow, which promises improved performance for supervised algorithms. The aim is to see whether these improvements also occur for unsupervised models. The results were inconclusive regarding the effectiveness of training unsupervised models, but they showed that models pretrained on the DSEC and MVSEC datasets did not perform well on this new dataset. ...