SPICE: Self-supervised Predictive Coding of Events

Master Thesis (2025)

Author(s)

T.N.A. den Blanken (TU Delft - Mechanical Engineering)

Contributor(s)

Y. Wu – Mentor (TU Delft - Control & Simulation)

G.C.H.E. de Croon – Mentor (TU Delft - Control & Simulation)

Holger Caesar – Graduation committee member (TU Delft - Intelligent Vehicles)

C. de Wagter – Graduation committee member (TU Delft - Control & Simulation)

L. Ferranti – Graduation committee member (TU Delft - Learning & Autonomous Control)

Faculty

Mechanical Engineering

Semantic Segmentation, Depth Estimation, SSL, Recurrent Neural Networks, Contrastive Learning, Self-supervised Learning, Event Camera, Optical Flow Estimation, Predictive Coding, Future Prediction

To reference this document use:

https://resolver.tudelft.nl/uuid:788c2e55-601b-45ae-aa53-8ea45001a3c8

Publication Year

2025

Language

English

Graduation Date

18-12-2025

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering | Vehicle Engineering | Cognitive Robotics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Event-based cameras provide high temporal resolution, robustness to lighting conditions and low power consumption, but their sparse, temporal data require models that reason over time. In supervised settings, this is increasingly handled with recurrent architectures. In contrast, most self-supervised learning (SSL) methods still adapt non-recurrent RGB techniques, with masking-based objectives that favor spatial reconstruction over temporal understanding. We introduce SPICE: Self-supervised Predictive Coding on Events, an SSL framework tailored to event data that processes longer sequences recurrently and learns by predicting future latent representations rather than reconstructing masked inputs, promoting a more natural objective focused on anticipating what comes next. SPICE further incorporates an event-specific contrastive loss that operates only on active regions. SPICE pre-training improves downstream performance on semantic segmentation, depth estimation and optical flow estimation. Low-dimensional projections confirm that the learned representations are meaningful and avoid collapse, while also revealing limitations in temporal stability and semantic organization, indicating clear directions for future event-specific SSL research. Code is available upon request.
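
To make the idea of a contrastive loss restricted to active regions more concrete, the following is a minimal sketch and not the thesis implementation. It assumes an InfoNCE-style loss (a common contrastive objective, swapped in here because the abstract does not specify the exact formulation), a flat layout with one latent vector per spatial location, and a boolean mask marking locations where events occurred. The names masked_info_nce, l2_normalize, the temperature value and the array layout are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def masked_info_nce(pred, target, active, temperature=0.1):
    """InfoNCE-style loss computed only over active locations.

    pred, target: (N, D) arrays holding predicted and actual future
                  latents, one row per spatial location (assumed layout).
    active:       (N,) boolean mask, True where events occurred.
    """
    # Discard inactive locations so they act neither as positives nor negatives.
    p = l2_normalize(pred[active])
    t = l2_normalize(target[active])
    if p.shape[0] < 2:
        return 0.0  # no negatives available, so the loss is undefined; return 0

    # Cosine similarity between every prediction and every target.
    logits = p @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability

    # Row-wise log-softmax; the matching target sits on the diagonal.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Usage with random stand-in latents (illustrative values only).
rng = np.random.default_rng(0)
target_latents = rng.normal(size=(16, 8))
predicted_latents = target_latents + 0.1 * rng.normal(size=(16, 8))
active_mask = np.zeros(16, dtype=bool)
active_mask[:6] = True  # pretend only six locations received events
loss = masked_info_nce(predicted_latents, target_latents, active_mask)
```

In this sketch, masking before computing similarities means empty regions contribute nothing to the loss, which is one plausible reading of "operates only on active regions".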

Files

MSc_Thesis_Tim_den_Blanken_Fin... (pdf)

(pdf | 34.7 Mb)

License info not available