SPICE: Self-supervised Predictive Coding of Events

Master Thesis (2025)
Author(s)

T.N.A. den Blanken (TU Delft - Mechanical Engineering)

Contributor(s)

Y. Wu – Mentor (TU Delft - Control & Simulation)

G.C.H.E. de Croon – Mentor (TU Delft - Control & Simulation)

Holger Caesar – Graduation committee member (TU Delft - Intelligent Vehicles)

C. de Wagter – Graduation committee member (TU Delft - Control & Simulation)

L. Ferranti – Graduation committee member (TU Delft - Learning & Autonomous Control)

Faculty
Mechanical Engineering
More Info
Publication Year
2025
Language
English
Graduation Date
18-12-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Event-based cameras provide high temporal resolution, robustness to difficult lighting conditions and low power consumption, but their sparse, temporal data require models that reason over time. In supervised settings, this is increasingly handled with recurrent architectures. In contrast, most self-supervised learning (SSL) methods still adapt non-recurrent RGB techniques and rely on masking-based objectives that favor spatial reconstruction over temporal understanding. We introduce SPICE: Self-supervised Predictive Coding of Events, an SSL framework tailored to event data. SPICE processes longer sequences recurrently and learns by predicting future latent representations rather than reconstructing masked inputs, a more natural objective focused on anticipating what comes next. SPICE further incorporates an event-specific contrastive loss that operates only on active regions. Pre-training with SPICE improves downstream performance on semantic segmentation, depth estimation and optical flow estimation. Low-dimensional projections confirm that the learned representations are meaningful and avoid collapse. They also reveal limitations in temporal stability and semantic organization, which point to clear directions for future event-specific SSL research. Code is available upon request.
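The abstract combines two ideas: predicting future latent representations instead of reconstructing masked inputs, and restricting a contrastive loss to regions where events actually occur. The sketch below is not the SPICE implementation, whose code is only available upon request. It is a minimal, hypothetical illustration that assumes per-pixel latent vectors, an InfoNCE-style loss with cosine similarity and a temperature of 0.1, and an "active" pixel defined as one with at least one event. It also assumes the contrastive term compares predicted and observed future latents, which the thesis may define differently. All function names are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def active_mask(event_counts):
    """Hypothetical activity test: a pixel is active if at least one event fired there."""
    return [count > 0 for count in event_counts]


def masked_info_nce(predicted, target, mask, temperature=0.1):
    """InfoNCE between predicted and observed future latents, over active pixels only.

    For each active pixel i, the positive is target[i]; the negatives are the
    targets at all other active pixels. Inactive pixels are ignored entirely.
    """
    idx = [i for i, is_active in enumerate(mask) if is_active]
    if len(idx) < 2:
        return 0.0  # no negatives available, so the contrastive term is skipped
    total = 0.0
    for i in idx:
        logits = [cosine(predicted[i], target[j]) / temperature for j in idx]
        # Numerically stable log-sum-exp over all candidates (positive included).
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
        positive = cosine(predicted[i], target[i]) / temperature
        total += log_denom - positive
    return total / len(idx)


# Toy example: 4 pixels with 2-D latents; pixel 0 saw no events.
counts = [0, 2, 5, 1]
future = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
mask = active_mask(counts)
good = masked_info_nce(future, future, mask)  # perfect prediction of the future latents
bad = masked_info_nce([future[0], future[2], future[3], future[1]], future, mask)  # shuffled
print(f"good={good:.3f} bad={bad:.3f}")
```

In this sketch, masking means silent pixels act neither as anchors nor as negatives, so empty background that carries no event information cannot dominate the objective. A correct prediction yields a lower loss than a shuffled one.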
