Self-Supervised Learning of Event-Based Optical Flow via Deep Equilibrium Models

Master thesis (2024)

Authors

A.K. Shokolarov (Aerospace Engineering)

Contributors

G.C.H.E. de Croon (graduation committee member)

Y. Wu (mentor)

Faculty

Aerospace Engineering

To reference this document use:

http://resolver.tudelft.nl/uuid:eb522c6b-1b1d-4988-8a7a-e2846dc697c5

Published Date

29-05-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The estimation of optical flow, which determines the movement of objects in a visual scene, is a crucial problem in computer vision. It is essential for applications such as autonomous navigation, where precise motion estimation is critical for performance and safety.

Frame-based cameras capture sequences of still images at regular intervals, from which optical flow is traditionally extracted using optimization-based or learning-based methods. Recently, event-based cameras, which detect changes in pixel brightness asynchronously, have gained traction due to their high temporal resolution and robustness to motion blur, and many algorithms have been developed to estimate optical flow from this data. IDNet is a learning-based approach that achieves state-of-the-art performance. However, IDNet and similar models face two major challenges: they require labeled ground-truth data for training, which is scarce and difficult to collect, and they rely on recurrent neural networks (RNNs) with a fixed number of refinement iterations. This fixed iteration scheme does not adapt to scene complexity, limiting accuracy for complex flows and increasing computational effort for simpler patterns.

The aim of this project is to explore, implement, and evaluate methods that address these two limitations and enhance the capabilities of models like IDNet.

To remove the need for ground-truth data, a self-supervised learning paradigm was implemented by introducing a novel contrast maximization loss, which assesses the blur that appears when raw events are accumulated over a time interval and compensates for it using the predicted flow. To assess the effectiveness of this method, models were trained on the MVSEC benchmark dataset, showing improvements over previous methods of up to 15% on some sequences and 8% on average. Based on these experiments and results, further research directions were proposed.
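The general idea behind contrast maximization can be illustrated with a minimal sketch. This is not the loss proposed in the thesis: it assumes a single constant flow for all events, nearest-pixel accumulation, and image variance as the contrast measure, and every function name below is hypothetical.

```python
def warp_events(events, flow, t_ref=0.0):
    """Move each event (x, y, t) back to t_ref along a constant flow (u, v) in px/s."""
    u, v = flow
    return [(x - (t - t_ref) * u, y - (t - t_ref) * v) for x, y, t in events]


def image_of_warped_events(points, width, height):
    """Count warped events per pixel using nearest-pixel rounding."""
    image = [[0] * width for _ in range(height)]
    for x, y in points:
        col, row = round(x), round(y)
        if 0 <= col < width and 0 <= row < height:
            image[row][col] += 1
    return image


def contrast(image):
    """Variance of the per-pixel event counts; sharper images score higher."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def contrast_loss(events, flow, width, height):
    """Negative contrast, so that minimising the loss sharpens the image."""
    warped = warp_events(events, flow)
    return -contrast(image_of_warped_events(warped, width, height))


# Synthetic events (assumed data): a vertical edge at x = 2 moving right at 4 px/s.
true_flow = (4.0, 0.0)
events = [(2.0 + true_flow[0] * (k / 8), float(y), k / 8)
          for k in range(8) for y in range(4)]

loss_true = contrast_loss(events, true_flow, width=8, height=4)
loss_zero = contrast_loss(events, (0.0, 0.0), width=8, height=4)
```

In this toy setup, warping with the true flow collapses all events onto one column, so `loss_true` is lower than `loss_zero`, where the uncompensated events stay smeared across several columns. No ground-truth flow is needed to compute either value.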

To address the fixed iteration scheme, Deep Equilibrium Models (DEQs) were found to be a promising approach. These models reformulate the iterative structure as a root-finding problem and use traditional solvers to find a solution to within a given tolerance, providing a trade-off between speed and accuracy. Moreover, they allow direct differentiation through the network using only the final estimate, whereas previous methods keep track of the state through all iterations; this leads to O(1) memory consumption with respect to the number of iterations. With these and some additional ideas implemented, the trained DEQ IDNet model reached competitive performance on the DSEC dataset while consuming 15% less memory. Yet, further work is needed to close the gap and achieve state-of-the-art performance.
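The two properties described above, a tolerance-controlled solve and a gradient computed from the final estimate alone, can be sketched on a scalar toy problem. The update rule `layer`, the plain fixed-point iteration used as the solver, and all names and parameter values are assumptions made for illustration; they do not reflect the DEQ IDNet architecture or the solver used in the thesis.

```python
import math


def layer(z, x, w):
    """Hypothetical scalar update rule standing in for one refinement step."""
    return math.tanh(w * z + x)


def solve_fixed_point(x, w, tol=1e-12, max_iter=500):
    """Iterate z <- layer(z, x, w) until the residual drops below tol."""
    z = 0.0
    for n_iter in range(1, max_iter + 1):
        z_next = layer(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, n_iter
        z = z_next
    return z, max_iter


def implicit_grad_w(z_star, x, w):
    """dz*/dw from the implicit function theorem, using only the equilibrium z*."""
    sech2 = 1.0 - math.tanh(w * z_star + x) ** 2
    df_dz = sech2 * w       # partial derivative of layer with respect to z
    df_dw = sech2 * z_star  # partial derivative of layer with respect to w
    return df_dw / (1.0 - df_dz)


x, w = 0.3, 0.5
z_tight, iters_tight = solve_fixed_point(x, w, tol=1e-12)
z_loose, iters_loose = solve_fixed_point(x, w, tol=1e-3)

# Check the implicit gradient against a central finite difference.
eps = 1e-5
z_plus, _ = solve_fixed_point(x, w + eps)
z_minus, _ = solve_fixed_point(x, w - eps)
grad_fd = (z_plus - z_minus) / (2 * eps)
grad_implicit = implicit_grad_w(z_tight, x, w)
```

A looser tolerance stops after fewer iterations, which is the speed and accuracy trade-off in miniature. The gradient in `implicit_grad_w` needs only `z_tight`, not the intermediate iterates, which is why memory use does not grow with the number of solver steps.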

Files

Aleksandar_Thesis_Report.pdf

(pdf | 14.5 MB)