Self-Supervised Monocular Visual Drone Model Identification through Improved Occlusion Handling

Conference Paper (2025)
Author(s)

Stavrow A. Bahnam (TU Delft - Control & Simulation)

Christophe De Wagter (TU Delft - Control & Simulation)

Guido C.H.E. De Croon (TU Delft - Control & Simulation)

DOI
https://doi.org/10.1109/IROS60139.2025.11247627 (final published version)
Publication Year
2025
Language
English
Pages (from-to)
18977-18984
Publisher
IEEE
ISBN (electronic)
9798331543938
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Ego-motion estimation is vital for drones flying in GPS-denied environments. Vision-based methods struggle as flight speed increases and nearby objects create difficult visual conditions with considerable motion blur and large occlusions. To tackle this, vision is typically complemented by state estimation filters that combine a drone model with inertial measurements. However, these drone models are currently learned in a supervised manner with ground-truth data from external motion capture systems, limiting scalability to different environments and drones. In this work, we propose a self-supervised learning scheme to train a neural-network-based drone model using only onboard monocular video and flight controller data (IMU and motor feedback). We achieve this by first training a self-supervised relative pose estimation model, which then serves as a teacher for the drone model. To allow this to work at high speed close to obstacles, we propose an improved occlusion handling method for training self-supervised pose estimation models. With this method, the root mean squared error of the resulting odometry estimates is reduced by an average of 15%. Moreover, the student neural drone model can be successfully obtained from the onboard data. It even becomes more accurate at higher speeds than its teacher, the self-supervised vision-based model. We demonstrate the value of the neural drone model by integrating it into a traditional filter-based VIO system (ROVIO), resulting in superior odometry accuracy on aggressive 3D racing trajectories near obstacles. Self-supervised learning of ego-motion estimation represents a significant step toward bridging the gap between flying in controlled, expensive lab environments and real-world drone applications. The fusion of vision and drone models will enable higher-speed flight and improve state estimation, on any drone in any environment.
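The teacher-student scheme the abstract describes can be sketched in miniature: a (pretrained) vision-based teacher supplies relative-pose pseudo-labels, and a drone model (the student) mapping flight-controller inputs to relative pose is fit against them. All names, dimensions, and the linear student model below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 timesteps, 10-dim flight-controller features (standing in
# for IMU + motor feedback), 6-DoF relative pose targets.
X = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 6))
# Pseudo-labels from the "teacher" (here simulated with small noise).
teacher_pose = X @ W_true + 0.01 * rng.normal(size=(200, 6))

# Student: a linear drone model fit to the teacher's pose estimates by
# gradient descent on the mean squared error (distillation step).
W = np.zeros((10, 6))
lr = 0.1
for _ in range(500):
    pred = X @ W
    grad = X.T @ (pred - teacher_pose) / len(X)
    W -= lr * grad

rmse = np.sqrt(np.mean((X @ W - teacher_pose) ** 2))
print(f"student RMSE vs teacher pseudo-labels: {rmse:.4f}")
```

In the paper's setting the teacher is itself trained self-supervised from monocular video, so no motion-capture ground truth enters this loop at any stage.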

Files

Taverne

File under embargo until 01-06-2026