Deep Learning Fusion of Monocular and Stereo Depth Maps Using Convolutional Neural Networks
Abstract
This paper presents an encoder-decoder-style convolutional neural network (CNN) for improving stereo depth estimation (SDE) by combining stereo estimates with the corresponding monocular estimates through a fusion network, assisted by prior information that provides context for the fusion. Video cameras are commonly used for depth perception in robotics, especially in weight-sensitive applications such as Micro Aerial Vehicles (MAVs). The two primary paradigms for vision-based depth perception are monocular and stereo depth (or disparity) estimation, each with its own strengths and weaknesses. These strengths and weaknesses appear to be complementary, so a fusion of the two may yield more accurate predictions. In this paper, we investigate this fusion by training a CNN that combines stereo and monocular depth or disparity estimates. The fusion network is agnostic to the choice of input networks, providing great flexibility. We found that such a fusion network, while increasing the computational complexity of the depth perception pipeline, indeed improves the accuracy of the estimates. The number of outlier predictions is significantly decreased, and some fundamental limitations of both stereo and monocular methods, such as errors arising from occluded regions, are mitigated.
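To make the fusion idea concrete, the following is a minimal PyTorch sketch of an encoder-decoder fusion network that takes a stereo depth map, a monocular depth map, and a prior map as stacked input channels and regresses a fused depth map. The layer widths, depths, and the name `DepthFusionNet` are illustrative assumptions; the abstract does not specify the actual architecture.

```python
import torch
import torch.nn as nn

class DepthFusionNet(nn.Module):
    """Encoder-decoder CNN that fuses stereo and monocular depth estimates.

    NOTE: layer sizes are illustrative; the paper's exact architecture
    is not given in the abstract.
    """
    def __init__(self, in_channels: int = 3):
        # Inputs: stereo depth, monocular depth, and a prior map,
        # each a single channel, concatenated along the channel axis.
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            # Single-channel output: the fused depth (or disparity) map.
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, stereo_depth, mono_depth, prior):
        # Stack the two depth estimates and the prior as input channels,
        # then encode and decode back to full resolution.
        x = torch.cat([stereo_depth, mono_depth, prior], dim=1)
        return self.decoder(self.encoder(x))

# Usage example: fuse single-channel 256x256 depth maps.
net = DepthFusionNet()
fused = net(torch.rand(1, 1, 256, 256),
            torch.rand(1, 1, 256, 256),
            torch.rand(1, 1, 256, 256))
print(fused.shape)  # torch.Size([1, 1, 256, 256])
```

Because the network only consumes the depth maps themselves, it stays agnostic to whichever stereo or monocular estimators produced them, matching the flexibility claimed in the abstract.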