Semantic Segmentation using Deep Neural Networks for MAVs

Master Thesis (2022)

Author(s)

T.V. Tran (TU Delft - Aerospace Engineering)

Contributor(s)

Guido C.H.E. de Croon – Mentor (TU Delft - Control & Simulation)

Yingfu Xu – Mentor (TU Delft - Control & Simulation)

Christophe De Wagter – Graduation committee member (TU Delft - Control & Simulation)

Jan van Gemert – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Aerospace Engineering

Keywords

Deep Learning, Recurrent Neural Network, Convolutional Neural Network, Semantic Segmentation, Optical Flow, Micro Air Vehicle

To reference this document use:

https://resolver.tudelft.nl/uuid:7735d01c-b4cd-4173-a584-652f269c078c

More Info

Publication Year

2022

Language

English

Graduation Date

19-01-2022

Awarding Institution

Delft University of Technology

Programme

Aerospace Engineering | Control & Simulation

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Semantic segmentation methods have been developed for and applied to single images for object segmentation. However, for robotic applications such as high-speed agile Micro Air Vehicles (MAVs) in Autonomous Drone Racing (ADR), it is more interesting to also consider temporal information, because consecutive frames of a video sequence are correlated over time. In this work, we evaluate state-of-the-art approaches to video semantic segmentation, namely Recurrent Neural Networks (RNNs), 3D Convolutional Neural Networks (CNNs), and optical flow, in terms of accuracy and inference speed on three datasets with different camera motion configurations. The results show that an RNN with convolutional operators outperforms all other methods. Compared with the single-frame-based semantic segmentation method, it achieves a performance boost of 10.8% on the KITTI (MOTS) dataset with 3 degrees of freedom (DoF) motion and a small 0.6% improvement on the CyberZoo dataset with 6 DoF motion. The inference speed was measured on the CyberZoo dataset, reaching 321 fps on an NVIDIA GeForce RTX 2060 GPU and 30 fps on an NVIDIA Jetson TX2 mobile computer.
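
The abstract reports inference speed in frames per second (fps). As a rough illustration of how such a figure could be obtained, the sketch below times a per-frame inference callable after a few untimed warm-up runs. It is not the measurement procedure used in the thesis: the function name `measure_fps`, the `infer` callable, and the `warmup` parameter are all assumptions introduced for this example.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(infer: Callable[[Any], Any],
                frames: Iterable[Any],
                warmup: int = 5) -> float:
    """Return the average frames per second of `infer` over `frames`.

    Hypothetical helper: `infer` stands in for one forward pass of a
    segmentation network on a single frame.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")

    # Warm-up runs are not timed, so one-off costs such as lazy
    # initialisation or cache filling do not distort the result.
    for frame in frames[:warmup]:
        infer(frame)

    timed = frames[warmup:]
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading for extremely cheap callables.
    return len(timed) / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Stand-in workload instead of a real network.
    dummy_infer = lambda frame: sum(range(10_000))
    print(f"{measure_fps(dummy_infer, range(100)):.1f} fps")
```

When the callable launches work on a GPU asynchronously, the device would need to be synchronised before each clock reading, otherwise the timed interval only covers kernel launches and the resulting fps is too optimistic.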

Files

Final_Thesis_Tran.pdf

(pdf | 40.4 MB)

License info not available