Automated Semantic Segmentation of Aerial Imagery using Synthetic Data

Master thesis (2022)

Authors

C.A. Caceres Tocora Architecture and the Built Environment

Contributors

S. Du Urban Data Science - Architecture and the Built Environment (mentor)

J.E. Stoter Urban Data Science - Architecture and the Built Environment (graduation committee member)

Sven Briels (coach)

W. Gao Urban Data Science - Architecture and the Built Environment (coach)

Faculty

Architecture and the Built Environment

To reference this document use:

http://resolver.tudelft.nl/uuid:677ad184-1169-45b3-a31b-bcb7239dd451

Published Date

15-06-2022

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Semantic segmentation of aerial images is the task of assigning a label to every pixel of an image. It is essential for applications such as urban planning, agriculture and real-estate analysis. Deep learning techniques have shown satisfactory results on semantic segmentation tasks. However, training a deep learning model is expensive and usually requires manually labelled images, and the annotation of images is a bottleneck in semantic segmentation projects. Synthetic data, which consists of images from a virtual world that simulates the real world, can therefore be used as training data for segmentation tasks to improve the classification results. This thesis aims to create a pipeline that generates synthetic images with semantic segmentation labels for use in an existing deep learning model, and to discuss how the generated synthetic data improves the semantic segmentation of aerial images. An existing model (FuseNet), which achieved satisfactory results in previous work, is trained with solely synthetic data and with a mix of synthetic and real data in different training and testing scenarios to classify true ortho imagery from Haaksbergen, Netherlands and Potsdam, Germany. In addition, a benchmark of domain adaptation techniques is performed to close the domain gap between the synthetic and real imagery. The semantic maps include the classes building, road and other. Experiments are performed to test the performance of the synthetic data using 1) different 3D models of the virtual world, 2) different quantities of synthetic and real training data, 3) different cross-geographical scenarios, and 4) different domain adaptation techniques. The assessment is based on the (mean) intersection over union (IoU), F1 score, precision and recall, together with an extensive visual assessment.
The virtual world is created through a pipeline in CityEngine using procedural modelling techniques and then rendered in Blender to create the training dataset. The results show that, when training and testing are performed in the same area, a model trained solely on synthetic data reaches a mIoU of 0.48, which is lower than the 0.75 obtained with solely real data. In addition, the 3D models partly affect the segmentation results. When a mix of real and synthetic data is used, the mIoU stays at 0.75. In contrast, when training and testing in different areas, the use of synthetic data seems to improve the results on average by 21.5, 12.5, 1.5 and 2 percentage points on the mIoU and on the IoU for the classes building, road and other, respectively. Additionally, domain adaptation techniques such as CycleGAN and CyCADA improve the performance of synthetic datasets by 4 percentage points. Overall, this thesis shows that when the domain difference between the training and testing datasets is large, adding synthetic data helps to improve the semantic segmentation of aerial images. When a project lacks labelled imagery of its own area, synthetic datasets improve the segmentation results when mixed with existing labelled imagery from other geographical regions. In contrast, when labelled imagery is available for the testing area, the real training data alone obtains robust results, and adding synthetic data does not improve the segmentation results.
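
The evaluation metrics named in the abstract (IoU, mean IoU, precision, recall and F1 score) can be illustrated with a minimal sketch. It is not the evaluation code of the thesis. The function names, the flat-list label encoding and the convention of returning 0.0 for a class with an empty denominator are assumptions made for illustration only; only the three class names come from the abstract.

```python
from typing import Dict, List, Sequence

# Class names as listed in the abstract; the integer encoding is an assumption.
CLASSES = ("building", "road", "other")


def per_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[Dict[str, float]]:
    """Compute IoU, precision, recall and F1 per class from flat pixel labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    results = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        # Assumed convention: a metric with an empty denominator is 0.0.
        iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if (precision + recall)
            else 0.0
        )
        results.append(
            {"iou": iou, "precision": precision, "recall": recall, "f1": f1}
        )
    return results


def mean_iou(metrics: Sequence[Dict[str, float]]) -> float:
    """Unweighted mean of the per-class IoU values (mIoU)."""
    return sum(m["iou"] for m in metrics) / len(metrics)


# Toy example with six pixels: 0 = building, 1 = road, 2 = other.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = per_class_metrics(y_true, y_pred, len(CLASSES))
for name, m in zip(CLASSES, metrics):
    print(name, {k: round(v, 3) for k, v in m.items()})
print("mIoU:", round(mean_iou(metrics), 3))
```

For a real image, the label maps would first be flattened to one value per pixel. Under these definitions, an improvement of 4 percentage points corresponds to the mIoU rising by 0.04 in absolute terms.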

Files

P5_Presentation_Camilo_Caceres... (.pdf)

(.pdf | 4.78 Mb)

P5_Report_Camilo_Caceres_53622... (.pdf)

(.pdf | 25.3 Mb)

P2_Report_Camilo_Caceres_53622... (.pdf)

(.pdf | 3.39 Mb)