Semantic Segmentation of RGB-Z Aerial Imagery Using Convolutional Neural Networks

Master Thesis (2020)
Author(s)

A.E. Mulder (TU Delft - Architecture and the Built Environment)

Contributor(s)

B. Dukai – Mentor (TU Delft - Urban Data Science)

R.Y. Peters – Graduation committee member (TU Delft - Urban Data Science)

J. Stoter – Graduation committee member (TU Delft - Urban Data Science)

Sven A. Briels – Coach

Jean-Michel Renders – Coach

Faculty
Architecture and the Built Environment
Copyright
© 2020 Amber Mulder
Publication Year
2020
Language
English
Graduation Date
24-06-2020
Awarding Institution
Delft University of Technology
Programme
Geomatics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Semantic segmentation (or pixel-level classification) of remotely sensed imagery has been shown to be useful for applications in fields such as land cover mapping, object detection, change detection and land-use analysis. Deep learning algorithms called convolutional neural networks (CNNs) have been shown to outperform traditional computer vision and machine learning approaches in tackling semantic segmentation tasks. Furthermore, adding height information (Z) to aerial imagery (RGB) is believed to improve segmentation results. However, discussion remains on the extent to which height information adds value, on the best way to combine RGB information with height information, and on which type of height information is best used. This study aims to answer these questions. In this research, the CNN architectures FCN-8s, SegNet, U-Net and FuseNet-SF5 are trained to semantically segment 10 cm resolution true ortho imagery of Haarlem, optionally augmented with height information. The output topographic maps contain the classes building, road, water and other. Experiments are conducted that allow for the comparison of 1) models trained on RGB and on RGB-Z, 2) models combining RGB and height information through data fusion and through data stacking, and 3) models trained using different types of absolute and relative height approaches. Performances are compared based on scores on the performance measure (mean) intersection over union (IoU) and through visual assessment of the output prediction maps. The results indicated that, on average, segmentation performance improves by approximately 1 percent when absolute height information is added. The class building benefited the most from the addition of height information. Furthermore, extracting features from height information in a separate encoder and fusing them into the RGB feature maps led to a higher overall segmentation quality than providing height information as a stacked extra band processed in the same encoder as the RGB information. Finally, models using relative height delivered a higher-quality segmentation than models using absolute height, especially for large objects. The best performing model, FuseNet-SF5 trained on RGB imagery and pixel-level relative height, achieved a mean IoU of 0.8427 and IoUs of 0.8744, 0.7865, 0.9131 and 0.7966 for the classes building, road, water and other, respectively. This model correctly classified over 90% of the pixels of 67% of all objects present in the ground truth. Overall, this study showed that, for semantic segmentation of aerial RGB imagery, 1) height information can improve segmentation results, 2) adding height information through data fusion can result in a higher segmentation quality than data stacking, and 3) providing relative height to a network, rather than absolute height, can improve semantic segmentation quality.
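
As a concrete illustration of the evaluation metric named in the abstract, the sketch below computes per-class and mean intersection over union (IoU) from a predicted and a ground-truth label map. This is a minimal NumPy sketch, not the thesis code; the integer class encoding (0 = building, 1 = road, 2 = water, 3 = other) is an assumption made here for illustration.

```python
import numpy as np

# Hypothetical class encoding, assumed for illustration (not from the thesis):
CLASSES = ["building", "road", "water", "other"]

def iou_per_class(pred: np.ndarray, truth: np.ndarray, n_classes: int = 4):
    """IoU = |pred ∩ truth| / |pred ∪ truth|, computed per class."""
    ious = []
    for c in range(n_classes):
        pred_c = pred == c
        truth_c = truth == c
        intersection = np.logical_and(pred_c, truth_c).sum()
        union = np.logical_or(pred_c, truth_c).sum()
        # A class absent from both maps yields NaN and is excluded
        # from the mean below.
        ious.append(intersection / union if union > 0 else np.nan)
    return ious

def mean_iou(pred, truth, n_classes=4):
    return np.nanmean(iou_per_class(pred, truth, n_classes))

# Toy example on a 2x2 label map: one "water" pixel is mislabelled "other".
pred = np.array([[0, 1], [2, 3]])
truth = np.array([[0, 1], [2, 2]])
print(dict(zip(CLASSES, iou_per_class(pred, truth))))
print(mean_iou(pred, truth))  # 0.625
```

The abstract also contrasts two ways of feeding height to a network: data stacking (height as a fourth input band of a single shared encoder) and FuseNet-style data fusion (a separate height encoder whose feature maps are summed element-wise into the RGB feature maps). The PyTorch sketch below illustrates the difference under stated assumptions: the layer sizes, the single fusion stage, and the use of a normalised height band are illustrative choices, not the thesis architecture (FuseNet-SF5 fuses at several encoder stages).

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Data stacking: height is simply a fourth band of one shared encoder.
stacked_encoder = conv_block(4, 64)  # input: (B, 4, H, W) = RGB + Z

# FuseNet-style fusion: a separate height branch whose feature maps are
# added element-wise into the RGB stream after the stage.
class FusionStage(nn.Module):
    def __init__(self, in_rgb, in_z, out_ch):
        super().__init__()
        self.rgb_branch = conv_block(in_rgb, out_ch)
        self.z_branch = conv_block(in_z, out_ch)

    def forward(self, rgb, z):
        rgb_feat = self.rgb_branch(rgb)
        z_feat = self.z_branch(z)
        # Summation fuses height features into the RGB stream; the height
        # stream itself continues unfused, as in FuseNet.
        return rgb_feat + z_feat, z_feat

rgb = torch.randn(1, 3, 128, 128)  # RGB tile
z = torch.randn(1, 1, 128, 128)    # height band (e.g. a relative/normalised DSM)
fused, _ = FusionStage(3, 1, 64)(rgb, z)             # fusion path
stacked = stacked_encoder(torch.cat([rgb, z], dim=1))  # stacking path
```

In the fusion path the height data get their own learned filters before being combined, which is one plausible reading of why the abstract reports higher segmentation quality for fusion than for stacking.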

Files

License info not available