Like squinting your eyes: The impact of different fusion modules on change detection with deep learning

Bachelor Thesis (2024)

Author(s)

V. Dakov (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Dessislava Petrova-Antonova – Mentor (GATE Institute, Sofia University St. Kliment Ohridski)

Jan Van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

K Hildebrandt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty

Electrical Engineering, Mathematics and Computer Science

Segmentation, Deep Learning, Morphology, Change detection, Fusion, Encoder-Decoder

To reference this document use:

https://resolver.tudelft.nl/uuid:261fdb0e-a6f2-498a-bc67-9622823d6142

More Info

Publication Year

2024

Language

English

Graduation Date

24-06-2024

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Change detection with remote sensing data highlights semantic differences in an area between two or more points in time. It involves the comparison of aerial photographs of the same location taken some time apart. This facilitates large-scale analysis of urban and rural data over time, including population trends, city expansion trends and illegal building detection. State-of-the-art methods for the task are predominantly deep learning networks that follow an encoder-decoder architecture. These architectures all share the trait of having a "fusion" point - a location in the network where inputs transition from being processed independently to becoming correlated. Fusions can be classified into three categories: early, middle and late, depending on how deep within the network they occur. This study aims to show how changing the fusion impacts the size, spread and number of changes detected. It is motivated by how the receptive field of feature maps in convolutional neural networks expands in deeper layers, extracting features of different complexity. To this end, four fusion architectures are compared on three different datasets: LEVIR-CD, HiUCD and a new, fully controlled dataset, CSCD. In terms of test accuracy and the changes' size and spread, results are inconclusive: which fusion achieves the highest performance varies per dataset. Possible reasons include the complexity of remote sensing data and general differences between areas, but this is a subject for further study. The only conclusive category is the number of changes detected. On average, all architectures overestimate the number of changes in a scene. When the accuracy of the architectures is comparable, however, early fusion overestimates the number of changed objects the most, while middle and late fusion give more realistic estimates.
The case study leaves room for refinement, namely better problem isolation, more data and an extension to more architectures, but it is a promising step towards understanding fusion.
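
To make "the number of changes detected" concrete, the following is a minimal sketch of one way such a count could be taken from a binary change mask. It is an illustration only: the abstract does not state how the thesis counts changes, so the connected-component approach, the 4-connectivity rule, the function name `change_regions` and the toy masks are all assumptions. The example shows how a prediction that fragments one changed object is counted as several changes, which is the kind of overestimation the abstract describes.

```python
from collections import deque


def change_regions(mask):
    """Return the pixel count of every 4-connected region of 1s.

    `mask` is a list of equal-length rows holding 0 (no change) or
    1 (change). Both the counting rule and the connectivity are
    assumptions made for illustration, not the thesis's method.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one changed region.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


# Hypothetical ground truth: a single changed object of 6 pixels.
truth = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# Hypothetical prediction: the object is split and a stray pixel is added.
pred = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
]

print(len(change_regions(truth)), change_regions(truth))  # 1 [6]
print(len(change_regions(pred)), change_regions(pred))    # 3 [2, 2, 1]
```

Under these assumptions, the length of the returned list is the number of changes and its entries are the change sizes, so the fragmented prediction reports three changes where the ground truth has one.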

Files

Like_Squinting_Your_Eyes_The_i... (pdf)

(pdf | 4.78 Mb)

License info not available