Like squinting your eyes: The impact of different fusion modules on change detection with deep learning

Bachelor Thesis (2024)
Author(s)

V. Dakov (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Dessislava Petrova-Antonova – Mentor (GATE Institute, Sofia University St. Kliment Ohridski)

Jan van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

K. Hildebrandt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
24-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Change detection with remote sensing data highlights semantic differences in an area across two or more points in time. It involves comparing aerial photographs of the same location taken some time apart. This facilitates large-scale analysis of urban and rural data over time, including population trends, city expansion and illegal building detection. State-of-the-art methods for the task are predominantly deep learning networks that follow an encoder-decoder architecture. These architectures all share a "fusion" point: the location in the network where the inputs stop being processed independently and start being correlated. Fusions fall into three categories (early, middle and late), depending on how deep within the network they occur. This study aims to show how changing the fusion point affects the size, spread and number of changes detected. It is motivated by the way the receptive field of feature maps in convolutional neural networks expands in deeper layers, so that features of different complexity are extracted at different depths. To this end, four fusion architectures are compared on three datasets: LEVIR-CD, HiUCD and CSCD, a new, fully controlled dataset. In terms of test accuracy and of the size and spread of the changes, the results are inconclusive: which fusion achieves the highest performance varies per dataset. Possible reasons include the complexity of remote sensing data and general differences between areas, but this is a subject for further study. The only conclusive category is the number of changes detected. On average, all architectures overestimate the number of changes in a scene. When the accuracy of the architectures is comparable, however, early fusion overestimates the number of changed objects the most, while middle and late fusion give more realistic estimates.
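To make the receptive-field motivation concrete, the sketch below computes how many input pixels a single feature "sees" at successive depths of a convolutional encoder, using the standard recurrence r_l = r_(l-1) + (k_l - 1) * j_(l-1) and j_l = j_(l-1) * s_l (k = kernel size, s = stride, j = cumulative stride). The encoder layout (three stages of two 3x3 convolutions followed by a 2x2 pooling step) is a hypothetical example, not the architecture used in the thesis, and mapping stage 0 to "early" and the last stage to "late" fusion is only an illustrative assumption.

```python
# Hypothetical encoder (assumption, not the thesis architecture):
# each stage = two 3x3 convs with stride 1, then a 2x2 pooling with stride 2.
STAGE = [(3, 1), (3, 1), (2, 2)]  # (kernel_size, stride) per layer
ENCODER = STAGE * 3


def receptive_fields(layers):
    """Return the receptive field (in input pixels) after each layer.

    Index 0 is the raw input, i.e. the point where early fusion would
    combine the two images pixel by pixel.
    """
    rf, jump = 1, 1  # receptive field and cumulative stride of the input
    fields = [rf]
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # grow by the kernel extent at current spacing
        jump *= stride             # downsampling widens the spacing between features
        fields.append(rf)
    return fields


fields = receptive_fields(ENCODER)
layers_per_stage = len(STAGE)
for stage in range(4):
    # Candidate fusion points: after 0, 1, 2 or 3 encoder stages.
    print(f"fusion after stage {stage}: receptive field "
          f"{fields[stage * layers_per_stage]} px")
```

Under these assumptions, a feature at the deepest candidate fusion point summarizes a much larger image patch than one at the input, which is why the depth of fusion is expected to influence how large and how numerous the detected changes are.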
The case study leaves room for refinement in problem isolation, more data and extension to more architectures, but it is a promising step towards understanding fusion.
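The abstract does not state how the number and size of changes are measured. One plausible reading, assumed here, is to treat each 4-connected region of changed pixels in a binary change mask as one changed object. The sketch below counts such regions with a breadth-first search; the function name `count_changes` and the toy masks are hypothetical illustrations, not the thesis code or data.

```python
from collections import deque


def count_changes(mask):
    """Return the pixel size of every 4-connected changed region in a binary mask.

    The number of detected changes is len(result); each entry is one region's size.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Flood-fill a new region starting from this changed pixel.
                seen[r][c] = True
                queue = deque([(r, c)])
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes


# Toy example: one changed building in the ground truth, while the prediction
# fragments it and adds a spurious pixel, so the object count is overestimated.
truth = [[0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 0, 0, 0]]
pred = [[0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0]]
print(len(count_changes(truth)), len(count_changes(pred)))
```

Counting regions rather than pixels makes fragmented or noisy predictions visible: a model can overlap the true change reasonably well while still reporting several small objects where there is only one.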
