How do Transformer models perform in urban change detection with limited satellite datasets, and what strategies can enhance their accuracy for this task?
J.M. Bryczkowski (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Jan van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
K. Hildebrandt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Desislava Petrova-Antonova – Mentor (GATE Institute, Sofia University St. Kliment Ohridski)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
As global urbanization reaches an all-time high, effective urban management has become a crucial factor for efficient development. Better monitoring of urban change leads to more informed decision-making by policymakers, which makes tracking these changes important. One monitoring method is Change Detection (CD), which compares two satellite images of the same area captured at different times to identify what has changed in between. CD faces numerous difficulties, including data collection, varying weather conditions, limited availability of datasets, noise, illumination differences, and discrepancies between the equipment used for image capture. Convolutional Neural Networks (CNNs) address many of these issues and outperform non-deep-learning methods. More recently, the rise of Transformers has led researchers to develop Transformer-based networks, which yield better results than CNNs when more training data is available. This paper analyzes two existing Transformer-based models, focusing on the challenges of performing CD with artificially small datasets. Using smaller datasets simulates the limitations encountered during data collection and processing, and reduces the demands placed on the remote sensing capabilities of satellites. The models under examination are the Bitemporal Image Transformer (BIT) and the Visual change Transformer (VcT).
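To make the basic CD setup concrete, the sketch below shows the simplest non-deep-learning form of bitemporal change detection: absolute pixel differencing of two co-registered grayscale images followed by a threshold. It is an illustration only, not BIT, VcT, or the method used in this work. The function name `change_map`, the list-of-rows image representation, intensities in [0, 1], and the default `threshold` of 0.2 are all hypothetical choices for this example.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of intensities in [0, 1].
Image = List[List[float]]
Mask = List[List[int]]


def change_map(before: Image, after: Image, threshold: float = 0.2) -> Mask:
    """Return a binary change mask: 1 where |after - before| exceeds threshold.

    Assumes both images are already co-registered (pixel (i, j) covers the
    same ground location in both) and have identical dimensions.
    """
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("images must be co-registered and the same size")
    return [
        [1 if abs(a - b) > threshold else 0 for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


if __name__ == "__main__":
    before = [[0.1, 0.1], [0.5, 0.5]]
    after = [[0.1, 0.8], [0.5, 0.45]]
    # Only the top-right pixel changes by more than the threshold.
    print(change_map(before, after))  # [[0, 1], [0, 0]]
```

A fixed global threshold like this reacts to every intensity shift, so the noise and illumination differences listed above show up directly as false changes; learned models such as CNNs and Transformers are meant to separate such nuisance variation from real changes.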