How do Transformer models perform in urban change detection with limited satellite datasets, and what strategies can enhance their accuracy for this task?

Abstract

As global urbanization reaches an all-time high, effective urban management becomes a crucial factor for efficient development. Better monitoring of these transformations supports more informed decision-making by policymakers, underscoring the importance of tracking urban change. One method for such monitoring is Change Detection (CD), which compares two satellite images of the same area captured at different times to identify what has changed between them. CD involves numerous difficulties, including data collection, varying weather conditions, limited availability of datasets, noise, illumination differences, and discrepancies in the equipment used for image capture. Convolutional Neural Networks (CNNs) can address these issues, delivering models that outperform non-deep-learning approaches. More recently, the rise of Transformers has led researchers to develop networks based on the Transformer architecture, which yield more promising results than CNNs when more data is available. This paper analyzes two existing Transformer-based models, emphasizing the challenges of handling CD with artificially small datasets. Using smaller datasets reduces the remote sensing requirements placed on satellites and simulates the limitations encountered during data collection and processing. The models under examination are the Bitemporal Image Transformer (BIT) and the Visual change Transformer (VcT).
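
To make the bitemporal setup concrete, the minimal sketch below shows how a CD model consumes a pair of co-registered images and produces a per-pixel change mask. It is illustrative only: the ChangeDetector and load_rgb names are hypothetical placeholders and do not reflect the actual BIT or VcT implementations.

```python
# Minimal sketch of bitemporal change detection inference.
# ChangeDetector and load_rgb are hypothetical placeholders, not the BIT or VcT APIs.
import numpy as np
import torch
import torch.nn as nn


def load_rgb(path: str, size: int = 256) -> torch.Tensor:
    """Placeholder loader: returns a (1, 3, H, W) float tensor in [0, 1]."""
    # In practice this would read and co-register a satellite tile from `path`.
    rng = np.random.default_rng(0)
    img = rng.random((size, size, 3), dtype=np.float32)
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)


class ChangeDetector(nn.Module):
    """Toy Siamese encoder with a difference head, standing in for a CD network."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(channels, 1, 1)  # per-pixel change logit

    def forward(self, t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
        f1, f2 = self.encoder(t1), self.encoder(t2)  # shared weights (Siamese)
        return self.head(torch.abs(f1 - f2))         # compare feature maps


if __name__ == "__main__":
    model = ChangeDetector().eval()
    before, after = load_rgb("tile_2015.tif"), load_rgb("tile_2020.tif")
    with torch.no_grad():
        change_mask = torch.sigmoid(model(before, after)) > 0.5
    print("changed pixels:", int(change_mask.sum()))
```

Both BIT and VcT follow the same input/output contract (two images in, a change map out); they differ in how the features of the two dates are compared, which is where the Transformer components come in.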