More Robust Visual Place Recognition with Image-to-Image Augmentations from Vision Foundation Models

Master Thesis (2025)
Author(s)

F. Gebben (TU Delft - Mechanical Engineering)

Contributor(s)

J.F.P. Kooij – Mentor (TU Delft - Intelligent Vehicles)

M. Zaffar – Mentor (TU Delft - Intelligent Vehicles)

S. Khademi – Graduation committee member (TU Delft - Building Knowledge)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
06-06-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Visual Place Recognition (VPR) remains a challenging problem, particularly under difficult conditions such as night-time or winter weather, which are often underrepresented in existing training datasets. Although transformer-based models have recently advanced the state of the art, their high computational demands can hinder deployment in real-world robotic systems. This thesis proposes a new data augmentation strategy for VPR that uses the image-to-image Vision Foundation Model (VFM) InstructPix2Pix to generate realistic visual variations, such as night and snow scenes, from the original training data. These synthetic images are added to the original training set, extending dataset diversity without requiring additional data collection. To further improve performance, the method is combined with more advanced augmentations from the Kornia library, which on their own already improve robustness over traditional augmentation techniques. Experiments on multiple benchmark datasets show that lightweight, ResNet-based models trained with our VFM augmentations achieve significantly improved performance under challenging visual conditions. Additional ablations demonstrate the importance of careful prompt design and hyperparameter tuning. Overall, this work shows that VFMs can serve as practical tools for targeted dataset augmentation, improving the robustness of existing VPR methods in difficult scenarios.
