More Robust Visual Place Recognition with Image-to-Image Augmentations from Vision Foundation Models
F. Gebben (TU Delft - Mechanical Engineering)
J.F.P. Kooij – Mentor (TU Delft - Intelligent Vehicles)
M. Zaffar – Mentor (TU Delft - Intelligent Vehicles)
S. Khademi – Graduation committee member (TU Delft - Building Knowledge)
Abstract
Visual Place Recognition (VPR) remains a challenging problem, particularly under difficult conditions such as night-time or winter weather, which are often underrepresented in existing training datasets. Although transformer-based models have recently advanced the state of the art, their high computational demands can hinder deployment in real-world robotic systems. This thesis proposes a new data augmentation strategy for VPR that uses the image-to-image Vision Foundation Model (VFM) InstructPix2Pix to generate realistic visual variations, such as night and snow scenes, from the original training data. These synthetic augmentations are added to the original training set, extending its diversity without requiring additional data collection. To further improve performance, the method is combined with more advanced augmentations from the Kornia library, which on their own already improve robustness over traditional augmentation techniques. Experiments on multiple benchmark datasets show that lightweight, ResNet-based models trained with our VFM augmentations achieve significantly better performance under challenging visual conditions. Additional ablations demonstrate the importance of careful prompt design and hyperparameter tuning. Overall, this work shows that VFMs can serve as practical tools for targeted dataset augmentation, improving the robustness of existing VPR methods in difficult scenarios.
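The augmentation strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the prompt strings and the `edit_image`/`augment_dataset` names are assumptions, and `edit_image` stands in for an image-to-image model such as InstructPix2Pix (which in practice could be run via diffusers' `StableDiffusionInstructPix2PixPipeline`). The key point is that each synthetic image inherits the place label of its source image, so the retrieval supervision is unchanged while the training set grows.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative edit prompts; the thesis targets night and snow variations,
# but the exact wording used there is not given here.
EDIT_PROMPTS = ["make it night-time", "cover the scene in snow"]

@dataclass
class Sample:
    image: object   # e.g. a PIL.Image or tensor in a real pipeline
    place_id: int   # ground-truth place label used for retrieval supervision

def augment_dataset(
    samples: List[Sample],
    edit_image: Callable[[object, str], object],
    prompts: List[str] = EDIT_PROMPTS,
) -> List[Sample]:
    """Return the original samples plus one edited copy per prompt.

    `edit_image(image, prompt)` wraps the image-to-image VFM; each
    synthetic sample keeps the place_id of its source image.
    """
    augmented = list(samples)
    for s in samples:
        for p in prompts:
            augmented.append(Sample(image=edit_image(s.image, p),
                                    place_id=s.place_id))
    return augmented
```

With two prompts, the training set triples in size while the set of place labels stays fixed, which is what lets an existing VPR training recipe consume the synthetic data without modification.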