Exploiting the Test-time Reference Map for Visual Place Recognition
M. Zaffar (TU Delft - Aerospace Engineering)
J.F.P. Kooij – Promotor (TU Delft - Mechanical Engineering)
L. Nan – Copromotor (TU Delft - Architecture and the Built Environment)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Visual Place Recognition (VPR) is a key task in computer vision and robotics, enabling loop closure in SLAM, image-based localization, landmark retrieval, and navigation. While deep-learning approaches have improved robustness to viewpoint, illumination, seasonal, and dynamic changes, three less-explored challenges (domain generalization, uncertainty estimation, and localization accuracy) remain critical. This thesis demonstrates that test-time reference maps, typically used only for retrieval, can be exploited to address these challenges without additional sensors or retraining.
A unified evaluation framework, VPR-Bench, is introduced to standardize datasets, metrics, and evaluation practices across the robotics and vision communities. VPR-Bench enables meta-analyses of descriptor size, runtime trade-offs, viewpoint and illumination invariance, and retrieval efficiency, highlighting that no single VPR method is universally best.
To improve cross-domain robustness, Reference-Set Finetuning (RSF) is proposed: a self-supervised finetuning strategy using test-time reference images to reduce train-test domain gaps. For reliability, Spatial Uncertainty Estimation (SUE) leverages reference map metadata to quantify the spatial spread of top-ranked poses, outperforming lightweight methods and complementing geometric verification. Finally, Continuous Place-descriptor Regression (CoPR) densifies the feature space by regressing descriptors at novel poses, reducing localization errors caused by map quantization and enhancing accuracy when combined with viewpoint-variant encoders.
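The idea behind SUE, scoring a query by how far apart its top-ranked reference poses lie, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the L2 descriptor distance, the choice of k, and the use of root-mean-square distance from the centroid as the spread measure are all assumptions made for illustration.

```python
import math

def top_k_indices(query, ref_descriptors, k):
    """Indices of the k reference descriptors nearest to the query (L2 distance, assumed)."""
    dists = [math.dist(query, r) for r in ref_descriptors]
    return sorted(range(len(ref_descriptors)), key=lambda i: dists[i])[:k]

def spatial_spread(positions):
    """Root-mean-square distance of 2D positions from their centroid (assumed spread measure)."""
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    return math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in positions) / n)

def spatial_uncertainty(query, ref_descriptors, ref_positions, k=3):
    """Hypothetical uncertainty score: spatial spread of the top-k retrieved reference poses."""
    idx = top_k_indices(query, ref_descriptors, k)
    return spatial_spread([ref_positions[i] for i in idx])

# Toy reference maps: 2D descriptors paired with 2D positions in metres.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]
clustered_positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]     # look-alikes are neighbours
aliased_positions = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]    # look-alikes are far apart

query = (0.1, 0.0)
low = spatial_uncertainty(query, descriptors, clustered_positions)
high = spatial_uncertainty(query, descriptors, aliased_positions)
```

In this toy setup the same query retrieves the same three descriptors in both maps, but the score is small when the matches are spatial neighbours and large when visually similar references are far apart, which is the kind of perceptual aliasing such a score is meant to flag.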
Overall, this thesis reframes the reference map from a passive database to an active, exploitable resource. By systematically leveraging map information through RSF, SUE, and CoPR, it delivers measurable improvements in robustness, reliability, and localization accuracy, advancing map-aware VPR for real-world robotics and autonomous systems.