On the Generalization of Metric Relative Pose Estimation Models to Unseen Environments

Master Thesis (2025)

Author(s)

B. Jangley (TU Delft - Mechanical Engineering)

Contributor(s)

Julian F.P. Kooij – Mentor (TU Delft - Intelligent Vehicles)

Christian Pek – Graduation committee member (TU Delft - Robot Dynamics)

M. Zaffar – Graduation committee member (TU Delft - Intelligent Vehicles)

Faculty

Mechanical Engineering

Keywords

Crowdsourced data, Relative Pose Estimation, 6D pose estimation, Metric scale

To reference this document use:

https://resolver.tudelft.nl/uuid:f8b0899b-a921-4d6e-8593-9942a8388301

Publication Year

2025

Language

English

Graduation Date

26-09-2025

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering | Vehicle Engineering | Cognitive Robotics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Crowd-sourced imagery is increasingly important for urban mapping and visual localization. However, its reliability is limited by GPS inaccuracies and heterogeneous capture conditions, including device variability, viewpoint differences, illumination changes, and temporal shifts. In these settings, achieving metric-scale pose estimation remains a central challenge. Deep learning-based pose estimation models address this problem by learning to estimate the 6-DoF pose using geometric cues between image views and metric supervision during training on large datasets. This encourages spatial consistency and supports generalization across diverse conditions. Recent learning-based architectures, often based on vision transformer encoders, approach the task through unified multi-task frameworks that jointly predict metric depth maps and 2D–2D correspondences, with relative pose estimated downstream. This thesis evaluates whether such frameworks predict accurate metric depth maps under domain shifts. Experiments show that, even with scale correction through data-driven fine-tuning with metric supervision, depth predictions from multi-task relative pose estimation models fail to generalize reliably to out-of-domain environments. In contrast, monocular models, trained on significantly larger and more varied datasets, demonstrate strong zero-shot reliability for metric depth prediction. A hybrid pipeline is proposed that combines the geometric consistency of relative pose models with the stable metric cues of monocular models, enabling robust pose estimation in crowd-sourced outdoor environments.
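As a rough illustration of how metric cues from a monocular depth model might be combined with an up-to-scale relative pose, the sketch below estimates a single scale factor as the median ratio between metric and up-to-scale depths at shared pixels, then applies it to a unit-length translation direction. This is a generic median-ratio alignment chosen for illustration, not necessarily the pipeline proposed in the thesis; the function names `metric_scale` and `scale_translation` and all toy values are hypothetical.

```python
from statistics import median


def metric_scale(relative_depths, metric_depths, min_depth=1e-6):
    """Return the median ratio of metric depth to up-to-scale depth.

    Both inputs are flat sequences of depths sampled at the same pixels.
    Pairs where either depth is not above `min_depth` are skipped
    (assumption: such values mark invalid or missing depth).
    """
    ratios = [
        m / r
        for r, m in zip(relative_depths, metric_depths)
        if r > min_depth and m > min_depth
    ]
    if not ratios:
        raise ValueError("no valid depth pairs to estimate a scale from")
    return median(ratios)


def scale_translation(t_unit, scale):
    """Scale a unit-length translation direction to metric units."""
    return [scale * component for component in t_unit]


# Toy values (hypothetical): the last pair is dropped because the
# up-to-scale depth is zero.
relative = [1.0, 2.0, 4.0, 0.0]
metric = [2.5, 5.0, 10.0, 3.0]

scale = metric_scale(relative, metric)
t_metric = scale_translation([0.0, 0.6, 0.8], scale)
print(scale, t_metric)
```

The median is used here rather than the mean so that a few badly wrong depth values do not dominate the scale estimate; a least-squares or RANSAC-based fit would be a reasonable alternative under the same assumptions.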

Files

On_the_Generalization_of_Metri... (pdf)

(pdf | 54.2 Mb)

License info not available