Convolutional Cross-View Pose Estimation

Journal Article (2024)
Author(s)

Z. Xia (TU Delft - Intelligent Vehicles)

O. Booij (TU Delft - Pattern Recognition and Bioinformatics)

J.F.P. Kooij (TU Delft - Intelligent Vehicles)

Research Group
Intelligent Vehicles
DOI related publication
https://doi.org/10.1109/TPAMI.2023.3346924
Publication Year
2024
Language
English
Issue number
5
Volume number
46
Pages (from-to)
3813-3831
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained by using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36%, respectively, in median localization error for comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity and enables rejecting possibly erroneous predictions. Without re-training, the model can run inference on ground images with different fields of view and utilize orientation priors if available. On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time, achieving a median localization error under 1 meter and a median orientation error of around 1 degree at 14 FPS.
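
The matching step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `localize`, the cosine-similarity matching, the softmax `temperature`, and the (cos, sin) encoding of the orientation field are all assumptions made for illustration, and the coarse-to-fine upsampling and learned encoders are omitted. The sketch only shows how one ground descriptor might be compared against a grid of aerial descriptors to obtain a dense probability map, and how the orientation might then be read from a vector field at the most likely location.

```python
import numpy as np


def localize(ground_desc, aerial_descs, orientation_field, temperature=0.1):
    """Hypothetical single-scale matching sketch.

    ground_desc:       (C,) descriptor of the ground-level query image.
    aerial_descs:      (H, W, C) descriptors of local aerial regions.
    orientation_field: (H, W, 2) per-cell (cos, sin) of the yaw angle (assumed encoding).
    Returns the (H, W) probability map, the most likely cell, and its yaw in radians.
    """
    # L2-normalise so the dot product below is a cosine similarity.
    g = ground_desc / np.linalg.norm(ground_desc)
    a = aerial_descs / np.linalg.norm(aerial_descs, axis=-1, keepdims=True)

    # Similarity of the query descriptor to every aerial cell: shape (H, W).
    similarity = a @ g

    # Softmax over all cells gives a dense distribution over locations.
    logits = similarity / temperature
    logits -= logits.max()  # subtract the maximum for numerical stability
    prob = np.exp(logits)
    prob /= prob.sum()

    # Condition the orientation estimate on the predicted location.
    row, col = np.unravel_index(np.argmax(prob), prob.shape)
    cos_t, sin_t = orientation_field[row, col]
    yaw = np.arctan2(sin_t, cos_t)
    return prob, (int(row), int(col)), float(yaw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    aerial = rng.normal(size=(8, 8, 16))
    query = aerial[5, 2].copy()  # query descriptor identical to one aerial cell
    field = np.zeros((8, 8, 2))
    field[..., 0], field[..., 1] = np.cos(0.5), np.sin(0.5)
    prob, cell, yaw = localize(query, aerial, field)
    print(cell, round(yaw, 3), float(prob.max()))
```

Because the output is a full distribution rather than a single point, a downstream system could, for example, discard a prediction when the peak probability is low or the map has several comparable modes; any such threshold would need to be tuned and is not specified here.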

Files

Convolutional_Cross-View_Pose_... (pdf)
(pdf | 6.23 Mb)
- Embargo expired on 25-06-2024
License info not available