Title: SliceNet: Street-to-Satellite Image Metric Localization using Local Feature Matching
Author: de Vries Lentsch, Ted (TU Delft Mechanical, Maritime and Materials Engineering)
Contributors: Kooij, J.F.P. (mentor); Xia, Z. (mentor); Caesar, H.C. (graduation committee); Khademi, S. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Mechanical Engineering
Date: 2022-10-11

Abstract:
This work addresses visual localization for intelligent vehicles. The task of cross-view matching-based localization is to estimate the geo-location of a vehicle-mounted camera by matching the captured street view image with an overhead-view satellite map containing the vehicle's local surroundings. This local satellite view image can be obtained using any rough localization prior, e.g., from a global navigation satellite system or temporal filtering. Existing cross-view matching methods are based on global image descriptors and achieve considerably lower localization performance than structure-based methods that use 3D maps. Whereas structure-based methods used global image descriptors in the past, recent structure-based work has shown that significantly better localization performance can be achieved by using local image descriptors to find pixel-level correspondences between the query street view image and the 3D map. Hence, using local image descriptors may be the key to improving the localization performance of cross-view matching methods. However, the street and satellite views not only exhibit very different visual appearances but also have distinctive geometric configurations. As a result, finding correspondences between the two views is not a trivial task. We observe that the geometric relationship between the street and satellite views implies that every vertical line in the street view image corresponds to an azimuth direction in the satellite view image. Based on this prior, we devise a novel neural network architecture called SliceNet that extracts local image descriptors from both images and matches them to compute a dense spatial distribution over the camera's location. Specifically, the geometric prior is used as a weakly supervised signal that enables SliceNet to learn the correspondences between the two views. As an additional task, we also show that the extracted local image descriptors can be used to determine the heading of the camera. SliceNet outperforms global image descriptor-based cross-view matching methods and achieves state-of-the-art localization results on the VIGOR dataset. Notably, the proposed method reduces the median metric localization error by 21% and 4% compared to the state-of-the-art methods when generalizing in the same area and across areas, respectively.

Subjects: street-to-satellite image matching; cross-view matching; vehicle localization; SliceNet; street camera localization; visual localization; image metric localization; image matching; local feature matching; pose estimation

To reference this document use: http://resolver.tudelft.nl/uuid:fd6af5cb-c8a7-4b54-8161-b34ede4cf2dd
Part of collection: Student theses
Document type: master thesis
Rights: © 2022 Ted de Vries Lentsch
Files: Thesis_SliceNet_TedDeVrie ... _10_04.pdf (PDF, 15.42 MB)
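
To make the abstract's geometric prior concrete, the sketch below illustrates the idea that every vertical slice of a street-view panorama corresponds to an azimuth direction in the satellite view, and that matching slice descriptors against azimuthal ray descriptors yields a score per candidate location. This is a minimal, illustrative reconstruction under stated assumptions (an equirectangular panorama, crude mean-pooled pixel descriptors instead of learned CNN features); all function names here are hypothetical and do not reflect SliceNet's actual implementation.

    # Illustrative sketch of the vertical-slice / azimuth prior; not SliceNet's code.
    import numpy as np

    def slice_descriptors(pano, n_slices):
        """Split an equirectangular street-view panorama (H, W, C) into vertical
        slices and reduce each slice to a crude descriptor by mean pooling."""
        h, w, c = pano.shape
        strips = np.array_split(pano, n_slices, axis=1)
        return np.stack([s.reshape(-1, c).mean(axis=0) for s in strips])  # (n_slices, C)

    def ray_descriptor(sat, cam_xy, azimuth, n_samples=32):
        """Sample the satellite image (H, W, C) along a ray from the candidate
        camera location cam_xy in the given azimuth direction, then mean pool."""
        h, w, _ = sat.shape
        steps = np.linspace(1.0, min(h, w) / 2.0, n_samples)
        xs = np.clip(cam_xy[0] + steps * np.sin(azimuth), 0, w - 1).astype(int)
        ys = np.clip(cam_xy[1] - steps * np.cos(azimuth), 0, h - 1).astype(int)
        return sat[ys, xs].mean(axis=0)  # (C,)

    def location_score(pano, sat, cam_xy, heading=0.0, n_slices=16):
        """Score one candidate location: each panorama slice is compared with the
        satellite ray at its corresponding azimuth; a heading offset rotates the
        slice-to-azimuth assignment."""
        street = slice_descriptors(pano, n_slices)
        azimuths = heading + 2.0 * np.pi * np.arange(n_slices) / n_slices
        sat_desc = np.stack([ray_descriptor(sat, cam_xy, a) for a in azimuths])
        street = street / (np.linalg.norm(street, axis=1, keepdims=True) + 1e-8)
        sat_desc = sat_desc / (np.linalg.norm(sat_desc, axis=1, keepdims=True) + 1e-8)
        return float((street * sat_desc).sum(axis=1).mean())  # mean cosine similarity

Evaluating location_score over a grid of candidate (x, y) positions produces the kind of dense spatial distribution over the camera's location that the abstract describes, and sweeping the heading offset at the best cell mirrors the additional heading-estimation task, though the thesis achieves both with learned local descriptors rather than pixel statistics.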