SLAM from RGB Image Sequences based on 3D Gaussian Splatting
A. Tibenský (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Michael Weinmann – Mentor (TU Delft - Computer Graphics and Visualisation)
N. Tomen – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Recent advances in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly focused on combining a dense scene representation based on 3D Gaussian Splatting (3DGS) with a camera pose estimation and per-frame depth prediction module. Although these methods have achieved accurate camera tracking and photorealistic reconstruction quality, they still require large amounts of computational resources, which makes them unsuitable for resource-constrained applications. To address this, we propose a dual scene representation with a novel camera pose optimization module that uses a sparse point-based scene representation, optimized using multi-view point tracks from a pre-trained network. We combine this camera tracker with a 3DGS-based dense scene representation to obtain accurate camera pose estimates and high-quality scene renderings at significantly lower GPU memory usage. We evaluate our method quantitatively and qualitatively on a synthetic and a real-world dataset, achieving competitive performance with state-of-the-art GPU memory usage.
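The abstract does not specify how the sparse point-based pose optimizer works internally. As a rough illustration of the general idea only (fitting a 6-DoF camera pose to 2D point tracks by minimizing reprojection error over a sparse set of 3D points), a minimal NumPy/SciPy sketch might look as follows. All function names, the pinhole intrinsics, and the use of a generic BFGS optimizer are assumptions made for this illustration, not details of the thesis method:

```python
import numpy as np
from scipy.optimize import minimize

def rodrigues(rvec):
    """Convert an axis-angle vector to a 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def project(points, rvec, tvec, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 world points into pixel coordinates.
    The focal length and principal point here are arbitrary placeholders."""
    cam = points @ rodrigues(rvec).T + tvec
    uv = f * cam[:, :2] / cam[:, 2:3]
    return uv + np.array([cx, cy])

def reprojection_error(pose, points, tracks):
    """Mean squared pixel error between projections and observed 2D tracks."""
    uv = project(points, pose[:3], pose[3:])
    return np.mean(np.sum((uv - tracks) ** 2, axis=1))

def optimize_pose(points, tracks, init=None):
    """Fit a 6-DoF pose (axis-angle + translation) to the tracks via BFGS."""
    x0 = np.zeros(6) if init is None else np.asarray(init, dtype=float)
    res = minimize(reprojection_error, x0, args=(points, tracks), method="BFGS")
    return res.x
```

In an actual SLAM pipeline the 2D tracks would come from the pre-trained multi-view point-tracking network mentioned above, and the sparse 3D points would be jointly refined with the poses; this sketch only shows the per-frame pose fit against fixed points.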