Recent advancements in RGB-only dense Simultaneous Localization and Mapping (SLAM) have predominantly focused on combining a dense scene representation, based on 3D Gaussian Splatting (3DGS), with a camera pose estimation and per-frame depth prediction module. Although these methods have made progress in accurate camera tracking and photorealistic reconstruction quality, they still require large amounts of computational resources, which makes them unsuitable for resource-constrained applications. To this end, we propose a dual-scene representation with a novel camera pose optimization module that uses a sparse point-based scene representation, optimized using multi-view point tracks from a pre-trained network. We combine this camera tracker with a 3DGS-based dense scene representation to achieve accurate camera pose estimation and high-quality scene renderings with significantly lower GPU memory usage. We evaluate our method with quantitative and qualitative results on synthetic and real-world datasets, achieving competitive performance alongside state-of-the-art GPU memory usage.
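To make the pose optimization module more concrete, the sketch below illustrates one way a sparse point-based representation and per-frame camera poses can be refined against multi-view point tracks by minimizing reprojection error. It is a minimal, self-contained example, not the paper's implementation: the axis-angle pose parameterization, the pinhole intrinsics `K_intr`, the initialization, and all function names are assumptions made for illustration, and the 2D tracks are assumed to come from a pre-trained point-tracking network.

```python
# Minimal sketch (assumed, not the paper's code): jointly refine a sparse 3D point
# set and per-frame camera poses by minimizing the reprojection error against 2D
# point tracks produced by a pre-trained tracker.
import torch


def skew(k):
    """(3,) vector -> (3, 3) skew-symmetric matrix, built differentiably."""
    z = torch.zeros((), dtype=k.dtype)
    return torch.stack([torch.stack([z, -k[2], k[1]]),
                        torch.stack([k[2], z, -k[0]]),
                        torch.stack([-k[1], k[0], z])])


def axis_angle_to_matrix(r):
    """Rodrigues' formula: (3,) axis-angle vector -> (3, 3) rotation matrix."""
    theta = r.norm().clamp_min(1e-8)
    K = skew(r / theta)
    return torch.eye(3) + theta.sin() * K + (1.0 - theta.cos()) * (K @ K)


def project(points_w, rvec, tvec, K_intr):
    """Project (N, 3) world points into a frame with pose (rvec, tvec)."""
    R = axis_angle_to_matrix(rvec)
    p_cam = points_w @ R.T + tvec           # (N, 3) points in the camera frame
    uv = p_cam @ K_intr.T                   # (N, 3) homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]


def refine_poses(tracks, visibility, K_intr, n_frames, n_points, iters=200):
    """Optimize sparse points and per-frame poses against 2D point tracks.

    tracks:     (F, N, 2) pixel positions from a pre-trained point tracker
    visibility: (F, N)    boolean mask of valid observations per frame
    """
    # Points start on a plane in front of the cameras; poses start at identity
    # (an assumed, simplistic initialization for this sketch).
    points = torch.cat([torch.zeros(n_points, 2),
                        2.0 * torch.ones(n_points, 1)], dim=1).requires_grad_(True)
    rvecs = torch.zeros(n_frames, 3, requires_grad=True)
    tvecs = torch.zeros(n_frames, 3, requires_grad=True)
    opt = torch.optim.Adam([points, rvecs, tvecs], lr=1e-2)

    for _ in range(iters):
        opt.zero_grad()
        loss = 0.0
        for f in range(n_frames):
            uv = project(points, rvecs[f], tvecs[f], K_intr)
            resid = (uv - tracks[f]) * visibility[f].float().unsqueeze(-1)
            loss = loss + (resid ** 2).sum()  # masked reprojection error, frame f
        loss.backward()
        opt.step()
    return points.detach(), rvecs.detach(), tvecs.detach()
```

In a full pipeline of this kind, the refined camera poses would then be handed to the 3DGS-based dense map for rendering-based optimization; how the two representations are coupled in practice is specific to the proposed method and not captured by this sketch.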