Bridging the world of 2D and 3D Computer Vision

Self-Supervised Cross-modality Feature Learning through 3D Gaussian Splatting

Bachelor Thesis (2024)
Author(s)

A. Simionescu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

X. Zhang – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

M. Weinmann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

More Info
Publication Year
2024
Language
English
Graduation Date
25-06-2024
Awarding Institution
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Current robotic perception systems use a variety of sensors to estimate and understand a robot's surroundings. This paper focuses on a novel data representation technique that uses a recent scene reconstruction algorithm, 3D Gaussian Splatting, to explicitly represent and reason about an environment from only a sparse set of camera views. To achieve this, I generate and analyze the first cross-modal dataset consisting of 3D Gaussians and views taken around ten household objects. I feed the resulting 3D Gaussians and images to a self-supervised feature learning network that learns robust 2D and 3D embedding representations by optimizing for the cross-view and cross-modality correspondence pretext tasks. I experiment with several 3D Gaussian features as input to the model and with two point sub-network backbones, and report results on the two pretext tasks. The learned features are subsequently fine-tuned for the 2D and 3D shape recognition tasks. Moreover, by leveraging the fast scene reconstruction capabilities of the algorithm, I propose the use of rendered views as a visual memory aid to support downstream robotic tasks. The proposed networks achieve results comparable to state-of-the-art methods for point and image processing. The code associated with this paper is available at https://github.com/SimiOy/Self-Supervised-Learning-for-3DGS.
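
The abstract does not say how the cross-modality correspondence pretext task is formulated. One common formulation treats it as binary classification: given an image embedding and a 3D embedding, predict whether both come from the same object. The sketch below illustrates that formulation only and is not taken from the thesis code. The function names, the index-aligned pairing of image and 3D samples, and the linear classifier head are all assumptions made for illustration.

```python
import numpy as np


def make_correspondence_pairs(object_ids, rng):
    """Build (image_index, gaussian_index, label) triples.

    Assumption: sample i of the image set and sample i of the 3D set
    show the same object, identified by object_ids[i].
    label is 1 for a matching pair and 0 for a mismatched pair.
    """
    object_ids = np.asarray(object_ids)
    pairs = []
    for i in range(len(object_ids)):
        # Positive pair: image i with its own 3D sample.
        pairs.append((i, i, 1))
        # Negative pair: image i with a 3D sample of a different object.
        candidates = np.flatnonzero(object_ids != object_ids[i])
        if candidates.size > 0:
            pairs.append((i, int(rng.choice(candidates)), 0))
    return pairs


def correspondence_loss(img_emb, pts_emb, pairs, weights, bias):
    """Mean binary cross-entropy of a linear head on concatenated embeddings."""
    idx_img = np.array([p[0] for p in pairs])
    idx_pts = np.array([p[1] for p in pairs])
    labels = np.array([p[2] for p in pairs], dtype=np.float64)
    joint = np.concatenate([img_emb[idx_img], pts_emb[idx_pts]], axis=1)
    logits = joint @ weights + bias
    # Numerically stable form of binary cross-entropy with logits.
    loss = (np.maximum(logits, 0.0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))
    return float(loss.mean())
```

In a full pipeline the embeddings would come from the 2D and 3D sub-networks and the loss would be minimized by gradient descent. Here the embeddings are plain arrays so that the pairing and loss logic can be read in isolation.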
