Semantic Supervision and Representation Design in 3D Gaussian Splatting for Urban Scene Understanding
H.E. Chassagnette (TU Delft - Mechanical Engineering)
Holger Caesar – Mentor (TU Delft - Intelligent Vehicles)
M. Weinmann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
3D Gaussian Splatting (3DGS) has recently emerged as an efficient method for high-fidelity scene reconstruction in autonomous driving environments. While semantic information has been incorporated into Gaussian-based representations for scene understanding tasks, its broader potential for influencing the training process remains unexplored.
This thesis investigates how semantic supervision can be integrated into 3DGS training through several semantic-aware strategies, including alternative semantic loss functions, weighting schemes, and semantic-guided densification mechanisms. In addition, we explore different ways of organising RGB and semantic information within the representation. Since RGB appearance and semantic information differ in complexity, we compare a joint Gaussian representation, where RGB and semantic supervision act on the same primitives, with a separated Gaussian representation, where semantic information is modelled by an independent Gaussian set.
Experimental results show that the choice of semantic classification loss is the dominant factor influencing semantic performance, while the auxiliary strategies do not provide significant improvements. Furthermore, we observe a clear trade-off between representation designs: the joint representation achieves stronger semantic performance at the cost of degraded RGB reconstruction quality, whereas the separated representation preserves RGB fidelity with minimal degradation while still achieving good semantic performance. These findings motivate the exploration of hybrid organisations that better balance RGB reconstruction quality and semantic performance.