Recovering Visual Saliency from Intrinsic Properties of 3D Gaussian Splatting
X. Bi (TU Delft - Architecture and the Built Environment)
B.M. Meijers – Mentor (TU Delft - Architecture and the Built Environment)
L. Nan – Mentor (TU Delft - Architecture and the Built Environment)
Abstract
3D Gaussian Splatting (3DGS) represents scenes as collections of Gaussian primitives whose attributes are shaped by multi-view photographic supervision. This raises a natural question: does the photographer’s visual focus leave a measurable imprint on these intrinsic properties? While prior work has explored segmentation and scene decomposition in 3DGS, no existing method has investigated whether Gaussian attributes alone encode visual saliency. We propose a mask-free, post hoc classifier that recovers the photographer’s region of interest from Gaussian attributes alone, requiring neither the original training images nor any 2D foundation model. Trained on scenes from Tanks and Temples and Mip-NeRF 360, our method achieves a mean leave-one-out cross-validation (LOOCV) F1 of 0.957 and generalizes to unseen scenes with a mean test F1 of 0.929. Projected 3D saliency masks align strongly with U²-Net predictions on the original training images, confirming that multi-view Gaussian intrinsic properties capture a geometrically consistent, view-stable notion of saliency that single-frame 2D methods cannot provide. These properties make our method applicable to automatic foreground extraction, capture intent analysis, and perceptual quality-driven compression for bandwidth-efficient streaming.
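To make the evaluation protocol concrete, the following is a minimal sketch of how a mean LOOCV F1 over scenes could be computed, assuming the held-out unit is a whole scene. It is not the classifier from this work: the abstract does not specify the feature set or the model, so the two per-Gaussian features (opacity and log-scale), the nearest-centroid stand-in classifier, the toy scene data, and all function names are assumptions chosen only for illustration.

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_centroids(features, labels):
    """Stand-in classifier (assumption): one mean feature vector per class.

    Assumes both classes occur in the training data.
    """
    centroids = {}
    for cls in (0, 1):
        rows = [f for f, l in zip(features, labels) if l == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, features):
    """Assign each Gaussian to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda c: dist2(f, centroids[c])) for f in features]


def leave_one_scene_out_f1(scenes):
    """scenes maps a scene name to (features, labels).

    Each scene is held out once; the model is fit on the remaining scenes.
    Returns the per-scene F1 scores and their mean.
    """
    scores = {}
    for held_out in scenes:
        train_x, train_y = [], []
        for name, (x, y) in scenes.items():
            if name != held_out:
                train_x.extend(x)
                train_y.extend(y)
        model = fit_centroids(train_x, train_y)
        test_x, test_y = scenes[held_out]
        scores[held_out] = f1_score(test_y, predict(model, test_x))
    return scores, mean(scores.values())


# Hypothetical toy data: each Gaussian is [opacity, log_scale]; label 1 = salient.
toy_scenes = {
    "scene_a": ([[0.9, -4.0], [0.8, -3.5], [0.2, -1.0], [0.1, -0.5]], [1, 1, 0, 0]),
    "scene_b": ([[0.95, -4.2], [0.3, -1.2], [0.15, -0.8]], [1, 0, 0]),
    "scene_c": ([[0.85, -3.8], [0.7, -3.6], [0.25, -0.9]], [1, 1, 0]),
}

per_scene_f1, mean_f1 = leave_one_scene_out_f1(toy_scenes)
```

Holding out entire scenes rather than individual Gaussians keeps Gaussians from the same capture out of both the training and evaluation splits, which is what makes the score a measure of cross-scene generalization.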