Recovering Visual Saliency from Intrinsic Properties of 3D Gaussian Splatting

Master Thesis (2026)

Author(s)

X. Bi (TU Delft - Architecture and the Built Environment)

Contributor(s)

B.M. Meijers – Mentor (TU Delft - Architecture and the Built Environment)

L. Nan – Mentor (TU Delft - Architecture and the Built Environment)

Faculty

Architecture and the Built Environment

To reference this document use

https://resolver.tudelft.nl/uuid:d1148812-2a69-4d6b-b956-9560897e8b1d

Publication Year

2026

Language

English

Graduation Date

17-06-2026

Awarding Institution

Delft University of Technology

Project

Geomatics for the Built Environment

Programme

Geomatics

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

3D Gaussian Splatting (3DGS) represents scenes as collections of Gaussian primitives whose attributes are shaped by multi-view photographic supervision. This raises a natural question: does the photographer’s visual focus leave a measurable imprint on these intrinsic properties? While prior work has explored segmentation and scene decomposition in 3DGS, no existing method has investigated whether Gaussian attributes alone encode visual saliency. We propose a mask-free, post hoc classifier that recovers the photographer’s region of interest from Gaussian attributes, requiring neither the original training images nor any 2D foundation model. Trained on scenes from Tanks and Temples and Mip-NeRF 360, our method achieves a mean leave-one-out cross-validation (LOOCV) F1 of 0.957 and generalizes to unseen scenes with a mean test F1 of 0.929. Projected 3D saliency masks align strongly with U2-Net predictions on the original training images, confirming that multi-view Gaussian intrinsic properties capture a geometrically consistent, view-stable notion of saliency that single-frame 2D methods cannot provide. These properties make our method applicable to automatic foreground extraction, capture intent analysis, and perceptual quality-driven compression for bandwidth-efficient streaming.
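
To make the LOOCV F1 metric in the abstract concrete, the following is a minimal sketch of how a per-Gaussian classifier could be scored with leave-one-out cross-validation. It is an illustration under stated assumptions, not the thesis implementation: the abstract does not say which attributes or classifier are used, nor whether each fold holds out a whole scene, so the three synthetic features, the nearest-centroid classifier, the scene names, and the helpers `make_synthetic_scene` and `leave_one_scene_out_f1` are all hypothetical stand-ins.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """Binary F1 for labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean attribute vector per class."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)


def predict(centroids, X):
    """Label a Gaussian salient (1) when it lies closer to the salient centroid."""
    c0, c1 = centroids
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)


def leave_one_scene_out_f1(scenes):
    """Hold out each scene in turn, train on the rest, and score the held-out scene."""
    scores = {}
    for held_out in scenes:
        train = [data for name, data in scenes.items() if name != held_out]
        X_train = np.concatenate([X for X, _ in train])
        y_train = np.concatenate([y for _, y in train])
        X_test, y_test = scenes[held_out]
        model = fit_centroids(X_train, y_train)
        scores[held_out] = f1_score(y_test, predict(model, X_test))
    return scores


def make_synthetic_scene(rng, n=500):
    """Hypothetical data: 3 attributes per Gaussian, salient ones shifted by +2."""
    y = (rng.random(n) < 0.3).astype(int)
    X = rng.normal(size=(n, 3)) + 2.0 * y[:, None]
    return X, y


rng = np.random.default_rng(0)
scenes = {name: make_synthetic_scene(rng) for name in ("scene_a", "scene_b", "scene_c")}
scores = leave_one_scene_out_f1(scenes)
mean_f1 = float(np.mean(list(scores.values())))
print(scores, mean_f1)
```

Holding out whole scenes rather than individual Gaussians keeps attributes from the test scene out of training, which is one plausible reading of how a cross-validated score and a separate unseen-scene test score can be reported side by side.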

Files

XinyaBi_Thesis_FinalVersion.pd... (pdf)

(pdf | 0.924 Mb)

License info not available