Human-centric quality assessment and visual attention modeling for point clouds

doi:10.4233/uuid:3b9f08a9-8adc-40b2-9fff-e193129d1cc0

Doctoral Thesis (2026)

Author(s)

X. Zhou (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

P.S. Cesar Garcia – Promotor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Irene Viola – Copromotor (Centrum Wiskunde & Informatica (CWI))

Research Group

Multimedia Computing

Keywords

Virtual Reality, Point Cloud, Visual Saliency, Quality Assessment

DOI related publication

https://doi.org/10.4233/uuid:3b9f08a9-8adc-40b2-9fff-e193129d1cc0 Final published version

To reference this document use

https://doi.org/10.4233/uuid:3b9f08a9-8adc-40b2-9fff-e193129d1cc0

More Info

Publication Year

2026

Language

English

Defense Date

04-03-2026

Awarding Institution

Delft University of Technology

ISBN (print)

978-94-6518-263-6

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Human-Centric Quality Assessment and Visual Attention Modeling for Point Clouds addresses a central challenge in immersive three-dimensional media: how to measure, understand, and predict human perceptual experience when interacting with high-fidelity point-cloud content in virtual and mixed-reality environments. Point clouds are increasingly adopted in extended reality (XR), telepresence, free-viewpoint video, and autonomous systems due to their flexibility in representing complex 3D scenes. However, their irregular structure, large data volume, and the coupled effects of geometric and texture distortions make perceptual quality assessment particularly challenging. Widely used image and video quality assessment metrics are known to correlate poorly with human Mean Opinion Scores (MOS) when directly applied to point clouds, highlighting the need for perceptually grounded quality models tailored to this media type.

This dissertation develops methodologies, datasets, and objective metrics that bridge signal processing techniques with human-centred evaluation in order to improve perceptual alignment in Point Cloud Quality Assessment (PCQA). Beyond metric design, the thesis also provides a human-centred experimental framework and prototype system to support subjective quality assessment and visual saliency analysis in immersive environments, enabling controlled yet valid perceptual studies.

The thesis is organized into three complementary parts. The first part focuses on objective PCQA. Novel quality metrics are proposed to capture the combined influence of geometry and texture on perceived quality. A full-reference metric, PointPCA+, and a no-reference model, M3-Unity, are introduced. These approaches employ modality-aware feature representations that explicitly account for the characteristics of point-cloud geometry and color attributes, together with carefully designed similarity measures or learning-based regression models. Extensive evaluations on public benchmark datasets demonstrate that the proposed metrics achieve improved correlation with human MOS compared to state-of-the-art methods, while also exhibiting robustness across different compression schemes and distortion types. These results confirm the importance of modality-specific modeling and perceptually motivated feature design for PCQA.
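As a rough illustration of what "correlation with human MOS" can mean in practice, the sketch below computes Pearson (linear) and Spearman (rank-order) correlation between objective metric scores and MOS values. The abstract does not state which correlation measures or evaluation protocol the thesis uses, so this is only an assumption about a common setup; the function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: objective scores for five stimuli and their MOS.
metric_scores = [0.91, 0.42, 0.77, 0.65, 0.30]
mos = [4.6, 2.1, 3.9, 3.8, 1.7]

print("Pearson :", pearson(metric_scores, mos))
print("Spearman:", spearman(metric_scores, mos))
```

Rank-order correlation only checks whether the metric orders stimuli the way viewers do, while linear correlation also rewards a proportional relationship; reporting both is a common design choice, but whether the thesis does so is not stated here.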

The second part of the thesis investigates human visual attention for dynamic point-cloud content in immersive environments. Eye-tracking experiments were conducted in six-degrees-of-freedom virtual reality settings to capture gaze behavior during the viewing of dynamic point clouds. Two datasets were constructed: a task-dependent dataset (QAVA-DPC), in which participants performed explicit visual tasks, and a task-free dataset (TF-DPC), designed to capture natural viewing behavior. The experimental design incorporates systematic preprocessing, stimulus normalization, and error profiling of head-mounted display eye trackers to ensure data reliability and reproducibility. These datasets enable a detailed analysis of how task demands, motion, and temporal dynamics influence visual saliency and viewing behavior in immersive 3D scenes.

The third part explores the integration of visual saliency into objective quality assessment. By incorporating both ground-truth and predicted saliency maps into PCQA pipelines, the thesis examines how attention-guided feature weighting and perceptual pooling strategies affect quality prediction performance. Experimental results show that saliency-aware approaches can improve predictive accuracy, although the magnitude of improvement depends on the underlying quality metric and pooling strategy. These findings highlight that visual attention is a valuable perceptual cue, but must be integrated in a principled and task-aware manner. Overall, the results demonstrate that attention-aware models can better prioritize perceptually relevant distortions, which is particularly beneficial for applications such as point-cloud compression, adaptive streaming, and immersive media delivery.
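To make the idea of saliency-aware pooling concrete, the sketch below contrasts plain mean pooling of per-point distortion values with a saliency-weighted average. This is a generic illustration under stated assumptions, not the pooling strategy used in the thesis: the function names, the per-point error representation, and the sample values are all hypothetical.

```python
def mean_pool(errors):
    """Uniform pooling: every point contributes equally."""
    if not errors:
        raise ValueError("errors must not be empty")
    return sum(errors) / len(errors)


def saliency_pool(errors, saliency, eps=1e-12):
    """Saliency-weighted pooling: points that attract more attention
    contribute more to the pooled score.

    `saliency` holds one non-negative weight per point. If all weights
    are (numerically) zero, fall back to uniform pooling.
    """
    if len(errors) != len(saliency):
        raise ValueError("errors and saliency must have the same length")
    if any(w < 0 for w in saliency):
        raise ValueError("saliency weights must be non-negative")
    total = sum(saliency)
    if total <= eps:
        return mean_pool(errors)
    return sum(e * w for e, w in zip(errors, saliency)) / total


# Hypothetical per-point distortion values and saliency weights.
errors = [0.10, 0.80, 0.20, 0.05]
saliency = [0.05, 0.70, 0.20, 0.05]

print("mean pooling    :", mean_pool(errors))
print("saliency pooling:", saliency_pool(errors, saliency))
```

In this toy example the largest distortion falls on the most salient point, so the weighted score is higher than the uniform mean. When distortions fall mostly on low-saliency regions the opposite happens, which is one way to see why the benefit depends on the underlying metric and pooling strategy.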

In addition to methodological advances, this dissertation contributes resources for the research community. The gaze-annotated datasets for dynamic point clouds are made available to support reproducible research and facilitate further investigation of visual attention in immersive media. The thesis also documents detailed experimental protocols inspired by ITU recommendations and adapted to extended-reality settings, contributing to ongoing efforts toward standardization and open science in volumetric media research.

Files

TU_Delft_dissertation_Xuemei.p... (pdf)

(pdf | 11.4 Mb)

License info not available