Human-centric quality assessment and visual attention modeling for point clouds

Doctoral Thesis (2026)
Author(s)

X. Zhou (TU Delft - Multimedia Computing)

Contributor(s)

P.S. Cesar Garcia – Promotor (TU Delft - Multimedia Computing)

Irene Viola – Copromotor (Centrum Wiskunde & Informatica (CWI))

Research Group
Multimedia Computing
More Info
Publication Year
2026
Language
English
Defense Date
04-03-2026
Awarding Institution
Delft University of Technology
ISBN (print)
978-94-6518-263-6
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Human-Centric Quality Assessment and Visual Attention Modeling for Point Clouds addresses a central challenge in immersive three-dimensional media: how to measure, understand, and predict human perceptual experience when interacting with high-fidelity point-cloud content in virtual and mixed-reality environments. Point clouds are increasingly adopted in extended reality (XR), telepresence, free-viewpoint video, and autonomous systems due to their flexibility in representing complex 3D scenes. However, their irregular structure, large data volume, and the coupled effects of geometric and texture distortions make perceptual quality assessment particularly challenging. Widely used image and video quality assessment metrics are known to correlate poorly with human Mean Opinion Scores (MOS) when directly applied to point clouds, highlighting the need for perceptually grounded quality models tailored to this media type.
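Agreement between an objective metric and human judgments is conventionally reported as the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) between predicted scores and MOS. The plain-Python sketch below computes both. It is illustrative only: the scores are made-up toy values, not data from the thesis, and it omits the nonlinear (e.g. logistic) mapping that evaluation protocols often apply to predictions before computing PLCC.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank-order correlation (SROCC): PLCC computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Toy values for illustration only (not from any dataset).
mos = [4.5, 3.8, 2.9, 2.1, 1.4]
pred = [0.92, 0.85, 0.70, 0.72, 0.40]
print(f"PLCC={pearson(pred, mos):.3f}  SROCC={spearman(pred, mos):.3f}")
```

Rank correlation only checks whether the metric orders stimuli the way viewers do, whereas PLCC also rewards a linear relationship, which is why both are usually reported together.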

This dissertation develops methodologies, datasets, and objective metrics that bridge signal processing techniques with human-centred evaluation in order to improve perceptual alignment in Point Cloud Quality Assessment (PCQA). Beyond metric design, the thesis also provides a human-centred experimental framework and prototype system to support subjective quality assessment and visual saliency analysis in immersive environments, enabling perceptual studies that are both controlled and ecologically valid.

The thesis is organized into three complementary parts. The first part focuses on objective PCQA. Novel quality metrics are proposed to capture the combined influence of geometry and texture on perceived quality. A full-reference metric, PointPCA+, and a no-reference model, M3-Unity, are introduced. These approaches employ modality-aware feature representations that explicitly account for the characteristics of point-cloud geometry and color attributes, together with carefully designed similarity measures or learning-based regression models. Extensive evaluations on public benchmark datasets demonstrate that the proposed metrics achieve higher correlation with human MOS than state-of-the-art methods, while also exhibiting robustness across different compression schemes and distortion types. These results confirm the importance of modality-specific modeling and perceptually motivated feature design for PCQA.
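As its name suggests, PointPCA+ relies on principal component analysis (PCA). A common way to obtain PCA-based geometric features is to compute the covariance of each point's local neighborhood and derive shape descriptors from its eigenvalues. The NumPy sketch below illustrates only that general family of features: the k-nearest-neighbor neighborhood, k = 8, brute-force search, and the choice of linearity, planarity, and sphericity are assumptions, and the actual descriptor set, neighborhood definition, similarity measures, and pooling used in PointPCA+ are not specified here.

```python
import numpy as np

def local_shape_descriptors(points, k=8):
    """Per-point [linearity, planarity, sphericity] from local PCA.

    Each point's neighborhood is its k nearest neighbors (itself included),
    found by brute force. Eigenvalues l1 >= l2 >= l3 of the neighborhood
    covariance give the three descriptors, which sum to 1.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise squared distances: O(N^2) memory, fine for small toy clouds.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    neighbors = np.argsort(d2, axis=1)[:, :k]
    feats = np.zeros((len(pts), 3))
    for i, idx in enumerate(neighbors):
        cov = np.cov(pts[idx], rowvar=False, bias=True)
        l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]
        l3 = max(l3, 0.0)    # clip tiny negative round-off
        l1 = max(l1, 1e-12)  # avoid division by zero for collapsed neighborhoods
        feats[i] = [(l1 - l2) / l1,  # linearity: high for line-like regions
                    (l2 - l3) / l1,  # planarity: high for flat surfaces
                    l3 / l1]         # sphericity: high for volumetric scatter
    return feats

# Toy example: a flat 4 x 4 grid in the z = 0 plane.
xs, ys = np.meshgrid(np.arange(4.0), np.arange(4.0))
plane = np.column_stack([xs.ravel(), ys.ravel(), np.zeros(16)])
print(local_shape_descriptors(plane).round(3)[:4])
```

In a full-reference setting, descriptors like these would be computed on both the reference and the distorted cloud and then compared point by point; how that comparison is done is exactly where metric designs differ.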

The second part of the thesis investigates human visual attention for dynamic point-cloud content in immersive environments. Eye-tracking experiments were conducted in six-degrees-of-freedom (6DoF) virtual reality settings to capture gaze behavior during the viewing of dynamic point clouds. Two datasets were constructed: a task-dependent dataset (QAVA-DPC), in which participants performed explicit visual tasks, and a task-free dataset (TF-DPC), designed to capture natural viewing behavior. The experimental design incorporates systematic preprocessing, stimulus normalization, and error profiling of head-mounted display (HMD) eye trackers to ensure data reliability and reproducibility. These datasets enable a detailed analysis of how task demands, motion, and temporal dynamics influence visual saliency and viewing behavior in immersive 3D scenes.
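Eye-tracker error is commonly profiled as angular accuracy: the angle between the recorded gaze direction and the direction from the eye to a known target the participant is asked to fixate. The sketch below assumes gaze directions, eye position, and target position are 3D vectors in the same coordinate frame; the thesis's actual profiling procedure (target layout, outlier handling, precision measures) is not described in this abstract, and the function names are hypothetical.

```python
from math import acos, degrees, sqrt

def _norm(v):
    return sqrt(sum(c * c for c in v))

def angular_error_deg(gaze_dir, eye_pos, target_pos):
    """Angle in degrees between a gaze ray and the eye-to-target direction."""
    to_target = [t - e for t, e in zip(target_pos, eye_pos)]
    dot = sum(g * t for g, t in zip(gaze_dir, to_target))
    cos_a = dot / (_norm(gaze_dir) * _norm(to_target))
    cos_a = max(-1.0, min(1.0, cos_a))  # clamp rounding error before acos
    return degrees(acos(cos_a))

def accuracy_deg(samples, eye_pos, target_pos):
    """Mean angular error over gaze samples recorded while fixating one target."""
    errs = [angular_error_deg(g, eye_pos, target_pos) for g in samples]
    return sum(errs) / len(errs)

# Eye at the origin, target 1 m straight ahead; one perfect sample, one off by 45 degrees.
eye, target = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(accuracy_deg([(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], eye, target))
```

Repeating this for targets spread across the field of view yields an error profile, which shows where in the headset's view gaze data can be trusted.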

The third part explores the integration of visual saliency into objective quality assessment. By incorporating both ground-truth and predicted saliency maps into PCQA pipelines, the thesis examines how attention-guided feature weighting and perceptual pooling strategies affect quality prediction performance. Experimental results show that saliency-aware approaches can improve predictive accuracy, although the magnitude of improvement depends on the underlying quality metric and pooling strategy. These findings indicate that visual attention is a valuable perceptual cue but that it must be integrated in a principled, task-aware manner. Overall, the results demonstrate that attention-aware models can better prioritize perceptually relevant distortions, which is particularly beneficial for applications such as point-cloud compression, adaptive streaming, and immersive media delivery.
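One generic way to make a metric attention-aware at the pooling stage is to replace the plain average of per-point distortion values with a saliency-weighted average, so that distortions in regions that attract gaze contribute more to the final score. The sketch below assumes per-point error values and non-negative saliency weights defined over the same points; it is a single illustrative pooling strategy with hypothetical function names, not necessarily one of those evaluated in the thesis.

```python
def mean_pool(errors):
    """Uniform pooling: every point counts equally."""
    return sum(errors) / len(errors)

def saliency_weighted_pool(errors, saliency, eps=1e-12):
    """Weighted mean of per-point errors, using saliency values as weights."""
    if len(errors) != len(saliency):
        raise ValueError("errors and saliency must have the same length")
    if any(s < 0 for s in saliency):
        raise ValueError("saliency weights must be non-negative")
    total = sum(saliency)
    if total <= eps:  # no attention information: fall back to uniform pooling
        return mean_pool(errors)
    return sum(e * s for e, s in zip(errors, saliency)) / total

# Toy example: low error where viewers look, high error where they do not.
errors = [0.1, 0.1, 0.8, 0.8]
saliency = [0.9, 0.9, 0.1, 0.1]
print(mean_pool(errors), saliency_weighted_pool(errors, saliency))
```

Here uniform pooling gives 0.45 while saliency-weighted pooling gives about 0.17, because the heavily distorted points are rarely looked at. Whether such re-weighting improves agreement with MOS depends on how well the saliency map matches real viewing behavior and on the base metric.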

In addition to methodological advances, this dissertation contributes a range of resources to the research community in support of reproducible research and open science. The gaze-annotated datasets for dynamic point clouds are made available to facilitate further investigation of visual attention in immersive media. The thesis also documents detailed experimental protocols inspired by ITU recommendations and adapted to extended-reality settings, contributing to ongoing standardization efforts in volumetric media research.

Files

License info not available