Visual-Saliency Guided Multi-modal Learning for No Reference Point Cloud Quality Assessment

Conference Paper (2024)
Author(s)

X. Zhou (TU Delft - Multimedia Computing, Centrum Wiskunde & Informatica (CWI))

Irene Viola (Centrum Wiskunde & Informatica (CWI))

Ruihong Yin (Universiteit van Amsterdam)

Pablo Cesar (TU Delft - Multimedia Computing, Centrum Wiskunde & Informatica (CWI))

Multimedia Computing
DOI
https://doi.org/10.1145/3689093.3689183
Publication Year
2024
Language
English
Pages (from-to)
39-47
ISBN (electronic)
979-8-4007-1204-3
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

As 3D immersive media continues to gain prominence, Point Cloud Quality Assessment (PCQA) is essential for ensuring high-quality user experiences. This paper introduces ViSam-PCQA, a no-reference PCQA metric guided by visual saliency information across three modalities, which improves quality prediction performance. Firstly, we project the 3D point cloud to acquire 2D texture, depth, and normal maps. Secondly, we extract a saliency map from the texture map and refine it with the corresponding depth map. This refined saliency map is used to weight low-level feature maps, highlighting perceptually important areas in the texture channel. Thirdly, high-level features from the texture, normal, and depth maps are processed by a Transformer to capture global and local point cloud representations across the three modalities. Lastly, the saliency, global, and local embeddings are concatenated and passed through a multi-task decoder to derive the final quality scores. Our experiments on the SJTU, WPC, and BASICS datasets show high Spearman rank order correlation coefficients/Pearson linear correlation coefficients of 0.953/0.962, 0.920/0.920, and 0.887/0.936 respectively, demonstrating superior performance compared to current state-of-the-art methods.
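The depth-refined saliency weighting described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the refinement rule (modulating texture saliency by nearness), the array shapes, and all function names are assumptions for the sake of the example; the paper uses learned saliency and CNN/Transformer features.

```python
import numpy as np

def refine_saliency(saliency: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Hypothetical depth refinement: modulate a texture-based saliency
    map (H, W) by nearness, so that closer regions (smaller depth values)
    retain more saliency. Result is normalized to [0, 1]."""
    depth_range = np.ptp(depth) + 1e-8          # avoid divide-by-zero
    nearness = 1.0 - (depth - depth.min()) / depth_range
    refined = saliency * nearness
    return refined / (refined.max() + 1e-8)

def weight_features(features: np.ndarray, saliency: np.ndarray) -> np.ndarray:
    """Re-weight low-level feature maps (C, H, W) by a saliency map (H, W),
    broadcasting the saliency over the channel dimension."""
    return features * saliency[None, :, :]

# Toy stand-ins for the projected maps and extracted features.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 4
texture_saliency = rng.random((H, W))   # stand-in saliency from the texture map
depth_map = rng.random((H, W))          # stand-in projected depth map
low_level_feat = rng.random((C, H, W))  # stand-in low-level texture features

refined = refine_saliency(texture_saliency, depth_map)
weighted = weight_features(low_level_feat, refined)
```

The weighted features keep their original shape but emphasize regions the refined saliency marks as perceptually important, which is the role this step plays in the texture channel of the described pipeline.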

Files

3689093.3689183.pdf
(pdf | 1.68 MB)
Embargo expired on 28-04-2025
License info not available