Making it Clear
Using Vision Transformers in Multi-View Stereo on Specular and Transparent Materials
W.E.P. Tolsma (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Tömen – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Jan C. van Gemert – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Transparency and specularity are challenging phenomena that modern depth perception systems must handle in order to be usable in practice. A promising family of depth estimation methods is Multi-View Stereo (MVS), which combines multiple RGB images to predict depth, circumventing the need for costly specialized hardware. Although promising, finding pixel-to-pixel correspondences between images is a challenging task, clouded by ambiguity. To assess how well current methods deal with such ambiguity, we introduce ToteMVS: a multi-view, multi-material synthetic dataset containing diffuse, specular, and transparent objects. Recent works in computer vision have effectively replaced Convolutional Neural Networks (CNNs) with the emerging Vision Transformer (ViT) architecture, but it remains unclear whether ViTs outperform CNNs in handling reflective and transparent materials. In our study, we use ToteMVS to compare ViT- and CNN-based architectures on their ability to extract useful features for depth estimation on diffuse, specular, and transparent objects. Our results show that, contrary to the current trend of favoring ViTs over CNNs, the ViT-based model offers no particular advantage in handling these challenging materials in the context of MVS. Our evaluation data and related code are available on GitHub: https://github.com/pietertolsma/ToteMVS/.
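
To make the CNN-versus-ViT comparison concrete, the sketch below shows one way to extract dense per-view feature maps from both backbone families in PyTorch, as one would before multi-view matching. It is a minimal illustration under assumed choices: ResNet-18 and ViT-B/16 from torchvision, 224x224 inputs, and a forward hook to grab the ViT's patch tokens. These are not the specific architectures or code used in the thesis; the actual implementation is in the linked GitHub repository.

    # Illustrative sketch only: backbones, input sizes, and the hook-based
    # token extraction are assumptions, not the thesis's actual pipeline.
    import torch
    import torchvision.models as models

    # Three RGB views of the same scene (random stand-in data).
    images = torch.randn(3, 3, 224, 224)

    # CNN backbone: truncate ResNet-18 before pooling/classification
    # to keep a dense spatial feature map per view.
    resnet = models.resnet18(weights=None).eval()
    cnn_backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

    # ViT backbone: capture the encoder's output tokens with a forward
    # hook, then fold the patch tokens back into a 2D feature grid.
    vit = models.vit_b_16(weights=None).eval()
    captured = {}
    vit.encoder.register_forward_hook(
        lambda mod, inp, out: captured.update(tokens=out)
    )

    with torch.no_grad():
        cnn_feats = cnn_backbone(images)          # (3, 512, 7, 7)
        vit(images)
    tokens = captured["tokens"]                   # (3, 197, 768), incl. class token
    vit_feats = tokens[:, 1:].transpose(1, 2).reshape(3, 768, 14, 14)

    print(cnn_feats.shape)  # torch.Size([3, 512, 7, 7])
    print(vit_feats.shape)  # torch.Size([3, 768, 14, 14])

Either feature grid can then be fed to the same matching and depth-regression stages, which is what makes a controlled backbone comparison of the kind described above possible.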