Semantically-Guided 3D Building Facade Reconstruction: A Learning-Based MVS Approach

Abstract

This thesis introduces a Learning-Based Multi-View Semantic Stereo method that addresses the limitations of traditional and learning-based Multi-View Stereo (MVS) techniques in reconstructing reflective and low-textured regions, which are particularly prevalent in 3D models of buildings. Traditional methods lack completeness, while learning-based methods struggle with accuracy. To improve 3D models of buildings, this research integrates semantic information into an existing deep learning architecture for depth prediction, CasMVSNet, to guide the reconstruction process. Three key strategies are employed: first, semantic maps are incorporated into the network through a multi-modal approach; second, a multi-modal refinement module is added at the end of the CasMVSNet model to improve the initial output depth maps; and third, two new loss terms are introduced to enforce varying degrees of smoothness on specific semantic categories. Experiments on the DTU dataset demonstrate a significant improvement in accuracy at the point cloud level while maintaining the completeness of the reconstructed models. Validation and generalization experiments on the ETH3D dataset show consistent patterns. This research showcases the potential of integrating semantic guidance into the 3D reconstruction of buildings, advancing the field of computer vision.
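
The abstract does not give the form of the two semantic loss terms, so the following is only an illustrative sketch of the general idea of class-dependent smoothness, not the thesis's formulation. It assumes a first-order penalty on depth differences between neighbouring pixels that share a semantic label, scaled by a per-class weight. The function name `semantic_smoothness_loss`, the class ids, and the weights are all hypothetical, and plain Python lists stand in for the tensors a real training pipeline would use.

```python
def semantic_smoothness_loss(depth, labels, class_weights):
    """Mean weighted depth difference between same-label neighbouring pixels.

    depth:         2D list of predicted depths (rows of floats).
    labels:        2D list of semantic class ids, same shape as depth.
    class_weights: dict mapping class id -> smoothness weight (assumed values).
    """
    height, width = len(depth), len(depth[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            # Compare each pixel with its right and bottom neighbours.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny >= height or nx >= width:
                    continue
                # Skip pairs that straddle a semantic boundary, so depth
                # discontinuities between categories are not penalised.
                if labels[y][x] != labels[ny][nx]:
                    continue
                weight = class_weights.get(labels[y][x], 0.0)
                total += weight * abs(depth[y][x] - depth[ny][nx])
                count += 1
    return total / count if count else 0.0


# Hypothetical example: class 0 (e.g. a wall) is smoothed strongly,
# class 1 (e.g. a window) only weakly.
depth = [[1.0, 1.0, 2.0],
         [1.0, 1.5, 2.0]]
labels = [[0, 0, 1],
          [0, 0, 1]]
weights = {0: 1.0, 1: 0.1}
loss = semantic_smoothness_loss(depth, labels, weights)
```

Under these assumptions, a larger weight for a category pushes its predicted depths toward locally constant values, while a smaller weight leaves more freedom. Pairs that cross a semantic boundary contribute nothing to the loss.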
