3D building model editing with generative AI

Abstract

Generative AI is developing rapidly and has shown great potential for generating and editing images from text prompts. It has also achieved partial success in the more challenging task of editing 3D models of common objects. However, the building domain has received little attention, even though it already faces certain limits in 2D space. Automatic editing of 3D building models has wide application potential, especially for renovation, concept comparison, and large-scale scenes. This thesis therefore aims to edit building models effectively through texture creation and geometric changes with generative AI, by evaluating existing 3D editing pipelines and proposing practical improvements to them. The thesis first examines current methods. Latent-Paint, Text2Tex, and X-Mesh are chosen as representative pipelines based on explicit (mesh) representations, owing to their relatively satisfying performance on common objects and the different ideas they embody. Modifications are made to improve output quality in two aspects: the control module and the editing module. For the first aspect, attempts include adding a view-specification text prompt to the original X-Mesh and using an image as the control for X-Mesh. Image-controlled X-Mesh is further experimented with by placing higher attention on the input image view and editing the key semantic parts of the building separately. For the second aspect, modifications include freezing the geometry of sampled vertices and combining X-Mesh with Text2Tex.

Results show that Latent-Paint mainly creates textures with only the major colours and lacks detail. Text2Tex generally generates fine textures for inputs with richer depth information. X-Mesh creates textures and edits geometry jointly; it can improve the fidelity of the input mesh to some extent but suffers from noise. Image-controlled X-Mesh generates more realistic results and outperforms the text-controlled pipeline.
Combining Text2Tex with X-Mesh exploits the strength of Text2Tex, which in some cases generates smoother and more realistic textures than X-Mesh alone. The combination is especially recommended for low-fidelity meshes of small or medium size. The user study also shows that the combination of X-Mesh and Text2Tex produces the most favoured results, while image-controlled X-Mesh ranks second. Limitations remain in the geometric deformation scope, fidelity, generalizability, and computational efficiency of the proposed 3D editing pipelines. The thesis demonstrates the application potential of 3D editing pipelines in the building domain and obtains high-quality results by modifying existing methods.