3D building model edit with generative AI

Master Thesis (2024)
Author(s)

Y. FENG (TU Delft - Architecture and the Built Environment)

Contributor(s)

N. Ibrahimli – Mentor (TU Delft - Urban Data Science)

G.A.K. Arroyo Ohori – Mentor (TU Delft - Urban Data Science)

Publication Year
2024
Language
English
Graduation Date
20-06-2024
Awarding Institution
Delft University of Technology
Programme
Geomatics
Faculty
Architecture and the Built Environment
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Generative AI is developing rapidly and has shown great potential in generating and editing images from text prompts. It has also achieved partial success in the challenging task of editing 3D models of common objects. The building domain, however, has received little attention, even though it already faces certain limits in 2D space. Automatic editing of 3D building models has wide application potential, especially for renovation, concept comparison, and large-scale scenes. This thesis therefore aims to edit building models effectively, through texture creation and geometric changes with generative AI, by evaluating existing 3D editing pipelines and proposing practical solutions based on them.

The thesis first examines current methods. Latent-Paint, Text2Tex, and X-Mesh are chosen as representative existing explicit-representation (mesh-based) pipelines, for their relatively satisfactory performance on common objects and for the different ideas they embody. Modifications are then made to improve output quality in two aspects: the control module and the edit module. For the first aspect, the attempts include adding a view-specification text prompt to the original X-Mesh and using an image instead of text as the control for X-Mesh. Image-controlled X-Mesh is further experimented with by placing higher attention on the input image view and by editing the key semantic parts of the building separately. For the latter aspect, the modifications include freezing the geometry of sampled vertices (sketched below) and combining X-Mesh with Text2Tex.
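
The vertex-freezing idea can be pictured with a minimal sketch. This is not code from the thesis; it assumes a PyTorch-style optimisation loop, and the names (verts, frozen_idx, guidance_loss) are hypothetical. The point is to zero the gradients of the sampled vertices after backpropagation, so their positions never move while the rest of the mesh is still optimised:

import torch

# Toy mesh: V vertices in 3D, all optimisable by default.
V = 1000
verts = torch.randn(V, 3, requires_grad=True)

# Sampled vertices whose geometry should stay fixed during the edit.
frozen_idx = torch.randint(0, V, (200,))
frozen_mask = torch.zeros(V, 1)
frozen_mask[frozen_idx] = 1.0

optimizer = torch.optim.Adam([verts], lr=1e-3)

def guidance_loss(v):
    # Placeholder for the actual edit objective (e.g. a score computed
    # on rendered views under a text or image prompt); a dummy term here.
    return (v ** 2).sum()

for step in range(100):
    optimizer.zero_grad()
    loss = guidance_loss(verts)
    loss.backward()
    # Zero the gradients of the frozen vertices so the optimiser
    # leaves their positions untouched.
    verts.grad.mul_(1.0 - frozen_mask)
    optimizer.step()

The view-specification control can likewise be pictured as augmenting the text prompt per rendered view (again an illustrative assumption, not the thesis's exact wording):

# Hypothetical per-view prompt augmentation.
base_prompt = "a brick residential building"
views = ["front", "back", "left side", "right side", "top"]
view_prompts = [f"{v} view of {base_prompt}" for v in views]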

Results show that Latent-Paint mainly creates textures with only the dominant colours and lacks detail. Text2Tex generally generates fine textures for inputs with richer depth information. X-Mesh creates textures and edits geometry jointly; it can improve the fidelity of the input mesh to some extent, but suffers from noise problems. Image-controlled X-Mesh generates more realistic results and outperforms the text-controlled pipeline. Combining Text2Tex with X-Mesh exploits the strength of Text2Tex, which in some cases generates smoother and more realistic textures than X-Mesh alone; the combination is especially recommended for low-fidelity meshes of small or medium size. A user study also shows that the combination of X-Mesh and Text2Tex generates the most favoured results, with image-controlled X-Mesh ranking second. Limitations remain in the geometric deformation scope, fidelity, generalizability, and computational efficiency of the proposed 3D editing pipelines. The thesis demonstrates the application potential of 3D editing pipelines in the building domain and obtains high-quality results by modifying existing methods.

Files

P5_Thesis.pdf
(pdf | 22.8 MB)
License info not available
P5_Presentation.pdf
(pdf | 3.94 MB)
License info not available
P2_GraduationPlan.pdf
(pdf | 1.66 MB)
License info not available