SpatiaLLM
Bridging the Gap Between Natural Language and 3D Scans
M.J. van der Meer (TU Delft - Architecture and the Built Environment)
H. Ye (TU Delft - Architecture and the Built Environment)
S.T. ter Braak (TU Delft - Architecture and the Built Environment)
J. Pille (TU Delft - Architecture and the Built Environment)
N. Singh (TU Delft - Architecture and the Built Environment)
L. Nan – Mentor (TU Delft - Urban Data Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Recent advances in large language models (LLMs) have expanded natural language reasoning and multimodal understanding, but these models remain limited in their grounding in 3D spatial environments. This project addresses that gap by developing a system that enables natural language interaction with indoor spatial data derived from light detection and ranging (LiDAR) point clouds and panoramic imagery provided by the client, ScanPlan. The system processes the spatial data through a pipeline that includes room segmentation, geometric analysis, and object clustering. The resulting structured information is stored in an SQLite database, which an AI agent queries using a reasoning framework that translates natural language into actionable commands. The system supports multimodal input: users can interact via text or by selecting objects in 2D panoramas, which are then mapped to 3D point clouds using Segment Anything Model 2 (SAM2). The interface combines a chat function with 2D and 3D viewers, making spatial data accessible to non-experts. While the prototype successfully answers a range of spatial and semantic queries, challenges remain in scaling room segmentation and handling complex multi-room relationships. The project demonstrates a step towards making rich 3D building data queryable through intuitive, language-based interaction.
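As a rough illustration of the database step described in the abstract, the sketch below shows how segmented rooms and clustered objects might be stored in SQLite and how a question such as "how many chairs are in the office?" could be turned into a parameterised query. The table names, column names, sample rows, and the helper functions `build_demo_db` and `count_objects` are all assumptions made for this example; the abstract does not describe the project's actual schema or agent code.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical rooms/objects schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rooms (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            floor_area_m2 REAL NOT NULL
        );
        CREATE TABLE objects (
            id      INTEGER PRIMARY KEY,
            room_id INTEGER NOT NULL REFERENCES rooms(id),
            label   TEXT NOT NULL,
            x REAL, y REAL, z REAL      -- assumed object centroid in metres
        );
        """
    )
    # Made-up sample rows standing in for the output of room segmentation
    # and object clustering.
    conn.executemany(
        "INSERT INTO rooms VALUES (?, ?, ?)",
        [(1, "kitchen", 12.5), (2, "office", 18.0)],
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, "chair", 0.5, 1.0, 0.4),
            (2, 2, "chair", 3.2, 0.8, 0.4),
            (3, 2, "chair", 3.9, 0.8, 0.4),
            (4, 1, "table", 1.0, 1.5, 0.7),
        ],
    )
    conn.commit()
    return conn


def count_objects(conn: sqlite3.Connection, room_name: str, label: str) -> int:
    """Count objects with a given label in a named room.

    An agent could emit a query of this shape after parsing the user's
    question; placeholders keep user-supplied text out of the SQL string.
    """
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM objects AS o
        JOIN rooms AS r ON r.id = o.room_id
        WHERE r.name = ? AND o.label = ?
        """,
        (room_name, label),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(count_objects(db, "office", "chair"))
```

Keeping the geometry in ordinary relational tables means the language model only needs to produce or select SQL, which is easier to inspect and validate than free-form reasoning over raw point clouds.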