Recent advances in large language models (LLMs) have expanded natural language reasoning and multimodal understanding, but these models remain limited in their grounding in 3D spatial environments. This project addresses that gap by developing a system that enables natural language interaction with indoor spatial data derived from light detection and ranging (LiDAR) point clouds and panoramic imagery provided by the client, ScanPlan. The system processes spatial data through a pipeline that includes room segmentation, geometric analysis, and object clustering. A structured query language lite (SQLite) database stores the structured information, which an AI agent queries using a reasoning framework that translates natural language into actionable commands. The system supports multimodal input, allowing users to interact via text or by selecting objects in 2D panoramas; these selections are then mapped to the 3D point cloud using the segment anything model 2 (SAM2). The interface combines a chat function with 2D and 3D viewers, making spatial data accessible to non-experts. While the prototype successfully answers a range of spatial and semantic queries, challenges remain in scaling room segmentation and handling complex multi-room relationships. The project demonstrates a step towards making rich 3D building data queryable through intuitive, language-based interaction.
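To make the query pathway concrete, the sketch below illustrates, under assumed and simplified table and column names (rooms, objects, area_m2, label), how an agent-generated request against an SQLite store of segmented rooms and clustered objects might look. It is not the project's actual schema or agent code, only a minimal example of the kind of structured query the reasoning framework is described as producing.

```python
# Minimal sketch: a hypothetical SQLite schema for rooms and clustered objects,
# and the kind of query an agent might emit for a natural-language request such
# as "Which rooms contain a sofa, and how big are they?". All names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Illustrative tables: rooms with basic geometry, objects assigned to rooms.
cur.executescript("""
CREATE TABLE rooms   (id INTEGER PRIMARY KEY, name TEXT, area_m2 REAL);
CREATE TABLE objects (id INTEGER PRIMARY KEY, room_id INTEGER, label TEXT,
                      FOREIGN KEY (room_id) REFERENCES rooms(id));
INSERT INTO rooms   VALUES (1, 'living room', 24.5), (2, 'kitchen', 11.2);
INSERT INTO objects VALUES (1, 1, 'sofa'), (2, 1, 'table'), (3, 2, 'fridge');
""")

# A query the agent could translate the request into.
cur.execute("""
SELECT r.name, r.area_m2
FROM rooms r
JOIN objects o ON o.room_id = r.id
WHERE o.label = 'sofa';
""")
print(cur.fetchall())  # -> [('living room', 24.5)]
```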