N. Singh
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
2 records found
Sidewalks are fundamental urban infrastructure, yet large-scale, openly available geospatial sidewalk datasets remain scarce worldwide, hindering pedestrian routing, accessibility analysis, and urban planning applications. This thesis investigates the feasibility of transforming crowdsourced street view imagery from Mapillary into structured georeferenced sidewalk data using only openly licensed data and pre-trained models. Two complementary automated pipelines are developed and evaluated across three study areas in Amsterdam (Netherlands) and Boston (United States).
The first pipeline, Sidewalk Inventory Mapping, constructs a binary presence inventory (yes/no per roadside) by aggregating pre-computed semantic segmentation outputs and SfM-corrected camera metadata from the Mapillary API onto the OpenStreetMap road network, without any local model inference. The second pipeline, Sidewalk Geometry Reconstruction, downloads imagery and processes it locally, using a vision foundation model (DINOv3) for semantic segmentation and Depth Anything V3 for monocular metric depth estimation, to reconstruct sidewalk polygons and centerlines.
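The aggregation step of the inventory pipeline can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes each sidewalk detection has already been matched to a road segment and projected to planar coordinates, and the function names and the `min_votes` threshold are hypothetical.

```python
def side_of_segment(start, end, point):
    """Classify a point as 'left' or 'right' of the directed segment start -> end.

    Uses the sign of the 2D cross product; coordinates are assumed planar (x, y).
    """
    (x1, y1), (x2, y2), (px, py) = start, end, point
    cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    return "left" if cross > 0 else "right"


def build_inventory(segments, detections, min_votes=2):
    """Aggregate detections into a yes/no flag per (segment, side).

    segments:   {segment_id: ((x1, y1), (x2, y2))}
    detections: iterable of (segment_id, (x, y)) sidewalk observations
    min_votes:  hypothetical threshold; a side counts as 'yes' only when at
                least this many detections fall on it
    """
    votes = {}
    for seg_id, point in detections:
        start, end = segments[seg_id]
        key = (seg_id, side_of_segment(start, end, point))
        votes[key] = votes.get(key, 0) + 1
    return {
        (seg_id, side): votes.get((seg_id, side), 0) >= min_votes
        for seg_id in segments
        for side in ("left", "right")
    }


# Toy example: one east-pointing segment with three detections.
segments = {"s1": ((0.0, 0.0), (10.0, 0.0))}
detections = [("s1", (2.0, 1.0)), ("s1", (5.0, 2.0)), ("s1", (7.0, -1.0))]
inventory = build_inventory(segments, detections)
```

Requiring several agreeing detections per side is one simple way to trade recall for precision; the actual decision rule used in the thesis is not stated in the abstract.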
Results demonstrate that the inventory pipeline achieves algorithmic precision of 97.2% to 98.6% and algorithmic recall up to 86.5%, providing highly reliable sidewalk detection where imagery exists. The geometry reconstruction pipeline successfully produces sidewalk polygons but with limited spatial accuracy (IoU 0.100–0.329, width MAE 1.18–1.40 m), resulting from cascading errors in segmentation, depth estimation, and GPS positioning. Both pipelines are fundamentally bounded by Mapillary's spatial coverage, with system recall dropping substantially in areas lacking imagery. The inventory approach emerges as the practical, scalable solution for city-wide deployment, while the geometry reconstruction demonstrates technical feasibility but requires further refinement for production use. All outputs are compatible with open data standards and can support OpenStreetMap enrichment workflows.
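For readers unfamiliar with the reported metrics, the sketch below shows how precision, recall, IoU, and width MAE are conventionally computed. It is an illustration only: the helper names are hypothetical, IoU is approximated on sets of raster cells rather than on polygons, and the toy inputs are not taken from the thesis.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def iou(pred_cells, true_cells):
    """Intersection over union of two sets of raster cells (a stand-in for polygon IoU)."""
    union = pred_cells | true_cells
    return len(pred_cells & true_cells) / len(union) if union else 0.0


def mean_absolute_error(pred, true):
    """Mean absolute error between paired predictions and references, e.g. widths in metres."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


# Toy values chosen for easy checking, not results from the thesis.
p, r = precision_recall(tp=8, fp=2, fn=2)
overlap = iou({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (2, 1)})
width_mae = mean_absolute_error([2.0, 3.0], [1.0, 5.0])
```

Under one plausible reading of the abstract, algorithmic recall restricts the false-negative count to roadsides covered by imagery, while system recall counts every missed roadside in the study area; the abstract does not give the exact definitions.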
SpatiaLLM
Bridging the Gap Between Natural Language and 3D Scans
Recent advances in large language models (LLMs) have expanded natural language reasoning and multimodal understanding, but LLMs remain limited in grounding their reasoning in 3D spatial environments. This project addresses that gap by developing a system that enables natural language interaction with indoor spatial data derived from light detection and ranging (LiDAR) point clouds and panoramic imagery provided by the client, ScanPlan. The system processes spatial data through a pipeline that includes room segmentation, geometric analysis, and object clustering. An SQLite database stores the resulting structured information, which an AI agent queries using a reasoning framework that translates natural language into actionable commands. The system supports multimodal input: users can interact via text or by selecting objects in 2D panoramas, which are then mapped to 3D point clouds using Segment Anything Model 2 (SAM2). The interface combines a chat function with 2D and 3D viewers, making spatial data accessible to non-experts. While the prototype successfully answers a range of spatial and semantic queries, challenges remain in scaling room segmentation and handling complex multi-room relationships. The project demonstrates a step towards making rich 3D building data queryable through intuitive, language-based interaction.
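The database lookup at the end of such a pipeline might resemble the sketch below. The `rooms` and `objects` tables, their columns, and the `count_objects` helper are assumptions made for illustration; the project's actual schema and agent framework are not described in the abstract.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a hypothetical rooms/objects schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rooms (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            area_m2 REAL
        );
        CREATE TABLE objects (
            id INTEGER PRIMARY KEY,
            room_id INTEGER NOT NULL REFERENCES rooms(id),
            label TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO rooms (id, name, area_m2) VALUES (?, ?, ?)",
        [(1, "kitchen", 12.5), (2, "office", 18.0)],
    )
    conn.executemany(
        "INSERT INTO objects (room_id, label) VALUES (?, ?)",
        [(1, "chair"), (2, "chair"), (2, "chair"), (2, "desk")],
    )
    conn.commit()
    return conn


def count_objects(conn, label, room_name):
    """Answer 'how many <label> are in <room>?' with a parameterized query."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM objects AS o
        JOIN rooms AS r ON r.id = o.room_id
        WHERE o.label = ? AND r.name = ?
        """,
        (label, room_name),
    ).fetchone()
    return row[0]


conn = build_demo_db()
chairs_in_office = count_objects(conn, "chair", "office")
```

In this sketch, a question such as "How many chairs are in the office?" would be translated by the agent into a call like `count_objects(conn, "chair", "office")`. Parameterized queries keep model-generated values out of the SQL string itself.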