L. Nan | TU Delft Repository

Seabed Fingerprinting for Maritime Navigation in GNSS-Denied Environments

SAND-E: Seabed-Aided Navigation Using Classical and Learned Image Matching

Master thesis (2026) - J. Pille, Robert Voûte, L. Nan, R.C. Lindenbergh

Maritime navigation relies heavily on Global Navigation Satellite Systems (GNSS), yet military surface vessels must remain operational when satellite signals are unavailable, degraded, or denied. In such GNSS-denied environments, Inertial Navigation Systems (INS) accumulate unbounded drift, while existing Terrain-Aided Navigation (TAN) methods remain sensitive to terrain distinctiveness and are rarely evaluated for surface vessels. We present SAND-E, a particle-filter framework for seabed-aided maritime navigation that treats seabed fingerprinting as an image matching problem. Near real-time Multibeam Echosounder (MBES) measurements are matched against bathymetric reference maps using Normalized Cross-Correlation (NCC), SuperPoint+LightGlue (SP+LG), or a combined prior-gated method, and the resulting position fixes are fused into the particle filter for recursive state estimation. Evaluated on North Sea and Atlantic Ocean bathymetry, NCC outperforms SP+LG across all metrics, achieving an RMSE of 92.1 m, a 100% fix rate, 92.6% of runs within 500 m, and a runtime of 0.5 ms per fix. The combined method matches NCC under nominal conditions but provides additional robustness with an outdated reference map, where the prior gate rejects degraded NCC fixes and falls back to SP+LG. The framework generalizes across three geographically distinct test areas, remains viable with a three-year-old reference map, and reduces average final position error from Dead Reckoning (DR) to 115.3 m, demonstrating seabed fingerprinting as a viable infrastructure-independent navigation solution for GNSS-denied military surface vessels. ...
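As an illustration of treating seabed fingerprinting as image matching, the following is a minimal sketch of zero-mean Normalized Cross-Correlation between a small depth patch and a larger reference grid. It is a brute-force version for readability and is not the thesis implementation; the function name `ncc_match`, the synthetic `depth_map`, and the constant depth offset are assumptions made for this example.

```python
import numpy as np

def ncc_match(reference, patch):
    """Slide `patch` over `reference`; return (row, col, score) of the best zero-mean NCC."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    p_norm = np.sqrt((p ** 2).sum())
    best = (0, 0, -1.0)
    for r in range(reference.shape[0] - ph + 1):
        for c in range(reference.shape[1] - pw + 1):
            w = reference[r:r + ph, c:c + pw]
            w = w - w.mean()
            denom = p_norm * np.sqrt((w ** 2).sum())
            if denom == 0:
                continue  # a perfectly flat window or patch carries no matching signal
            score = float((p * w).sum() / denom)
            if score > best[2]:
                best = (r, c, score)
    return best

# Hypothetical data: a random "bathymetric" grid and a measured swath cut from it.
rng = np.random.default_rng(0)
depth_map = rng.normal(size=(40, 40))
swath = depth_map[12:20, 25:33] + 5.0  # constant depth bias, removed by the mean subtraction

row, col, score = ncc_match(depth_map, swath)
```

Because both windows are mean-subtracted and normalized, a constant depth bias in the measurement does not change the score, and flat terrain yields no usable peak, which is consistent with the abstract's remark that terrain-aided methods are sensitive to terrain distinctiveness.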

Recovering Visual Saliency from Intrinsic Properties of 3D Gaussian Splatting

Master thesis (2026) - X. Bi, B.M. Meijers, L. Nan

3D Gaussian Splatting (3DGS) represents scenes as collections of Gaussian primitives whose attributes are shaped by multi-view photographic supervision. This raises a natural question: does the photographer’s visual focus leave a measurable imprint on these intrinsic properties? While prior work has explored segmentation and scene decomposition in 3DGS, no existing method has investigated whether Gaussian attributes alone encode visual saliency. We propose a mask-free, post hoc classifier that recovers the photographer’s region of interest from Gaussian attributes, requiring neither the original training images nor any 2D foundation model. Trained on scenes from Tanks and Temples and MipNeRF360, our method achieves a mean LOOCV F1 of 0.957 and generalizes to unseen scenes with a mean test F1 of 0.929. Projected 3D saliency masks show strong alignment with U2-Net predictions on original training images, confirming that multi-view Gaussian intrinsic properties capture a geometrically consistent, view-stable notion of saliency that single-frame 2D methods cannot provide. These properties make our method applicable to automatic foreground extraction, capture intent analysis, and perceptual quality-driven compression for bandwidth-efficient streaming. ...

Adaptive Plane Splatting for 3D Building Reconstruction

Master thesis (2026) - M. Hu, L. Nan, M. Weinmann

In the last three years, 3DGS has attracted widespread attention due to its fast training and high-quality rendering, leading to numerous surface reconstruction studies proposing geometric constraints to extract high-quality meshes. Although these methods demonstrate potential for building model reconstruction, modern 3D building applications increasingly require watertight, Boundary Representation (B-rep) models for analytical tasks rather than the standard triangular meshes typically generated. On the other hand, traditional piecewise-planar reconstruction methods that rely on point clouds are often computationally heavy and unstable when generating plane hypotheses from noisy data. To address these limitations, this thesis proposes AdaptivePS, an adaptive, image-to-plane splatting pipeline for multi-view indoor and outdoor scene surface reconstruction. Designed to function as the foundational step in a broader "image to watertight building model" pipeline, it outputs planar primitives ready to be plugged into a piecewise-planar reconstructor. AdaptivePS extends the baseline PlanarSplatting method to outdoor environments by introducing a foreground mask generator and a novel prior generator that jointly recovers camera poses, depth, and normal maps in a single inference—bypassing SfM entirely while normalizing scenes to a consistent scale. Additionally, the pipeline employs a mask-guided densification and pruning strategy to adaptively split primitives at object boundaries and remove background noise, alongside a mask-guided trimming mechanism applied to sampled points for sharper boundary delineation. Experiments demonstrate that AdaptivePS achieves sufficient geometric quality for outdoor scenes while running 2x as fast as the baseline framework.

The code is available at https://github.com/MCHU-1999/AdaptivePS. ...

Modelling Daylight for Existing Indoor Spaces

Towards formalisation and automation of input data for robust simulations

Doctoral thesis (2026) - Nima Forouzandeh, J.E. Stoter, E. Brembilla, L. Nan

Despite the maturity of physically based daylight simulation tools, their broad applicability to existing buildings remains constrained. This is partly due to the lack of formal definitions that ensure comparability among models created in different contexts, partly due to inefficient techniques for input acquisition, and partly due to gaps in model calibration.
This work addresses these limitations by first defining different levels of geometric agreement between digital and real indoor spaces, termed Geometrical Levels of Detail (GLoD). These levels represent degrees of geometric completeness and resolution. The study quantifies how those degrees of representation translate into errors in daylight simulation results.
A similar framework is introduced for material inputs through Material Classes of Precision (MCoP). These classes represent different techniques for acquiring optical properties. The propagated uncertainty associated with each level of precision is systematically analysed to determine its influence on daylight simulation results.
Third, a semi-automatic pipeline is developed to reconstruct simulation-ready geometry from LiDAR point clouds. The workflow includes preprocessing, watertight reconstruction of permanent objects, and detection and reconstruction of window boundaries with minimal user interaction. Its performance is evaluated using daylight availability and glare metrics.
Fourth, image-based material characterisation techniques are assessed as accessible alternatives to laboratory measurements. Three techniques are validated, and their influence on daylight simulation results is quantified. A spectral uplifting method is further evaluated to reconstruct full spectral reflectance from RGB inputs for spectral daylight simulations.
Finally, a calibration workflow for indoor spectral daylight simulation is introduced to account for uncertainties related to exterior conditions and window characterisation. Measured spectral irradiance data are used to minimise simulation error. Together, these contributions enable practitioners and researchers to create a robust digital daylight model for existing indoor spaces.

Exploiting the Test-time Reference Map for Visual Place Recognition

Doctoral thesis (2026) - M. Zaffar, J.F.P. Kooij, L. Nan

Visual Place Recognition (VPR) is a key task in computer vision and robotics, enabling loop closure in SLAM, image-based localization, landmark retrieval, and navigation. While deep-learning approaches have improved robustness to viewpoint, illumination, seasonal, and dynamic changes, three less explored challenges—domain generalization, uncertainty estimation, and localization accuracy—remain critical. This thesis demonstrates that test-time reference maps, typically used only for retrieval, can be exploited to address these challenges without additional sensors or retraining.

A unified evaluation framework, VPR-Bench, is introduced to standardize datasets, metrics, and evaluation practices across robotics and vision communities. VPR-Bench enables meta-analyses of descriptor size, runtime trade-offs, viewpoint and illumination invariance, and retrieval efficiency, highlighting that no single VPR method is universally best.

To improve cross-domain robustness, Reference-Set Finetuning (RSF) is proposed: a self-supervised finetuning strategy using test-time reference images to reduce train-test domain gaps. For reliability, Spatial Uncertainty Estimation (SUE) leverages reference map metadata to quantify the spatial spread of top-ranked poses, outperforming lightweight methods and complementing geometric verification. Finally, Continuous Place-descriptor Regression (CoPR) densifies the feature space by regressing descriptors at novel poses, reducing localization errors caused by map quantization and enhancing accuracy when combined with viewpoint-variant encoders.
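The idea of quantifying the spatial spread of top-ranked poses can be sketched as follows. This is only an illustrative reading of the abstract: the cosine-similarity retrieval, the root-mean-square distance to the centroid as the spread measure, and the names `spatial_spread`, `ref_desc`, and `ref_xy` are assumptions, not the thesis's definition of SUE.

```python
import numpy as np

def spatial_spread(query, ref_desc, ref_xy, k=3):
    """Return (index of best match, RMS distance of the top-k reference positions to their centroid)."""
    q = query / np.linalg.norm(query)
    d = ref_desc / np.linalg.norm(ref_desc, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity to every reference descriptor
    top = np.argsort(-sims)[:k]       # indices of the k most similar references
    pts = ref_xy[top]
    spread = float(np.sqrt(((pts - pts.mean(axis=0)) ** 2).sum(axis=1).mean()))
    return int(top[0]), spread

# Hypothetical reference map: 2-D descriptors with known positions (map metadata).
ref_desc = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, 0.2],
                     [0.0, 1.0], [0.1, 1.0], [0.2, 1.0]])
ref_xy = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                   [100.0, 0.0], [200.0, 0.0], [300.0, 0.0]])

best_a, spread_a = spatial_spread(np.array([1.0, 0.0]), ref_desc, ref_xy)  # unambiguous query
best_b, spread_b = spatial_spread(np.array([1.0, 0.9]), ref_desc, ref_xy)  # ambiguous query
```

In this toy setup the unambiguous query retrieves three neighbouring places and gets a small spread, while the ambiguous query retrieves places far apart and gets a large one, which is the kind of signal a spread-based uncertainty estimate would flag.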

Overall, this thesis reframes the reference map from a passive database to an active, exploitable resource. By systematically leveraging map information through RSF, SUE, and CoPR, it delivers measurable improvements in robustness, reliability, and localization accuracy, advancing map-aware VPR for real-world robotics and autonomous systems. ...

3D Urban Understanding from Point Clouds

Doctoral thesis (2026) - S. Du, J.E. Stoter, J.F.P. Kooij, L. Nan

Automated analysis and interpretation of 3D urban environments from laser-scanned point clouds has emerged as a critical research area with broad applications in urban planning, land administration, autonomous driving, and navigation. Despite remarkable progress in this field, researchers face two key challenges: (i) the comparatively slower advancement of methodologies for 3D point cloud analysis compared to 2D image-based techniques, and (ii) the difficulty of scaling these methods to large and complex real-world urban environments. This thesis addresses both aspects by exploring methodological innovations in 3D point cloud processing and investigating their applicability to large-scale urban settings, with an overall aim of supporting more robust and reliable interpretation of 3D urban scenes. ...

Roof Structure Extraction from Remote Sensing Images

Master thesis (2025) - H.Y. Cheng, L. Nan, W. Gao, A. Rafiee

This thesis presents a method for extracting structured roof surfaces from remote sensing images. It achieves this by combining semantic segmentation with polygon-based refinement, which allows rooftop boundaries to be described more accurately using line and shape information. The method includes three main stages: (1) using an instance segmentation model to detect and classify rooftop areas; (2) generating polygonal candidates for planar roof regions based on detected line features; and (3) optimizing label assignments through a Markov Random Field (MRF) model, which integrates prediction confidence with the spatial relationships between polygons. Experiments on benchmark datasets show that this approach improves the accuracy and consistency of rooftop segmentation while reducing incorrect detections. The system is modular and flexible, making it suitable for applications that require reliable roof structure analysis in urban environments. ...
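To make the third stage more concrete, here is a minimal sketch of MRF label optimization over polygons, where the unary cost comes from prediction confidence and a Potts pairwise term penalizes adjacent polygons with different labels. The abstract does not name a solver; Iterated Conditional Modes (ICM) is used here purely as a simple stand-in, and the function `icm`, the confidence values, and the adjacency list are hypothetical.

```python
import numpy as np

def icm(unary, edges, smooth=1.0, iters=10):
    """Greedy MRF labelling: unary[i, l] is the cost of label l at node i; edges lists adjacent nodes."""
    n, k = unary.shape
    labels = unary.argmin(axis=1)            # start from the most confident label per polygon
    nbrs = [[] for _ in range(n)]
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    for _ in range(iters):
        changed = False
        for i in range(n):
            cost = unary[i].copy()
            for j in nbrs[i]:
                # Potts term: pay `smooth` for every label that disagrees with neighbour j
                cost += smooth * (np.arange(k) != labels[j])
            new = int(cost.argmin())
            if new != labels[i]:
                labels[i] = new
                changed = True
        if not changed:
            break
    return labels

# Hypothetical example: four polygons in a chain, two classes, per-polygon class confidences.
conf = np.array([[0.90, 0.10], [0.80, 0.20], [0.45, 0.55], [0.90, 0.10]])
unary = -np.log(conf)                        # low confidence means high cost
edges = [(0, 1), (1, 2), (2, 3)]
labels = icm(unary, edges, smooth=1.0)
```

In this example the third polygon's weak preference for class 1 is overruled by its two confidently labelled neighbours, which is the kind of consistency gain the spatial term is meant to provide.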

SpatiaLLM

Bridging the Gap Between Natural Language and 3D Scans

Student report (2025) - M.J. van der Meer, H. Ye, S.T. ter Braak, J. Pille, N. Singh, L. Nan

Recent advances in large language models (LLMs) have expanded natural language reasoning and multimodal understanding, but these models remain limited in grounding to 3D spatial environments. This project addresses that gap by developing a system that enables natural language interaction with indoor spatial data derived from light detection and ranging (LiDAR) point clouds and panoramic imagery provided by the client, ScanPlan. The system processes spatial data through a pipeline that includes room segmentation, geometric analysis, and object clustering. An SQLite database stores the structured information, which an AI agent queries using a reasoning framework that translates natural language into actionable commands. The system supports multimodal input, allowing users to interact via text or by selecting objects in 2D panoramas, which are then mapped to 3D point clouds using Segment Anything Model 2 (SAM2). The interface combines a chat function with 2D and 3D viewers, making spatial data accessible to non-experts. While the prototype successfully answers a range of spatial and semantic queries, challenges remain in scaling room segmentation and handling complex multi-room relationships. The project demonstrates a step towards making rich 3D building data queryable through intuitive, language-based interaction. ...
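The kind of structured query an agent might issue against such a database can be sketched with Python's built-in `sqlite3` module. The schema below (tables `rooms` and `objects`, their columns, and the sample rows) and the helper `count_objects` are hypothetical and only illustrate the idea; the report's actual schema is not given in the abstract.

```python
import sqlite3

# In-memory database with a hypothetical schema for segmented rooms and clustered objects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rooms (id INTEGER PRIMARY KEY, name TEXT, area_m2 REAL);
CREATE TABLE objects (
    id INTEGER PRIMARY KEY,
    room_id INTEGER REFERENCES rooms(id),
    label TEXT
);
""")
conn.executemany("INSERT INTO rooms VALUES (?, ?, ?)",
                 [(1, "office", 18.5), (2, "kitchen", 9.0)])
conn.executemany("INSERT INTO objects (room_id, label) VALUES (?, ?)",
                 [(1, "chair"), (1, "chair"), (1, "desk"), (2, "chair")])

def count_objects(conn, label, room_name):
    """Answer a question such as 'How many chairs are in the office?' with a parameterized query."""
    row = conn.execute(
        "SELECT COUNT(*) FROM objects o JOIN rooms r ON r.id = o.room_id "
        "WHERE o.label = ? AND r.name = ?",
        (label, room_name),
    ).fetchone()
    return row[0]

n_chairs = count_objects(conn, "chair", "office")
```

Using parameterized queries rather than string concatenation keeps model-generated values from being interpreted as SQL.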

Structure-aware 3D Building Reconstruction

Doctoral thesis (2025) - J. Huang, J.E. Stoter, L. Nan

Lightweight and accurate building models have been widely used in diverse applications such as urban planning, virtual reality, and navigation. In recent years, structure-aware building reconstruction has emerged as a crucial research area. Despite significant advancements in measurement techniques such as Light Detection and Ranging (LiDAR) and photogrammetry, the raw data often contains different types of imperfections, such as noise, outliers, and missing regions. These imperfections pose challenges for the accurate and efficient reconstruction of complex building structures. Therefore, this thesis aims to address these challenges by proposing methods for automatic Level of Detail 2 (LoD2) building reconstruction from airborne LiDAR point clouds, and semi-automatic Level of Detail 3 (LoD3) building reconstruction from Multi-View Stereo (MVS) meshes. Throughout the reconstruction process, structural details are easily distorted in the final output due to the inaccuracies and imperfections of input data. Given that regularities such as symmetry are prevalent in building models, they can be leveraged to recover lost or distorted building structures. To facilitate the recovery of symmetry, building elements are projected into facade planes to be two-dimensional (2D) polygonal shapes. Therefore, to obtain accurate and aesthetically pleasing models, this thesis also focuses on recovering the symmetry of these 2D polygonal shapes generated from buildings.

My first contribution is city-scale LoD2 building reconstruction from airborne LiDAR point clouds. While LiDAR data provides rich geometric information, reconstructing detailed building models at such a large scale remains an open problem. This thesis proposes a novel method to tackle this problem, achieving accurate city-scale LoD2 building reconstruction. Firstly, I use footprint data to segment out the point clouds of individual building instances. Then, I detect planar primitives using a region-growing algorithm and infer wall primitives by applying a vertical assumption on the missing regions. Then an abundant set of candidate faces is generated by intersecting the planes derived from roof and wall primitives. Finally, I can obtain a compact building model by selecting the optimal subset of candidate faces through solving an integer programming problem. Geometry constraints are enforced to ensure that the final model is manifold and watertight.

My second contribution is a semi-automatic method for reconstructing LoD3 building models from MVS meshes. While MVS techniques can generate dense and detailed triangular surfaces, creating compact and accurate LoD3 models from them remains challenging due to the limited data resolution. The proposed method is designed to strike a balance between human interactions and automation, aiming to maximize efficiency while minimizing user efforts. The process begins with a coarse segmentation using variational shape approximation. Then, simple and intuitive operations are introduced to refine the segmentation results by solving a multi-label optimization problem. At this stage, the user’s involvement is minimal and limited to providing high-level guidance, ensuring that the system remains user-friendly. Importantly, these interactions are kept to a minimum, allowing users to make adjustments without requiring precise input, making the process more efficient than manual reconstruction. Finally, the face normals and vertices of the mesh are updated based on the refined segmentation, and the layout of the model is regularized to produce an accurate LoD3 building model. This semi-automatic approach combines the strengths of both user input and automated computation, offering a practical solution for detailed building reconstruction that is both effective and user-friendly.

My third contribution is a novel algorithm to automatically symmetrize 2D polygonal shapes, which is essential to regularize the shapes and enhance the visual aesthetics of building models. The method follows a hypothesis-and-selection pipeline. Taking a 2D polygonal shape generated from a building model as input, I first generate a set of potential symmetric edge pairs. Then the initial set is pruned by two simple geometric tests. Finally, a perfectly symmetric shape is obtained by solving a mixed integer quadratic programming problem. Two hard constraints are imposed to ensure that the final shape is symmetric. The method is also designed to handle partial symmetry in cases where perfect symmetry is not achievable.

In summary, I first automatically reconstruct LoD2 building models from airborne LiDAR point clouds. Then, I reconstruct LoD3 building models from MVS meshes by incorporating user guidance, which depicts a more detailed representation of building models. To obtain more accurate and visually pleasing building models, I propose to symmetrize the 2D polygonal shapes generated from facade elements of reconstructed models.

Plant Skeleton Extraction and Stem-leaf Segmentation

Master thesis (2024) - Q. Shen, L. Nan, J.E. Stoter

Plant phenotyping plays a vital role in plant genetics and breeding programs, providing the foundation for screening and evaluating genetic diversity and linking phenotypic parameters to the genetic determinants of trait expression. This process is critical for identifying molecular markers and accelerating genetic breeding improvement, thereby enhancing plant resilience to biotic and abiotic stresses such as drought, salinity, and diseases. Recent advancements in 3D sensing technology have empowered researchers to extract precise phenotypic parameters from plant point clouds, enabling more detailed and accurate plant phenotyping. A critical step in point cloud-based plant phenotyping is plant organ segmentation. Among the available segmentation methods, skeleton-based approaches are simple and intuitive, and can leverage both local and global geometric information from plant point clouds to facilitate accurate organ segmentation. These methods have gained considerable attention in recent years. However, plant skeletonization, the core component of these approaches, remains limited in handling leafy plants, especially herbaceous species with complex shoot architectures, lateral stems, and multiple leaves. These structural complexities pose challenges that current skeletonization techniques struggle to address effectively.

To address this challenge, we propose a skeleton-based plant organ segmentation framework that accurately extracts curve skeletons from individual plant point clouds and performs precise stem-leaf segmentation based on these skeletons. Our framework is particularly effective in handling leafy plants. It preserves fine structural details during skeletonization while avoiding abnormal or noisy local branches by extending the Laplacian-based contraction (LBC) algorithm through the integration of the Constrained Laplacian Operator. Moreover, we introduce Adaptive Constraints and Tip Points Preservation within the contraction loops to further refine skeleton quality. Additionally, a modified Locally Optimal Projection (LOP) operator is utilized to perform skeleton points calibration, ensuring that the extracted skeleton is centrally aligned with the original plant shape. Furthermore, to evaluate the performance of our proposed framework, we contribute a photogrammetric 3D plant point cloud dataset of 56 Polygonum lapathifolium plants, complete with detailed annotations. Experiment results demonstrate that our framework robustly handles various shapes and sizes of leafy plants and tree branches.

In conclusion, our study enhances the LBC algorithm by integrating the Constrained Laplacian Operator, Adaptive Constraints, and Tip Points Preservation. These improvements increase the accuracy and quality of curve skeleton extraction from leafy plant point clouds, enabling satisfactory plant organ segmentation.

Semantic understanding of urban scenes from textured meshes

Doctoral thesis (2024) - W. Gao, H. Ledoux, L. Nan

The thesis explores the semantic understanding of urban textured meshes derived from photogrammetric methods. It primarily addresses three aspects with regard to urban textured meshes: 1) semantic annotation and the creation of benchmark datasets, 2) semantic segmentation, and 3) the automation of lightweight 3D city modeling using semantic information.

The first focus of the thesis is the development of a benchmark dataset to evaluate the performance of advanced 3D semantic segmentation methods in urban settings. An interactive 3D annotation framework has been proposed to assign ground truth labels to the urban meshes' triangle faces and texture pixels. This framework achieves efficient and accurate semi-automatic annotation through segment classification and structure-aware interactive selection. In the center of Helsinki, Finland, object-level annotations were made over approximately 4 km² (including buildings, vegetation, and vehicles, etc.), and part-level annotations over about 2.5 km² (including building parts like doors, windows, and road markings, etc.). The design of the annotation tools improves user operation and enables quick annotation of large scenes, while the resulting datasets allow researchers to refine their deep learning models for urban analysis.

Another research focus is on mesh segmentation algorithms. A novel semantic mesh segmentation algorithm has been introduced for large-scale urban environments, employing plane-sensitive over-segmentation combined with graph-based methods for contextual data integration. This approach, which utilizes graph convolutional networks for classification, significantly improves on traditional techniques when evaluated on our proposed benchmark datasets.

Finally, leveraging this semantic information, a pipeline for reconstructing lightweight 3D city models has been designed. This facilitates the automated reconstruction of CityGML-based LoD2 and LoD3 city models, ensuring high fidelity in geometric detail and semantic accuracy. The reconstructed large-scale, lightweight, and semantic city models significantly broaden applications in urban spatial intelligence, including automatic geometric measurements, interactive spatial computations, spatial analysis based on external data, and environment simulation using physical engines.

This thesis enhances the practicality of 3D data in real-world applications by utilizing semantic parsing of urban textured meshes to generate lightweight 3D urban semantic models, greatly enriching their usability. It also lays a solid foundation for future progress in understanding, modeling, and analyzing 3D urban scenes.

Neural Surface Reconstruction and Stylization

Master thesis (2023) - F.S. Visser, N. Ibrahimli, L. Nan

Style transfer is a recent field in the development of deep neural networks, which allows for the style from one image to be transferred onto another image. This has been well-researched for 2D images, but transferring style onto 3D reconstructed content can still be further developed. Being able to style a 3D reconstruction would allow users to recreate anything in the real world, such as a chair, with any style they see fit. Where other methods use texture-based approaches which often create low quality geometry and appearance, or use radiance fields which style a whole scene instead of just the 3D reconstructed object, we have developed a method which styles an implicit surface.
We achieve this by using the Implicit Differentiable Renderer (IDR), which takes masked images as input and trains two neural networks that learn the geometry and the appearance. Rendered views of the object are styled using 2D neural style transfer (NST) methods, and the style information is used to further train the appearance network to display the given style. With masked deferred back-propagation we are able to optimize the appearance renderer, which is normally trained on only patches of the rendered image to save memory, while using style transfers designed for full-resolution images. We showcase different results from our method using different 3D reconstruction datasets and style images, and showcase how to implement a user-created dataset. We carry out extensive tests on what effects different parameters have on the final result. Comparing our results to similar 3D style methods demonstrates that our method performs equally well in achieving faithful style transfer, while having the benefits of creating high quality geometry and only styling the reconstructed surface. ...

Shape-guided artistic route finding

Master thesis (2023) - L.P. Powałka, Liangliang Nan, Jantien Stoter

Creating GPS art on the map is an interesting way to make one’s outdoor activity more engaging. Cyclists, runners and hikers can create impressive drawings on the map by traversing the road/pedestrian network in a carefully planned way. Such planning, however, is often tedious and time-consuming, because GPS artists have to meticulously design their routes with the complex road network in mind. The aim of our research is to come up with a full process that can express a person’s initial idea (for example a contour drawing) as a route, which the user can then follow to create their GPS art. This involves transforming the input image to match the routing network in a selected area and generating a route which approximates the shape in the best possible way. In our work, searching for patterns in the road network is cast as an image matching problem with template matching as the solution. Routes are generated using a graph routing algorithm with a custom cost function, to make the resulting route as similar to the input shape as possible. Finally, two ways of generating artistic routes are presented. The first is an automatic GPS art workflow, which attempts to find an optimal initial location of the route, then generates a number of candidate routes and selects the best one according to various evaluation criteria. The second is an interactive browser application, where the user can select an initial location for their shape on the map, move, scale or rotate it, and get instant feedback in the form of artistic routes displayed in real time. ...
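
The routing step described above can be illustrated with a minimal sketch. The abstract does not state the cost function used in the thesis, so the one below is an assumption: each edge costs its length plus a penalty proportional to the distance from its midpoint to the target shape. All names (`shape_guided_route`, `shape_weight`, the toy grid) are hypothetical.

```python
import heapq
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_shape(p, shape):
    """Distance from p to the closest segment of a polyline."""
    return min(point_segment_distance(p, shape[i], shape[i + 1])
               for i in range(len(shape) - 1))


def shape_guided_route(nodes, edges, shape, start, goal, shape_weight=5.0):
    """Dijkstra search with an assumed shape-aware edge cost."""
    adjacency = {n: [] for n in nodes}
    for u, v in edges:
        (ux, uy), (vx, vy) = nodes[u], nodes[v]
        length = math.hypot(vx - ux, vy - uy)
        midpoint = ((ux + vx) / 2.0, (uy + vy) / 2.0)
        # Assumed cost: edge length plus a penalty for straying from the shape.
        cost = length + shape_weight * distance_to_shape(midpoint, shape)
        adjacency[u].append((v, cost))
        adjacency[v].append((u, cost))

    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost_so_far, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost_so_far > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, edge_cost in adjacency[node]:
            new_cost = cost_so_far + edge_cost
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))

    if goal not in best:
        return None
    route = [goal]
    while route[-1] != start:
        route.append(previous[route[-1]])
    return route[::-1]


# Toy 3x3 street grid; the target shape is an "L" along the left and top sides.
grid_nodes = {(x, y): (float(x), float(y)) for x in range(3) for y in range(3)}
grid_edges = ([((x, y), (x + 1, y)) for x in range(2) for y in range(3)]
              + [((x, y), (x, y + 1)) for x in range(3) for y in range(2)])
l_shape = [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
route = shape_guided_route(grid_nodes, grid_edges, l_shape, (0, 0), (2, 2))
```

On this grid every corner-to-corner path of four edges has the same length, so the shape penalty alone decides which one is returned; in a real road network the weight would need tuning against route length.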

A data-driven approach to add openings to 3D BAG building models

Master thesis (2023) - Y. Xia, J.E. Stoter, W. Gao, L. Nan, G.A.K. Arroyo Ohori

The reconstruction of 3D city models has garnered significant interest in recent years. However, the majority of existing reconstruction methods focus on LOD2 models, while LOD3 model reconstruction often relies on manual labor, and the primary data sources are street view images. This research aims to advance this field by reconstructing LOD3 models through the accurate addition of windows and doors to existing LOD2 models, thereby maximizing the utility of available 3D building models. It uses aerial oblique images as the data source for extracting building openings and 3D BAG LOD2.2 models as the basic 3D building structures. The 3D facades are projected onto the 2D aerial image space using perspective projection, and the projected facades are registered with the oblique aerial images. Subsequently, Mask R-CNN is employed to detect and extract the building openings from these projections. Following the extraction, the layout of the openings within the same facade is optimized in terms of both size and position. Lastly, the relative positions of the openings on the facade images are combined with the 3D coordinates of the corresponding facade to calculate the positions of the openings in 3D space. This information is then integrated into the LOD3 model, resulting in a more detailed and accurate representation of the buildings.
This approach successfully reconstructs the final LOD3 model in CityJSON format, which passes the val3dity validation. By effectively utilizing existing 3D building models, this approach conserves a considerable amount of computational resources required for reconstruction. The simplicity and high level of automation of this approach make it a promising solution for reconstructing large-scale LOD3 buildings, leading to more accurate and detailed large 3D urban models. ...
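
The last step of the pipeline, lifting an opening's relative position on a facade image into 3D, can be sketched as follows. This is an illustration under simplifying assumptions, not the thesis implementation: the facade is taken to be a planar parallelogram given by three of its 3D corners, and an opening is a box in normalized facade coordinates (u to the right, v upwards, both in [0, 1]). All function and variable names are hypothetical.

```python
def facade_point_to_3d(u, v, bottom_left, bottom_right, top_left):
    """Map normalized facade coordinates (u, v) to a 3D point on the facade."""
    right = [b - a for a, b in zip(bottom_left, bottom_right)]
    up = [t - a for a, t in zip(bottom_left, top_left)]
    return tuple(a + u * r + v * w for a, r, w in zip(bottom_left, right, up))


def opening_to_3d(box, bottom_left, bottom_right, top_left):
    """Return the four 3D corners of an opening given as (u_min, v_min, u_max, v_max)."""
    u_min, v_min, u_max, v_max = box
    corners_uv = ((u_min, v_min), (u_max, v_min), (u_max, v_max), (u_min, v_max))
    return [facade_point_to_3d(u, v, bottom_left, bottom_right, top_left)
            for u, v in corners_uv]


# A 10 m wide, 6 m high facade in the x-z plane, with one window.
corners = opening_to_3d(
    box=(0.2, 0.5, 0.4, 0.75),
    bottom_left=(0.0, 0.0, 0.0),
    bottom_right=(10.0, 0.0, 0.0),
    top_left=(0.0, 0.0, 6.0),
)
```

Because the mapping is affine in u and v, it only holds for rectified facade images; detections on unrectified oblique imagery would first need the registration step the abstract describes.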

Procedural Modelling of Tree Growth Using Multi-temporal Point Clouds

Master thesis (2022) - N. van der Horst, L. Nan, J.E. Stoter

A digital reconstruction of real-life trees could provide many benefits in fields such as botany, forestry management, biology, and urban planning. Plant growth modelling in particular would enable the analysis of plant structure and behaviour in a customizable, widely applicable and non-destructive manner. Although many data-driven plant reconstruction methods exist, it remains a complex problem due to the intricate branching systems of trees and the need for balancing model soundness with adherence to the often incomplete input data. Modelling plant growth currently proves difficult as well due to the large number of factors involved in the growth process and the level of prior botanical knowledge and/or detailed field data that is often required.

This work uses an automatic MST (Minimum Spanning Tree)-based reconstruction method to obtain tree skeleton models from LiDAR input data. Multiple scans of the same tree, acquired in different years, are related to each other to improve and expand upon the reconstruction. A procedural model is used to simulate the growth in the tree tips using a lobe-based approach and a region-growing algorithm. The growth-based models provide a temporally informed reconstruction that is visually, geometrically and topologically sound. Establishing correspondences in the main structure between timestamps can assist the reconstruction of the tree at a time for which the input data was noisier or incomplete, as well as provide an estimate of the tree's structure in between known data points. This type of reconstruction can be used to both model and study the growth behaviour of trees, for multi-temporal visualisations, and to provide more informed tree models for reconstruction purposes. ...
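
As a rough illustration of the MST idea, the sketch below builds a minimum spanning tree over a handful of 3D points with Prim's algorithm and roots it at the lowest point. It assumes a complete graph with Euclidean edge weights, which only suits small inputs; the abstract does not specify how the thesis builds its graph, and the names `mst_skeleton` and `sample_points` are hypothetical.

```python
import math


def mst_skeleton(points):
    """Return {point index: parent index} for an MST rooted at the lowest point.

    Prim's algorithm on the complete Euclidean graph (an assumption made
    for brevity; a neighbourhood graph scales better on real scans).
    """
    root = min(range(len(points)), key=lambda i: points[i][2])
    parent = {root: None}
    # Cheapest known connection of every unvisited point to the growing tree.
    best_dist = {i: math.dist(points[i], points[root])
                 for i in range(len(points)) if i != root}
    best_from = {i: root for i in best_dist}

    while best_dist:
        nearest = min(best_dist, key=best_dist.get)
        parent[nearest] = best_from[nearest]
        del best_dist[nearest]
        for i in best_dist:
            d = math.dist(points[i], points[nearest])
            if d < best_dist[i]:
                best_dist[i] = d
                best_from[i] = nearest
    return parent


# Three trunk points and two branch tips (x, y, z in metres).
sample_points = [
    (0.0, 0.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.0, 0.0, 2.0),
    (1.0, 0.0, 2.5),
    (-1.0, 0.0, 2.5),
]
parents = mst_skeleton(sample_points)
```

Rooting the tree at the lowest point gives every other point a parent, so the result can be read directly as a branching skeleton from the trunk base to the tips.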

Detailed Facade Reconstruction for Manhattan-world Buildings

Master thesis (2022) - L. Wang, L. Nan, N. Ibrahimli

3D building models play an important role in many real-world applications. Different models are suitable for different application scenarios based on their levels of detail. LOD3 models with facade details are crucial for many applications, such as virtual reality and urban simulation. Currently, 3D building models with lower LOD are widely available, but the number of LOD3 models is very limited. Most LOD3 reconstruction methods depend on manual operation, which is very time-consuming. Automatically reconstructing detailed facades for building models remains an open problem in computer vision. It can be treated as an image processing problem, but the 2D results must also be converted into 3D consistently.

In this project, we propose a method based on Faster R-CNN to automatically reconstruct detailed building models. The method starts from a set of street view images and produces models with facade elements. A 3D point cloud is extracted from the images using SfM and MVS, which also recover the camera parameters. We take advantage of the high-quality facade images and parse the facades to detect the bounding boxes of their elements. The bounding boxes match the rectangular shape of the facade elements well. The 2D facade elements are added to the 3D building model based on the camera parameters. The process is efficient and automatic. The regularity of the facade elements is preserved, making the result more convincing. Our method includes four main steps: (1) coarse model reconstruction, (2) facade image selection and rectification, (3) facade element detection and regularization, and (4) detailed facade reconstruction.

Experimental results show that our method can produce reliable building models with facade details in many different situations. It works for both multi-face building blocks and street-side buildings. Our tests show that window detection performs well. The object detection is fast, and the whole pipeline is lightweight and efficient. In theory, the method can also be extended to reconstruct large-scale city models, which means it has broad application prospects.

Learning to Reconstruct Compact Building Models from Point Clouds

Master thesis (2021) - Zhaiyu Chen, Liangliang Nan, Seyran Khademi

Three-dimensional building models play a pivotal role in shaping the digital twin of our world. With the advance of sensing technologies, unprecedented data acquisition capabilities on capturing the built environment have surfaced, with photogrammetry and light detection and ranging being the two important sources, both of which can acquire point clouds of buildings. A point cloud is anisotropically distributed in space, which---though conveying spatial information itself---has to be converted into a surface model for a wider spectrum of usage. This conversion is often referred to as reconstruction. Despite the enhanced availability of point cloud data in the built environment, how to reconstruct high-quality building surface models remains non-trivial in remote sensing, computer vision, and computer graphics. Most reconstruction methods are dedicated to smooth surfaces represented by dense triangles, irrespective of the piecewise planarity that dominates the geometry of real-world buildings. Although some works claim the possibility of reconstructing piecewise-planar shapes from point clouds, they either struggle to comply with specific geometric constraints, or suffer from serious scalability issues. There is no versatile solution yet for building reconstruction. In this thesis, we propose a novel framework for reconstructing compact, watertight, polygonal building models from point clouds. Our approach comprises three functional blocks: (a) a cell complex is generated via adaptive space partitioning that provides a polyhedral embedding as the candidate set; (b) an implicit field is learnt by a deep neural network that facilitates building occupancy estimation; (c) a Markov random field is formulated for surface extraction via combinatorial optimisation, where an efficient graph-cut solver is applied. We extensively evaluate the proposed method in comparison with state-of-the-art methods in shape reconstruction, surface approximation and geometry simplification. 
Experimental results reveal that, with our neural-guided strategy, high-quality building models can be obtained with significant advantages in fidelity, compactness and computational efficiency. The method is robust to noise and to insufficient measurements caused by occlusions, and generalises reasonably well from synthetic scans to real-world measurements. Moreover, our method is generic: it applies not only to buildings but to any piecewise-planar object.

Tree Reconstruction from a Point Cloud using an L-system

Student report (2021) - D.J. Dobson, H. Dong, N. van der Horst, L.M. Langhorst, J.A.J. van der Vaart, Z. Wu, L. Nan, S. Du, Dirk Voets

Storing accurate models of complex geometries in a compact way has become an increasingly challenging issue, especially when dealing with large datasets. One such dataset is Cobra-Groeninzicht's database of all trees in the Netherlands. In the gaming industry, a new technique is being used to generate tree models: the L-system. An L-system stores a string representation of the structural model of a tree, with the added possibility of recursive modelling using growing rules. This format is a promising alternative to more traditional methods of storing complex geometries. However, it remains unclear whether it can be an accurate enough representation for modelling and analysing real-life trees.

In this research project, the AdTree algorithm is used to reconstruct a skeleton from a point cloud of a single tree. This skeleton is then transformed to an L-system string format, as well as a CityJSON format (both in JSON structure). The L-system format comes with the advantage that it allows for several methods of increasing its compactness further (growing, generalisation). The overall size of these files also indicates that less storage space is needed to store the tree geometry. The quality of the L-system skeleton is nearly equal to that of the input, the skeleton generated by AdTree. Assuming it can be read and drawn using a Turtle program, the L-system thus allows for storing the same geometric information more compactly than traditional storage formats, with sufficient accuracy, and the added possibilities of growing or generalising the model. ...
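
To make the string representation concrete, here is a minimal sketch of a bracketed L-system: a rewriting rule expands a short axiom into a longer string, and a turtle-style interpreter turns that string into line segments. The alphabet (`F`, `+`, `-`, `[`, `]`), the 2D simplification and the rule are common textbook conventions assumed for illustration; they are not the format produced in the project.

```python
import math


def expand(axiom, rules, iterations):
    """Apply the rewriting rules to every symbol, `iterations` times."""
    current = axiom
    for _ in range(iterations):
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


def interpret(commands, step=1.0, angle_deg=30.0):
    """Turtle interpretation in 2D; returns a list of (start, end) segments.

    F: draw forward, +/-: turn left/right, [ and ]: push/pop the turtle state.
    """
    x, y, heading = 0.0, 0.0, 90.0  # start at the origin, pointing up
    stack = []
    segments = []
    for symbol in commands:
        if symbol == "F":
            new_x = x + step * math.cos(math.radians(heading))
            new_y = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (new_x, new_y)))
            x, y = new_x, new_y
        elif symbol == "+":
            heading += angle_deg
        elif symbol == "-":
            heading -= angle_deg
        elif symbol == "[":
            stack.append((x, y, heading))
        elif symbol == "]":
            x, y, heading = stack.pop()
    return segments


# One rule: every branch segment sprouts a left and a right child.
tree_string = expand("F", {"F": "F[+F][-F]"}, 2)
segments = interpret(tree_string)
```

The compactness argument is visible even here: the axiom, one rule and an iteration count are enough to regenerate all nine segments, whereas an explicit geometry format would store each segment's coordinates.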

Outer surface extraction for complex 3D building models

Master thesis (2020) - Y. Zhao, L. Nan, H. Ledoux

In recent years, 3D building models have become increasingly widespread and are intensively exploited in the fields of computer graphics and geometry processing. One of the common surface representations of 3D building models is the polygon mesh, which is compact as an exchange format and efficient for data processing. Downstream applications of polygon meshes can be found in the fields of urban planning, digital mapping, and fluid simulation.
However, these applications usually require the input to be watertight and manifold, a requirement that existing building models do not always fulfil. Moreover, the interior structures of a building model are considered redundant in certain applications. These practical demands motivate our graduation project: recovering a watertight and manifold outer surface from error-ridden 3D building models.
Existing methods for outer surface extraction can be categorized into two types: surface-oriented methods and volumetric methods. The former focus on one particular type of artifact and operate directly on the surfaces of the defective model. Since surface-oriented methods mainly introduce local operations where needed, unnecessary changes to the original model are avoided and the result is of high fidelity. The latter generate an intermediate representation of the original model, from which the outer surface is extracted. Volumetric methods are more instructive for our project since they are designed to handle multiple types of artifacts and their results are guaranteed to have certain desired properties.
In this thesis, we propose a hybrid approach for the extraction of the outer surface from error-ridden 3D building models, which aims at recovering a watertight and manifold outer shell of the original model. The advantage of our method is that it is non-parametric, fully automatic, and makes no assumptions about the input. Moreover, the small features of the original model are kept to the greatest extent after processing.
Our method can be divided into four steps: 1) pre-processing, 2) constrained tetrahedralization, 3) classification, and 4) outer surface extraction. All six types of artifacts listed in this paper are gradually resolved during these steps, resulting in a watertight and manifold representation.
Our experiments show that our methodology can generate valid results in most cases, while preserving input faces and small features at the same time. Compared with several state-of-the-art methods, our results possess superior properties in terms of validity and integrity. ...

Indoor 3D Reconstruction from a Single Image

Master thesis (2020) - Chirag Garg, Liangliang Nan, Jan van Gemert, Seyran Khademi

3D indoor reconstruction has been an important research area in computer vision and photogrammetry. Early techniques rely on sensor devices and multiple images to acquire data and extract a 3D representation of the scene. With the advent of deep learning, there has been good progress in reconstructing an indoor scene in 3D from a single image, which has the potential to minimize user effort and the cost of data acquisition. The current state-of-the-art method involves two main components, the global depth map and plane instances. After investigating the current state-of-the-art methods, we observe inconsistencies in the reconstructed surface boundaries and in the depth estimated along the curved surfaces and edges of objects in the scene, despite a good 3D representation in the surrounding regions. We devise a loss function for optimizing depth estimation during supervision of the neural network; it gives each pixel local geometric awareness based on the properties of its neighborhood, defined by spatial compatibility and color similarity. A similar function is used during 3D reconstruction for orientation consistency in the point cloud. Quantitative and qualitative analysis shows that the proposed approach improves 3D reconstruction from a single image in an indoor environment. ...