W. Gao | TU Delft Repository

Exploration of algorithms for extracting wireframe models from man-made urban linear object point clouds

Master thesis (2025) - H. Gan, H. Ledoux, W. Gao

This thesis addresses the challenge of extracting wireframe models, which consist of 3D line segments, from point clouds of man-made urban linear objects, with a specific focus on power lines and pylons. Wireframe models are essential for various applications including 3D city modeling, infrastructure monitoring, and urban planning. However, the automatic extraction of accurate wireframes from sparse airborne LiDAR point clouds remains challenging due to the complexity of these structures. Current wireframe extraction methods either require high-quality data for model fitting or depend on complex pre-processing steps, and therefore lack generality. To address these challenges, this thesis presents a comprehensive evaluation of multiple wireframe extraction approaches and introduces an energy minimization framework that aims to address the limitations of the existing algorithms.

This thesis investigates four algorithms for wireframe extraction, namely 3D RANSAC, 3D-2D RANSAC, Region Growing, and Hough Transform, and identifies their limitations. Additionally, an energy minimization approach based on a Markov Random Field is proposed to explore the potential of energy minimization methods in wireframe model extraction. Each algorithm is evaluated using a dataset of power lines and pylons from the Netherlands, with manually extracted wireframes serving as ground truth.

Experimental results demonstrate that each algorithm exhibits distinct advantages and limitations. The 3D RANSAC algorithm struggles with cylinder radius estimation and overlooks significant portions of input data. The 3D-2D RANSAC approach reduces dependency on normal estimation but still faces challenges with fitting accuracy. Region Growing achieves lower overlooking rates but suffers from scattered distribution of extracted elements. Hough Transform performs well on simple structures without requiring normal information but becomes computationally expensive for complex cases. The proposed energy minimization method shows promising results in preserving structural integrity by processing dense input graphs, particularly for complex structures with internal components.

Common limitations across all approaches include difficulties in normal estimation from sparse point clouds, misalignment between extracted primitives and ground truth, and challenges in balancing completeness and accuracy. The research emphasizes the complexity of wireframe extraction from point clouds and provides insights for developing more robust methods that combine the strengths of different approaches while addressing their shared limitations.
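As a rough illustration of the RANSAC idea mentioned in this abstract, the sketch below fits a single 3D line to a point set and turns its inliers into a line segment. It is a simplified stand-in, not the thesis's method: the abstract indicates that the 3D RANSAC variant estimates cylinders, whereas this example fits an infinite line. The function names, the `threshold` and `iterations` parameters, and the sample points are all assumptions made for illustration.

```python
import math
import random


def point_line_distance(p, a, d):
    """Distance from point p to the line through a with unit direction d."""
    v = (p[0] - a[0], p[1] - a[1], p[2] - a[2])
    # The norm of the cross product v x d equals the perpendicular distance.
    cx = v[1] * d[2] - v[2] * d[1]
    cy = v[2] * d[0] - v[0] * d[2]
    cz = v[0] * d[1] - v[1] * d[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz)


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Return ((anchor, direction), inliers) for the best line found.

    threshold and iterations are illustrative defaults, not values
    taken from the thesis.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        a, b = rng.sample(points, 2)  # minimal sample: two points define a line
        d = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        norm = math.sqrt(sum(c * c for c in d))
        if norm == 0.0:
            continue  # degenerate sample (coincident points)
        d = tuple(c / norm for c in d)
        inliers = [p for p in points if point_line_distance(p, a, d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, d), inliers
    return best_model, best_inliers


def inliers_to_segment(model, inliers):
    """Clip the infinite line to the extent of its inliers."""
    a, d = model
    ts = [sum((p[i] - a[i]) * d[i] for i in range(3)) for p in inliers]
    t0, t1 = min(ts), max(ts)
    start = tuple(a[i] + t0 * d[i] for i in range(3))
    end = tuple(a[i] + t1 * d[i] for i in range(3))
    return start, end


if __name__ == "__main__":
    # Ten points on the x-axis plus two outliers (made-up data).
    pts = [(float(i), 0.0, 0.0) for i in range(10)]
    pts += [(3.0, 5.0, 1.0), (7.0, -4.0, 2.0)]
    model, inliers = ransac_line(pts)
    print(len(inliers), inliers_to_segment(model, inliers))
```

Points that no accepted line explains stay unassigned, which is one plausible way to read the "overlooked" input data the abstract reports; repeating the fit on the remaining points would extract further segments.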

Roof Structure Extraction from Remote Sensing Images

Master thesis (2025) - H.Y. Cheng, L. Nan, W. Gao, A. Rafiee

This thesis presents a method for extracting structured roof surfaces from remote sensing images. It achieves this by combining semantic segmentation with polygon-based refinement, which allows rooftop boundaries to be described more accurately using line and shape information. The method includes three main stages: (1) using an instance segmentation model to detect and classify rooftop areas; (2) generating polygonal candidates for planar roof regions based on detected line features; and (3) optimizing label assignments through a Markov Random Field (MRF) model, which integrates prediction confidence with the spatial relationships between polygons. Experiments on benchmark datasets show that this approach improves the accuracy and consistency of rooftop segmentation while reducing incorrect detections. The system is modular and flexible, making it suitable for applications that require reliable roof structure analysis in urban environments.
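The third stage can be pictured with a small MRF labelling sketch. The abstract does not state the energy terms or the solver, so everything below is an assumption for illustration: the data term is the negative log of a per-polygon confidence, the pairwise term is a Potts penalty on adjacent polygons with different labels, and the solver is iterated conditional modes (ICM), a simple local method.

```python
import math


def mrf_energy(labels, unary, edges, smoothness):
    """Total energy: data term plus weighted Potts penalty on edges."""
    data = sum(unary[i][labels[i]] for i in range(len(labels)))
    pairwise = sum(w for i, j, w in edges if labels[i] != labels[j])
    return data + smoothness * pairwise


def icm(unary, edges, smoothness=1.0, max_sweeps=20):
    """Iterated conditional modes: greedily relabel one node at a time.

    unary[i][k] is the cost of giving polygon i the label k.
    edges is a list of (i, j, weight) adjacency triples (hypothetical format).
    """
    n, num_labels = len(unary), len(unary[0])
    # Start from the label with the lowest data cost for each polygon.
    labels = [min(range(num_labels), key=lambda k: unary[i][k]) for i in range(n)]
    neighbours = [[] for _ in range(n)]
    for i, j, w in edges:
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local_cost(k):
                penalty = sum(w for j, w in neighbours[i] if labels[j] != k)
                return unary[i][k] + smoothness * penalty
            best = min(range(num_labels), key=local_cost)
            if local_cost(best) < local_cost(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break  # local minimum reached
    return labels


if __name__ == "__main__":
    # Three polygons in a chain, two labels; confidences are made up.
    confidence = [[0.9, 0.1], [0.45, 0.55], [0.8, 0.2]]
    unary = [[-math.log(c) for c in row] for row in confidence]
    edges = [(0, 1, 1.0), (1, 2, 1.0)]
    labels = icm(unary, edges, smoothness=0.5)
    print(labels, mrf_energy(labels, unary, edges, 0.5))
```

In this toy case the middle polygon's weak preference for the second label is overruled by its two confident neighbours. ICM only reaches a local minimum; graph-cut or message-passing solvers are common alternatives for energies of this form.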

Structure Guided Roof Heightmap Completion Via Diffusion Model

Master thesis (2025) - X. Zhao, H. Ledoux, W. Gao, A. Rafiee, R.Y. Peters

Urban digital twins rely on accurate rooftop geometry, yet airborne lidar point clouds are frequently sparse and incomplete, leading to substantial information loss in building reconstruction. This thesis investigates diffusion-based learning as a remedy for high-fidelity roof recovery under severe data corruption.

This thesis proposes a two-stage framework that operates on 2.5D height-map representations. Stage I introduces a dual-task diffusion model that jointly performs roof height-map completion and roof-line prediction. A novel Bidirectional Control Module enables reciprocal conditioning between the two tasks, enforcing geometric consistency during the denoising process. Stage II employs a patch-based diffusion upsampler equipped with positional embeddings and a domain-specific global context encoder to synthesise high-resolution height maps while remaining computationally tractable for large and variably-sized buildings. A rigorous preprocessing pipeline further yields two challenging benchmarks, S80_i30 and S80_i80, derived from 160k real-world building samples.

Extensive experiments conducted on these datasets demonstrate the effectiveness of the proposed approach. Under moderate corruption (S80_i30), the completion model attains an RMSE of 0.89 m and a Chamfer distance of 0.06, improving upon the state-of-the-art RoofDiffusion baseline by 13.2% and 17.3%, respectively. In the severe setting (S80_i80), the method sustains a 13.5% RMSE reduction. The upsampling stage delivers an additional 10% RMSE gain over the best classical interpolator, and the end-to-end pipeline achieves RMSE values of 0.91 m (moderate) and 1.42 m (severe).

The thesis contributes: (i) a structurally-aware diffusion framework for roof completion, (ii) a scalable patch-based upsampler, and (iii) public benchmarks that reflect real lidar degradation. Collectively, these advances close a critical gap between theoretical research and practical generation of LOD2.2 building models, facilitating more reliable urban analytics and planning applications.
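For readers unfamiliar with the reported metrics, the following sketch shows how an RMSE over a height map and a relative RMSE reduction could be computed. The optional `mask` argument (for example, restricting the error to a building footprint) and all numbers in the usage example are assumptions for illustration; the abstract does not specify how the thesis evaluates pixels or what the baseline values are.

```python
import math


def masked_rmse(pred, truth, mask=None):
    """Root-mean-square error between two height maps (lists of rows).

    If mask is given, only cells where mask[r][c] is truthy are counted.
    """
    total, count = 0.0, 0
    for r, row in enumerate(truth):
        for c, t in enumerate(row):
            if mask is not None and not mask[r][c]:
                continue
            total += (pred[r][c] - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no cells selected for evaluation")
    return math.sqrt(total / count)


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    # Toy 2x2 height maps in metres (made-up values).
    truth = [[1.0, 2.0], [3.0, 6.0]]
    pred = [[1.0, 2.0], [3.0, 4.0]]
    print(masked_rmse(pred, truth))       # error over all four cells
    print(relative_reduction(2.0, 1.5))   # hypothetical baseline vs. method
```

A relative reduction is a ratio of errors, so it is not the same as an absolute difference in metres.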

Scenario-based energy simulation: Modelling tree planting strategy to reduce heating and cooling demand under 2050 climate conditions

Master thesis (2025) - A. Rahmawati, W. Gao, C.A. León Sánchez, G. Agugiaro

In recent years, rising energy demand and intensifying climate change impacts have placed urban energy systems under growing pressure. Higher average temperatures and more frequent heatwaves are projected to substantially increase cooling demand. Urban building energy modelling (UBEM) offers a means to analyse such dynamics at the district scale; however, vegetation effects on building energy use remain under-represented. Existing approaches often rely on multiple coupled models, apply to small spatial extents, or omit future climate scenarios, thereby limiting their usefulness for urban planning and climate adaptation strategies.
In this thesis, we introduce a neighbourhood-scale workflow that integrates a tree planting scenario into a single simulation-based UBEM platform. The main characteristic of the method lies in its use of standardised CityGML building models, simplified yet seasonally dynamic vegetation representations, and a unified modelling environment that allows consistent comparison of a planting strategy under both current and projected 2050 climate conditions. Six scenarios were applied to two contrasting Rotterdam neighbourhoods to quantify heating and cooling demand at building and neighbourhood levels while separating climate-driven changes from vegetation impacts.
Results indicate that, between 2023 and 2050, cooling demand increases by 32–39%, while heating demand decreases by approximately 12%. Adding deciduous trees reduces neighbourhood cooling demand by 3–10%, depending on location and climate scenario, but winter shading introduces heating penalties of up to 2%, leading to small net annual changes at the neighbourhood scale (0.9 to 0.4%). Building-level effects are more heterogeneous: in compact districts, additional trees sometimes block limited winter solar gains, while in open areas with high cooling exposure, they consistently reduce peak summer loads. Orientation and facade exposure emerge as key factors shaping the balance between summer benefits and winter penalties.
The workflow produces spatially explicit maps and scenario comparisons to support an energy-aware, location-specific planting strategy. However, simplified tree geometry, static building stock assumptions, monthly climate inputs, and computational limits constrain the accuracy and scalability of the results. Future research should integrate hourly climate data, species-specific vegetation models, dynamic retrofitting scenarios, and larger spatial domains to better capture seasonal variability, urban morphological diversity, and the interactions between greening and energy system decarbonisation pathways.

Automated Semantic Segmentation of Aerial Imagery using Synthetic Data

Master thesis (2022) - C.A. Caceres Tocora, S. Du, J.E. Stoter, Sven Briels, W. Gao

Semantic segmentation of aerial images is the task of assigning a label to every pixel of an image. It is essential for various applications such as urban planning, agriculture and real-estate analysis. Deep learning techniques have shown satisfactory results in semantic segmentation tasks. However, training a deep learning model is expensive and usually requires manually labelled images, so image annotation is a bottleneck in semantic segmentation projects. Synthetic data, which consists of images from a virtual world that simulates the real world, can therefore be used as training data for segmentation tasks to improve the classification results. This thesis aims to create a pipeline that generates synthetic images with semantic segmentation labels for use in an existing deep learning model, and to discuss how the generated synthetic data improves the semantic segmentation of aerial images. An existing model (FuseNet), which achieved satisfactory results in previous work, is trained solely on synthetic data and on a mix of synthetic and real data in different training and testing scenarios to classify true ortho imagery from Haaksbergen, Netherlands and Potsdam, Germany. In addition, a benchmark of domain adaptation techniques is performed to close the domain gap between the synthetic and real imagery. The semantic maps include building, road and other classes. Experiments test the performance of the synthetic data using 1) different 3D models of the virtual world, 2) different quantities of synthetic and real training data, 3) different cross-geographical scenarios, and 4) different domain adaptation techniques. The assessment is based on the (mean) intersection over union (IoU), F1 score, precision and recall, and an extensive visual assessment.
The virtual world is created through a pipeline in CityEngine using procedural modelling techniques and then rendered in Blender to create the training dataset. When training and testing are performed in the same area, the synthetic data yields a mIoU of 0.48, which is lower than when solely real data is used (0.75). In addition, the 3D models partly affect the segmentation results. When using a mix of real and synthetic data, the mIoU remains at 0.75. In contrast, when training and testing in different areas, the use of synthetic data seems to improve the results on average by 21.5, 12.5, 1.5 and 2 percentage points on the mIoU and on the IoU for the classes building, road and other, respectively. Additionally, domain adaptation techniques such as CycleGAN and CyCADA improve the performance of synthetic datasets by 4 percentage points. Overall, this thesis shows that when the domain difference between the training and testing datasets is large, adding synthetic data helps to improve the semantic segmentation of aerial images. When a project lacks labelled imagery, synthetic data improves the segmentation results when mixed with existing labelled imagery from different geographical regions. In contrast, when labelled imagery is available for the testing area, the real training data already obtains robust results, so adding synthetic data does not improve the segmentation results.
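The IoU and mIoU scores quoted in this abstract can be computed from a confusion matrix, as in the minimal sketch below. The three class indices stand for building, road and other as an assumed encoding, and the label sequences are made-up flattened pixel labels, not data from the thesis.

```python
def confusion_matrix(truth, pred, num_classes):
    """m[t][p] counts pixels with true class t predicted as class p."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        m[t][p] += 1
    return m


def per_class_iou(m):
    """IoU_k = TP / (TP + FP + FN) for each class k; None if class is absent."""
    n = len(m)
    ious = []
    for k in range(n):
        tp = m[k][k]
        fn = sum(m[k]) - tp                        # true k, predicted otherwise
        fp = sum(m[r][k] for r in range(n)) - tp   # predicted k, truly otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(ious):
    """Average IoU over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid)


if __name__ == "__main__":
    # Assumed encoding: 0 = building, 1 = road, 2 = other.
    truth = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    ious = per_class_iou(confusion_matrix(truth, pred, 3))
    print(ious, mean_iou(ious))
```

Differences between two such scores are naturally expressed in percentage points, which is how the abstract reports the gains from synthetic data.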

Semantic segmentation of point clouds with the 3D medial axis transform

Master thesis (2020) - G. Ceccarelli, R.Y. Peters, W. Gao

A point cloud is a representation of shapes, organized in a 3D irregular structure. Point clouds are increasingly used in different applications, ranging from architectural preservation to computer vision. The 3D medial axis transform (MAT) is a topology-preserving skeleton representation of shapes. It can be used to decompose an object into meaningful parts and to describe local and long-range information of points in a point cloud.

In the past years, many deep learning methods for point clouds have emerged. These are used for different applications, such as shape classification, object detection or semantic segmentation. In particular, the latter aims to assign each point in the input point cloud to a class based on its semantics.

This research investigates the integration of the 3D MAT in two deep learning methods for semantic segmentation of point clouds, PointNet++ and Superpoint Graph (SPG). In particular, the 3D MAT was used in PointNet++ as a point feature, to give context to local points. Then, it was used in Superpoint Graph as a geometric descriptor to partition a point cloud and as an edge feature in the SPG.

The major findings of this research are that the 3D MAT can be successfully used in PointNet++ as a point feature, improving the overall accuracy and loss values of the algorithm. In particular, two MAT-derived properties used in this research, radii and separation angles, yield positive results. These can be combined with point coordinates and RGB information to bring additional knowledge on the geometry of the shape, representing its curvature and thickness. Furthermore, they can be integrated in a simple and effective way, without increasing the computational cost or running time of the algorithm.

The analysis carried out in Superpoint Graph shows that the 3D MAT does not improve the initial geometric partition. In fact, adding geometric descriptors to the algorithm increases the difficulty of dividing the point cloud into simple shapes, creating artifacts. Furthermore, adding MAT information on superedges does not add value to the SPG. The reason is that the SPG and the structured MAT differ in practice, as their nodes represent different parts of the point cloud.
