W. Gao

14 records found

Review (2026) - Jesús Balado, Yu Feng, Zhouyan Qiu, W. Gao, Arttu Julin
Accurate integration and navigation of real-world 3D spaces are fundamental for next-generation Extended Reality (XR) systems, enhancing immersion, utility, and fidelity. This paper systematically reviews XR workflows using PRISMA guidelines, focusing on 3D data acquisition, modeling, visualization, and user interaction, based on 96 journal publications. Data collection for XR relies on photogrammetry, RGB-D cameras, and LiDAR, often enhanced by multi-sensor fusion, although real-time transmission and semantic alignment remain challenging. XR pipelines are dominated by Building Information Modeling (BIM) software and game engines, frequently integrating Computer-Aided Design (CAD) models and 3D scanned data. Visualization varies from photorealistic renderings to schematic representations, with Virtual Reality headsets favored for training and Augmented Reality devices applied in inspection and navigation. Interaction paradigms encompass controllers, gestures, gaze, voice, and haptics, with increasing reliance on Artificial Intelligence for multimodal fusion and processing. Despite progress, key challenges persist, including bandwidth limitations, manual 3D modeling, hybrid data management, interoperability issues, and scarcity of open-source solutions. Additional identified barriers involve balancing visual quality with performance in specific contexts, limited accuracy of non-invasive Brain-Computer Interfaces, and restricted market acceptance due to high costs. Overall, XR adoption remains constrained by technical, usability, and accessibility gaps. ...
Journal article (2025) - Y. Xia, W. Gao, J. Stoter
High-precision 3D urban applications, including emergency response simulation, microclimate analysis, and heritage conservation, demand semantically enriched 3D building representations at Level of Detail 3 (LoD3) with parametric façade components. Current urban digital twins predominantly rely on LoD2 models (as exemplified by the nationwide 3D BAG dataset in the Netherlands) that lack critical architectural features such as windows and doors, constraining their analytical value and their utility for fine-grained applications. This study introduces a novel pipeline to bridge this gap, enabling the enrichment of LoD2 models with accurate opening information using aerial oblique imagery and deep learning. The approach addresses critical challenges in 3D-2D alignment by leveraging perspective projection for comprehensive façade extraction, least-squares registration to rectify systematic offsets, and Mask R-CNN for robust opening detection. Unlike conventional methods, it captures both inward and outward building faces by projecting all 3D façades onto multi-directional images, ensuring complete coverage of visible elements. Geometric scaling integrates detected openings into LoD2 models as watertight, semantically rich components, validated for structural consistency. By overcoming data misalignments and occlusion limitations, this methodology provides a scalable framework for large-scale LoD3 generation, enabling efficient upgrades of existing building models to support detailed spatial analysis in smart city contexts. [...] ...
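The least-squares registration step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the systematic offset is a pure 2D translation between projected façade corners and their matched image positions; the function names `estimate_offset` and `apply_offset` are hypothetical, and the registration model used in the paper may be richer than this.

```python
def estimate_offset(projected, observed):
    """Least-squares 2D translation t minimizing sum ||p + t - q||^2 over matched pairs.

    For a pure translation, the minimizer is the mean of the differences q - p.
    """
    if len(projected) != len(observed) or not projected:
        raise ValueError("need equally sized, non-empty point lists")
    n = len(projected)
    dx = sum(q[0] - p[0] for p, q in zip(projected, observed)) / n
    dy = sum(q[1] - p[1] for p, q in zip(projected, observed)) / n
    return dx, dy


def apply_offset(points, offset):
    """Shift every 2D point by the estimated offset."""
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in points]


# Illustrative (made-up) pixel coordinates of three facade corners.
projected = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)]
observed = [(2.5, 1.0), (11.5, 1.0), (12.0, 6.0)]
offset = estimate_offset(projected, observed)
corrected = apply_offset(projected, offset)
```

A translation-only model keeps the closed-form solution trivial. Rotation or scale errors would need a similarity or affine fit instead.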

A multimodal semantic segmentation dataset for roofing material classification

Journal article (2025) - Dimitris Mantas, Weixiao Gao, Hugo Ledoux
Roofing material classification is critical for urban sustainability, energy efficiency, public health, environmental protection, and regulatory compliance. Despite the need for scalable solutions, existing approaches are hindered by reliance on oftentimes expensive and rare multi- or hyper-spectral satellite imagery, application-specific assumptions and biases, and oversight of deep learning and multimodal data fusion. This paper addresses these gaps by introducing RoofSense, a multimodal semantic segmentation dataset for roofing material classification in diverse urban contexts, leveraging 8 cm aerial true-color imagery and airborne laser scanning data. Representing eight classes and encompassing over 138 ha and 480 buildings across five Dutch cities, RoofSense is the largest publicly available dataset of its kind. By combining spectral and geometric information at the pixel level and adopting a novel weighting scheme to address class imbalance, RoofSense can be used to achieve competitive classification and segmentation performance in downstream tasks. This was demonstrated in a comprehensive purpose-designed benchmarking experiment with an off-the-shelf model based on ResNet-18-D and DeepLabv3+. Although lidar-derived features improved performance in difficult classes and materials commonly used on pitched roofs, results were sensitive to material and building context, clutter, and modality alignment, indicating that the theoretical benefits of data fusion are not straightforward. The implementation is publicly accessible at https://github.com/DimitrisMantas/RoofSense. ...
Journal article (2025) - Giulia Ceccarelli, Weixiao Gao, Ravi Peters
Semantic segmentation of 3D point clouds is pivotal for urban modeling and autonomous systems, yet challenges like irregular data structure and complex geometry hinder accurate segmentation. This study explores integrating the 3D Medial Axis Transform (MAT)—a topological skeleton encoding shape geometry via maximally inscribed balls—into deep learning frameworks to enhance semantic reasoning. We propose a feature fusion approach embedding MAT-derived attributes (radii, separation angles, medial bisectors) into point-based (PointNet++) and graph-based (Superpoint Graph) networks, enabling explicit geometric context for local points and superpoint relationships. Experiments on diverse datasets (3DOM, SynthCity, SHREC) demonstrate that MAT-enhanced features, particularly radii and separation angles, improve mean intersection over union (mIoU) by 5.8–12.4% compared to baseline RGB-only models, especially for classes like grass and shrubs where appearance features are ambiguous. However, MAT-guided geometric partitioning requires careful regularization to avoid over-segmentation, and graph convolutions benefit most from mean MAT attributes for global structure modeling. This work establishes MAT as a valuable geometric prior for point cloud segmentation, highlighting its potential to bridge topological structure and data-driven learning. ...
Conference paper (2025) - E. Gebetsroither-Geringer, R. Padsala, A. Hainoun, G. Agugiaro, S. Biernat, A. Reber, B. Smetschka, W. Gao, D. Horak, More authors...
The urban socio-ecological transformation requires pathways for an urban energy transition, including the establishment of Positive Energy Districts (PEDs). Technical solutions and simulation tools for urban energy systems are needed for the planning, management and implementation of PEDs. In addition, the involvement of all societal stakeholders is needed to achieve the EU's ambitious target of 100 PEDs by 2025. To this end, innovative research, information and communication strategies must be developed. The transnational funded research project DigiTwins4PEDs focuses on developing an Urban Digital Twin as a dynamic digital representation of urban energy systems using advanced modelling tools. The framework facilitates the integrated energy demand-supply analysis at district scale. It enables the construction and analysis of future development scenarios to simulate the performance of PEDs. It supports informed decision-making by citizens and urban administration for a sustainable urban energy transition. The transnational project applies innovative methods and develops implementation strategies supported by a participatory process involving key stakeholders and citizens in co-design, co-creation and co-learning stages of research. Through the framework of living labs in four different case studies, citizens are continuously engaged throughout the project so that citizen-driven actions towards Positive Energy Districts can be considered and implemented more efficiently. New tools and methods are developed and adapted using Urban Digital Twins based on the CityGML data format to enhance public participation in advancing clean energy transition.
These tools enable citizens to actively engage in shaping the future energy transition of their communities and thus supporting informed decision-making. The developed and implemented urban digital twin framework is tested in different urban case study areas (Vienna, Stuttgart, Rotterdam, Wroclaw) within an innovative public participatory process to address the multifaceted aspects crucial for establishing PEDs together with the citizens. This paper discusses the concept and first prototype of the developed participatory planning framework, a shared urban data and modelling scheme, utilizing a digital twin, as well as its implementation. It will show how the developed framework enables the simulation of urban energy systems with integrated local socio-economic and demographic parameters to identify and visualise current and future energy demand, renewable potential and different energy flexibility strategies in a district. It will discuss how the developed framework can be integrated/combined with other citizens’ engagement tools, focussing on the ones used in the case study sites. Furthermore, we draw conclusions on how this framework can be used to support co-design, co-creation and co-learning of community-driven solutions for energy transformation. ...
Doctoral thesis (2024) - W. Gao, H. Ledoux, L. Nan
The thesis explores the semantic understanding of urban textured meshes derived from photogrammetric methods. It primarily addresses three aspects with regard to urban textured meshes: 1) semantic annotation and the creation of benchmark datasets, 2) semantic segmentation, and 3) the automation of lightweight 3D city modeling using semantic information.

The first focus of the thesis is the development of a benchmark dataset to evaluate the performance of advanced 3D semantic segmentation methods in urban settings. An interactive 3D annotation framework has been proposed to assign ground truth labels to the urban meshes' triangle faces and texture pixels. This framework achieves efficient and accurate semi-automatic annotation through segment classification and structure-aware interactive selection. In the center of Helsinki, Finland, object-level annotations were made over approximately 4 km² (including buildings, vegetation, and vehicles, etc.), and part-level annotations over about 2.5 km² (including building parts like doors, windows, and road markings, etc.). The design of the annotation tools improves user operation and enables quick annotation of large scenes, while the resulting datasets allow researchers to refine their deep learning models for urban analysis.

Another research focus is on mesh segmentation algorithms. A novel semantic mesh segmentation algorithm has been introduced for large-scale urban environments, employing plane-sensitive over-segmentation combined with graph-based methods for contextual data integration. This approach, which utilizes graph convolutional networks for classification, significantly improves performance over traditional techniques based on our proposed benchmark datasets.

Finally, leveraging this semantic information, a pipeline for reconstructing lightweight 3D city models has been designed. This facilitates the automated reconstruction of CityGML-based LoD2 and LoD3 city models, ensuring high fidelity in geometric detail and semantic accuracy. The reconstructed large-scale, lightweight, and semantic city models significantly broaden applications in urban spatial intelligence, including automatic geometric measurements, interactive spatial computations, spatial analysis based on external data, and environment simulation using physical engines.

This thesis enhances the practicality of 3D data in real-world applications by utilizing semantic parsing of urban textured meshes to generate lightweight 3D urban semantic models, greatly enriching their usability. It also lays a solid foundation for future progress in understanding, modeling, and analyzing 3D urban scenes. ...
Conference paper (2024) - Weixiao Gao, Ravi Peters, Jantien Stoter
This paper discusses the reconstruction of LoD2 building models from 2D and 3D data for large-scale urban environments. Traditional methods involve the use of LiDAR point clouds, but due to high costs and long intervals associated with acquiring such data for rapidly developing areas, researchers have started exploring the use of point clouds generated from (oblique) aerial images. However, using such point clouds for traditional plane detection-based methods can result in significant errors and introduce noise into the reconstructed building models. To address this, this paper presents a method for extracting rooflines from true orthophotos using line detection for the reconstruction of building models at the LoD2 level. The approach is able to extract relatively complete rooflines without the need for pre-labeled training data or pre-trained models. These lines can directly be used in the LoD2 building model reconstruction process. The method is superior to existing plane detection-based methods and state-of-the-art deep learning methods in terms of the accuracy and completeness of the reconstructed building. Our source code is available at https://github.com/tudelft3d/Roofline-extraction-from-orthophotos. ...

Building Point Cloud Completion Benchmarks

Journal article (2024) - Weixiao Gao, Ravi Peters, Jantien Stoter
With the rapid advancement of 3D sensing technologies, obtaining 3D shape information of objects has become increasingly convenient. Lidar technology, with its capability to accurately capture the 3D information of objects at long distances, has been widely applied in the collection of 3D data in urban scenes. However, the collected point cloud data often exhibit incompleteness due to factors such as occlusion, signal absorption, and specular reflection. This paper explores the application of point cloud completion technologies in processing these incomplete data and establishes a new real-world benchmark dataset, Building-PCC, to evaluate the performance of existing deep learning methods in the task of urban building point cloud completion. Through a comprehensive evaluation of different methods, we analyze the key challenges faced in building point cloud completion, aiming to promote innovation in the field of 3D geoinformation applications. Our source code is available at https://github.com/tudelft3d/Building-PCC-Building-Point-Cloud-Completion-Benchmarks.git ...
Journal article (2024) - Weixiao Gao, Ravi Peters, Hugo Ledoux, Jantien Stoter
This paper presents a new algorithm for filling holes in Level of Detail 2 (LoD2) building mesh models, addressing the challenges posed by geometric inaccuracies and topological errors. Unlike traditional methods that often alter the original geometric structure or impose stringent input requirements, our approach preserves the integrity of the original model while effectively managing a range of topological errors. The algorithm operates in three distinct phases: (1) pre-processing, which addresses topological errors and identifies pseudo-holes; (2) detecting and extracting complete border rings of holes; and (3) remeshing, aimed at reconstructing the complete geometric surface. Our method demonstrates superior performance compared to related work in filling holes in building mesh models, achieving both uniform local geometry around the holes and structural completeness. Comparative experiments with established methods demonstrate our algorithm’s effectiveness in delivering more complete and geometrically consistent hole-filling results, albeit with a slight trade-off in efficiency. The paper also identifies challenges in handling certain complex scenarios and outlines future directions for research, including the pursuit of a comprehensive repair goal for LoD2 models to achieve watertight 2-manifold models with correctly oriented normals. Our source code is available at https://github.com/tudelft3d/Automatic-Repair-of-LoD2-Building-Models.git ...
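Phase (2) of this abstract, extracting the border rings of holes, can be illustrated with a common half-edge technique. The sketch below is not the authors' algorithm. It assumes a consistently oriented manifold triangle mesh in which each border vertex lies on exactly one border, which is the clean case the paper's pre-processing phase is meant to produce. The name `boundary_rings` is hypothetical.

```python
def boundary_rings(triangles):
    """Return closed rings of vertex indices along the open borders of a triangle mesh.

    Assumes consistent triangle orientation and at most one border per vertex;
    inputs that violate this may raise KeyError or yield incomplete rings.
    """
    half_edges = set()
    for a, b, c in triangles:
        half_edges.update([(a, b), (b, c), (c, a)])
    # A half-edge lies on a border when its opposite half-edge is missing.
    next_vertex = {u: v for (u, v) in half_edges if (v, u) not in half_edges}
    rings = []
    while next_vertex:
        start, current = next_vertex.popitem()
        ring = [start]
        # Follow border half-edges until the ring closes at its start vertex.
        while current != start:
            ring.append(current)
            current = next_vertex.pop(current)
        rings.append(ring)
    return rings


# Two triangles forming an open quad: the whole outline is one border ring.
quad = [(0, 1, 2), (0, 2, 3)]
rings = boundary_rings(quad)
```

A closed surface such as a tetrahedron yields no rings. The starting vertex of each ring depends on set iteration order, so compare rings up to rotation.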

Planarity-sensible Semantic Segmentation of large-scale urban meshes

Journal article (2023) - Weixiao GAO, Liangliang Nan, Bas Boom, Hugo Ledoux
We introduce a novel deep learning-based framework to interpret 3D urban scenes represented as textured meshes. Based on the observation that object boundaries typically align with the boundaries of planar regions, our framework achieves semantic segmentation in two steps: planarity-sensible over-segmentation followed by semantic classification. The over-segmentation step generates an initial set of mesh segments that capture the planar and non-planar regions of urban scenes. In the subsequent classification step, we construct a graph that encodes the geometric and photometric features of the segments in its nodes and the multi-scale contextual features in its edges. The final semantic segmentation is obtained by classifying the segments using a graph convolutional network. Experiments and comparisons on two semantic urban mesh benchmarks demonstrate that our approach outperforms the state-of-the-art methods in terms of boundary quality, mean IoU (intersection over union), and generalization ability. We also introduce several new metrics for evaluating mesh over-segmentation methods dedicated to semantic segmentation, and our proposed over-segmentation approach outperforms state-of-the-art methods on all metrics. Our source code is available at https://github.com/WeixiaoGao/PSSNet. ...
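The mean IoU metric cited in this abstract can be computed from a confusion matrix. The sketch below assumes rows are ground-truth classes and columns are predicted classes, and it skips classes that appear in neither. The function name `mean_iou` is illustrative and is not taken from the PSSNet code.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts items of true class i predicted as class j.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Illustrative (made-up) two-class counts.
score = mean_iou([[3, 1], [1, 5]])
```

Because each class contributes equally regardless of its size, mean IoU penalizes errors on rare classes more than overall accuracy does.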
Journal article (2023) - R.Y. Peters, B. Dukai, W. Gao, J.E. Stoter
The 3D BAG contains automatically reconstructed LoD2 models of all buildings in the Netherlands, and was first reconstructed in the spring of 2021 on the basis of AHN3. A new version of the 3D BAG has now been reconstructed on the basis of AHN4, in a collaboration between 3DGI and the 3D Geoinformation research group (TU Delft). AHN4 is not only more up to date, but also has different characteristics than AHN3. For the updated version of the 3D BAG, we therefore investigated how both datasets can best be used. ...
To calculate environmental noise produced by road and rail traffic and by industry, a noise expert uses a 3D model of the surroundings. This 3D model contains, among other things, information about a) the terrain height, b) buildings and c) the sound-reflecting/absorbing properties of the ground. Since last year, noise experts have been able to use an automatically generated, nationwide 3D Omgevingsmodel Geluid (3D Environmental Noise Model). This dataset was developed in a collaboration between RIVM, Kadaster and the 3D Geoinformation research group of TU Delft, commissioned by the Ministry of Infrastructure and Water Management. [...] ...

A benchmark dataset of Semantic Urban Meshes

Journal article (2021) - Weixiao Gao, Liangliang Nan, Bas Boom, Hugo Ledoux
Recent developments in data acquisition technology allow us to collect 3D texture meshes quickly. Those can help us understand and analyse the urban environment, and as a consequence are useful for several applications like spatial analysis and urban planning. Semantic segmentation of texture meshes through deep learning methods can enhance this understanding, but it requires a lot of labelled data. The contributions of this work are three-fold: (1) a new benchmark dataset of semantic urban meshes, (2) a novel semi-automatic annotation framework, and (3) an annotation tool for 3D meshes. In particular, our dataset covers about 4 km² in Helsinki (Finland), with six classes, and we estimate that we save about 600 h of labelling work using our annotation framework, which includes initial segmentation and interactive refinement. We also compare the performance of several state-of-the-art 3D semantic segmentation methods on the new benchmark dataset. Other researchers can use our results to train their networks: the dataset is publicly available, and the annotation tool is released as open-source. ...
Journal article (2018) - Y. Chen, W. Gao, E. Widyaningrum, M. Zheng, Kaixuan Zhou
Semantic segmentation, especially of buildings, from very high resolution (VHR) airborne images is an important task in urban mapping applications. Deep learning has improved significantly and is now widely applied in computer vision. Fully Convolutional Networks (FCNs) are among the most popular methods due to their good performance and high computational efficiency. However, the state-of-the-art results of deep networks depend on training on large-scale benchmark datasets. Unfortunately, benchmarks of VHR images are limited and generalize poorly to other areas of interest. As existing high-precision base maps are easily available and objects do not change dramatically in an urban area, the map information can be used to label images for training samples. Apart from object changes between maps and images due to time differences, the maps often cannot be perfectly matched with the images. In this study, the main mislabeling sources, such as relief displacement, different representations between the base map and the image, and occlusion areas in the image, are considered and addressed by utilizing stereo images. These free training samples are then fed to a pre-trained FCN. To find a better result, we applied fine-tuning with different learning rates and froze different layers. We further improved the results by introducing atrous convolution. By using free training samples, we achieve a promising building classification with 85.6% overall accuracy and 83.77% F1 score, while the result from the ISPRS benchmark using manual labels has 92.02% overall accuracy and 84.06% F1 score, due to the building complexities in our study area. ...
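The two scores reported in this abstract, overall accuracy and F1, weight errors differently, which is why they can diverge for the building class. The sketch below computes both from binary confusion counts. The function names and the counts in the example are illustrative and are not taken from the paper.

```python
def overall_accuracy(tp, fp, fn, tn):
    """Fraction of all pixels labelled correctly (building and background)."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (building) class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up counts: 6 true positives, 2 false positives, 1 false negative, 11 true negatives.
oa = overall_accuracy(6, 2, 1, 11)
f1 = f1_score(6, 2, 1)
```

True negatives raise overall accuracy but do not enter F1, so scenes dominated by background can show high accuracy alongside a lower F1.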