W. Gao
RoofSense
A multimodal semantic segmentation dataset for roofing material classification
The first focus of the thesis is the development of a benchmark dataset for evaluating advanced 3D semantic segmentation methods in urban settings. An interactive 3D annotation framework is proposed to assign ground-truth labels to the triangle faces and texture pixels of urban meshes. The framework achieves efficient and accurate semi-automatic annotation through segment classification and structure-aware interactive selection. In the center of Helsinki, Finland, object-level annotations (buildings, vegetation, vehicles, etc.) cover approximately 4 km\(^2\), and part-level annotations (parts such as doors, windows, and road markings) cover about 2.5 km\(^2\). The annotation tools are designed to streamline user operation and enable quick annotation of large scenes, while the resulting datasets allow researchers to refine their deep learning models for urban analysis.
Another research focus is on mesh segmentation algorithms. A novel semantic mesh segmentation algorithm is introduced for large-scale urban environments; it combines plane-sensitive over-segmentation with graph-based methods that integrate contextual information. This approach, which uses graph convolutional networks for classification, significantly outperforms traditional techniques on our proposed benchmark datasets.
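To make the classification step more concrete, the following is a minimal sketch of a single graph convolution layer applied to a small segment graph. The abstract does not say which graph convolution variant is used, so the widely used symmetric-normalized propagation rule (self-loops added, then ReLU) serves here as a stand-in. The function name `gcn_layer` and the toy inputs are illustrative assumptions, and edge features are ignored for brevity.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency: n x n list of 0/1 values (undirected segment graph).
    features:  n x f list of per-segment feature vectors.
    weights:   f x o list, a (normally learned) weight matrix.
    NOTE: this is a generic stand-in, not the method from the thesis.
    """
    n = len(adjacency)
    n_in = len(weights)
    n_out = len(weights[0])

    # Add self-loops so each segment keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]

    # Symmetric normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]

    # Aggregate neighbor features, then apply the linear map and ReLU.
    aggregated = [[sum(norm[i][k] * features[k][f] for k in range(n))
                   for f in range(n_in)]
                  for i in range(n)]
    return [[max(0.0, sum(aggregated[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)]
            for i in range(n)]


# Two adjacent segments with one feature each and an identity weight.
output = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
print(output)
```

In this toy case each segment ends up with the average of its own feature and its neighbor's, which is how contextual information spreads across adjacent segments before classification.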
Finally, leveraging this semantic information, a pipeline for reconstructing lightweight 3D city models has been designed. The pipeline enables the automated reconstruction of CityGML-based LoD2 and LoD3 city models with high fidelity in geometric detail and semantic accuracy. The reconstructed large-scale, lightweight, semantic city models significantly broaden applications in urban spatial intelligence, including automatic geometric measurements, interactive spatial computations, spatial analysis based on external data, and environment simulation using physics engines.
This thesis enhances the practicality of 3D data in real-world applications by utilizing semantic parsing of urban textured meshes to generate lightweight 3D urban semantic models, greatly enriching their usability. It also lays a solid foundation for future progress in understanding, modeling, and analyzing 3D urban scenes.
Building-PCC
Building Point Cloud Completion Benchmarks
PSSNet
Planarity-sensible Semantic Segmentation of large-scale urban meshes
We introduce a novel deep learning-based framework to interpret 3D urban scenes represented as textured meshes. Based on the observation that object boundaries typically align with the boundaries of planar regions, our framework achieves semantic segmentation in two steps: planarity-sensible over-segmentation followed by semantic classification. The over-segmentation step generates an initial set of mesh segments that capture the planar and non-planar regions of urban scenes. In the subsequent classification step, we construct a graph that encodes the geometric and photometric features of the segments in its nodes and the multi-scale contextual features in its edges. The final semantic segmentation is obtained by classifying the segments using a graph convolutional network. Experiments and comparisons on two semantic urban mesh benchmarks demonstrate that our approach outperforms the state-of-the-art methods in terms of boundary quality, mean IoU (intersection over union), and generalization ability. We also introduce several new metrics for evaluating mesh over-segmentation methods dedicated to semantic segmentation, and our proposed over-segmentation approach outperforms state-of-the-art methods on all metrics. Our source code is available at https://github.com/WeixiaoGao/PSSNet.
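As a reminder of how the mean IoU figure is typically obtained, here is a minimal sketch that computes per-class IoU and its mean from a confusion matrix. The function names and the toy matrix are illustrative assumptions. What is being counted (mesh faces, face area, or sampled points) is left to the caller, since the abstract does not specify it.

```python
def per_class_iou(confusion):
    """Return IoU per class from a square confusion matrix.

    confusion[i][j] = amount of ground-truth class i predicted as class j.
    A class that never appears in truth or prediction yields None.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(n)) - true_pos
        union = true_pos + false_pos + false_neg
        ious.append(true_pos / union if union else None)
    return ious


def mean_iou(confusion):
    """Average the IoU over all classes that have a defined value."""
    valid = [v for v in per_class_iou(confusion) if v is not None]
    if not valid:
        raise ValueError("confusion matrix contains no samples")
    return sum(valid) / len(valid)


# Hypothetical two-class example (not data from the paper).
toy_confusion = [[8, 2],
                 [1, 9]]
print(per_class_iou(toy_confusion))
print(mean_iou(toy_confusion))
```

Because the mean is taken over classes rather than over samples, a rare class with poor IoU lowers the score as much as a frequent one, which is why mean IoU is often reported alongside overall accuracy.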
SUM
A benchmark dataset of Semantic Urban Meshes
Recent developments in data acquisition technology allow us to collect 3D textured meshes quickly. These meshes can help us understand and analyse the urban environment, and as a consequence are useful for several applications like spatial analysis and urban planning. Semantic segmentation of textured meshes through deep learning methods can enhance this understanding, but it requires a lot of labelled data. The contributions of this work are three-fold: (1) a new benchmark dataset of semantic urban meshes, (2) a novel semi-automatic annotation framework, and (3) an annotation tool for 3D meshes. In particular, our dataset covers about 4 km\(^2\) in Helsinki (Finland), with six classes, and we estimate that we save about 600 h of labelling work using our annotation framework, which includes initial segmentation and interactive refinement. We also compare the performance of several state-of-the-art 3D semantic segmentation methods on the new benchmark dataset. Other researchers can use our results to train their networks: the dataset is publicly available, and the annotation tool is released as open-source.
Semantic segmentation of very high resolution (VHR) airborne images, especially for buildings, is an important task in urban mapping applications. Deep learning has advanced significantly in recent years and is now widely applied in computer vision. Fully Convolutional Networks (FCNs) are among the most popular methods because of their good performance and high computational efficiency. However, the state-of-the-art results of deep networks depend on training with large-scale benchmark datasets. Unfortunately, benchmarks of VHR images are limited and generalize poorly to other areas of interest. Because high-precision base maps are readily available and objects in urban areas do not change dramatically, map information can be used to label images as training samples. However, apart from object changes between maps and images due to time differences, maps often do not match images perfectly. In this study, the main sources of mislabeling, namely relief displacement, differences in representation between the base map and the image, and occluded areas in the image, are addressed by utilizing stereo images. These free training samples are then fed to a pre-trained FCN. To find the best result, we fine-tuned the network with different learning rates and with different layers frozen. We further improved the results by introducing atrous convolution. Using free training samples, we achieve promising building classification with 85.6% overall accuracy and an 83.77% F1 score, whereas the ISPRS benchmark with manual labels yields 92.02% overall accuracy and an 84.06% F1 score; we attribute the gap to the building complexities in our study area.
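For readers unfamiliar with atrous (dilated) convolution, the sketch below shows the idea in one dimension: the kernel taps are spaced `rate` samples apart, so the receptive field grows without adding weights. This is a generic illustration with assumed names (`atrous_conv1d`, `rate`) and toy data. It is not the network configuration used in the study, which operates on 2D images.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid (unpadded) 1D cross-correlation with dilation `rate`.

    rate=1 is an ordinary convolution layer's sliding window;
    rate=2 skips every other sample between kernel taps, and so on.
    """
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    # Number of input samples covered by the dilated kernel.
    span = (len(kernel) - 1) * rate + 1
    output = []
    for start in range(len(signal) - span + 1):
        output.append(sum(weight * signal[start + tap * rate]
                          for tap, weight in enumerate(kernel)))
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
edge_kernel = [1, 0, -1]  # simple difference filter

print(atrous_conv1d(signal, edge_kernel, rate=1))  # window covers 3 samples
print(atrous_conv1d(signal, edge_kernel, rate=2))  # window covers 5 samples
```

With the same three weights, the dilated version compares samples four positions apart instead of two, which is the property that lets a segmentation network capture wider context, such as large buildings, without extra parameters or downsampling.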