K. Zhou | TU Delft Repository

LiDAR-guided dense matching for detecting changes and updating of buildings in Airborne LiDAR data

Journal article (2020) - K. Zhou, R. Lindenbergh, Ben Gorte, S. Zlatanova

Change detection is essential to keep 3D city models up-to-date. LiDAR data with high accuracy are used to create 3D city models. However, updating LiDAR data at state or nation level often takes around a decade. Very high resolution (VHR) stereo images, which are often updated yearly and provide dense 3D information, offer an option for validating and updating LiDAR data. However, the 3D information in both data sources has quality problems. LiDAR point clouds are sparse and irregularly spaced, and have mixed returns near building edges, while 3D information extracted from stereo images is affected by shadow and low texture. This research proposes LiDAR-guided dense matching to address these problems explicitly for detecting accurate building changes. Data sparsity and irregular spacing are addressed by densifying LiDAR points in the form of a digital surface model (DSM). Instead of applying interpolation with associated edge problems due to mixed returns, three candidate DSMs are created by linking each DSM pixel to up to three planes as identified in segmented and triangulated LiDAR data. The candidate DSMs limit the disparity search space for dense matching, addressing low texture and shadow problems in images. Through edge-aware dense matching, the detailed building edge information in stereo pairs determines the optimal heights to address the LiDAR edge problem. Changes are detected where corresponding pixels from dense matching have large color differences. Due to homogeneous surroundings and shadows, only partial changes are initially detected. A second hierarchical dense matching step is employed to complete changes and update 3D information by propagating initial partial changes iteratively. The proposed method is applied to data from two cities, Amersfoort and Assen, the Netherlands, with around 1200 existing buildings. In both areas, the method successfully verifies unchanged buildings while detecting minimum changes of 2 × 2 × 2 m³.
New and removed building detection in Amersfoort both have an F1 score of over 0.8, in both pixel and object evaluation, while F1 scores in Assen are over 0.9 for both categories. The experiments also show that the proposed method outperforms two well-known change detection methods in terms of verifying unchanged buildings and detecting small changes simultaneously.
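
As a reading aid for the F1 scores quoted above: F1 is the harmonic mean of precision and recall. The minimal sketch below computes it from true-positive, false-positive, and false-negative counts. The function name and the example counts are illustrative assumptions, not values from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), which simplifies to 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # With no positives at all (denom == 0) F1 is undefined; return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts: 80 changed pixels found, 20 false alarms, 20 misses.
print(f1_score(80, 20, 20))
```

The same function applies to pixel-based and object-based evaluation; only the unit being counted differs.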

Combining LiDAR and Photogrammetry to Generate Up-to-date 3D City Models

Doctoral thesis (2020) - Kaixuan Zhou

3D city models are increasingly used to maintain and improve urban infrastructure. Keeping 3D city models accurate and up-to-date is essential for municipalities to make decisions in a time of strongly increasing urbanization. 3D information provided by airborne laser scanning (ALS) is widely used for generating 3D city models. However, ALS data is sparse and irregularly spaced, and not frequently acquired due to its high costs. Airborne camera imagery (ACIM) is an alternative to extract denser but less accurate 3D information. Given these limitations in acquisition frequency and quality, using either ALS or ACIM to generate up-to-date large-scale 3D city models is sub-optimal. Therefore, we combine the complementary characteristics of both data sources to achieve two objectives: (i) 3D change detection and updating of buildings in ALS data using ACIM data, and (ii) improving the planimetric accuracy of building extraction from ALS data using ACIM data. ALS data is integrated with a single image or a single stereo pair for the first objective, and with multiple stereo pairs for the second objective. Our methods are validated over three areas: Vaihingen, Germany, and Amersfoort and Assen, the Netherlands. Shadow in a single image is indicative of a 3D object and is represented in the image by RGB color values. However, these color values are not unique, as they depend on local conditions, such as material and environment. We propose a supervised machine learning approach, random forest, to effectively characterize the color properties. To generate training samples, accelerated ray tracing is used to efficiently reconstruct shadow locations in the image using 3D ALS data. Using shadow alone is not sufficient to detect accurate building changes, as shadows only partially represent 3D information. 3D information can be extracted from corresponding pixels in a stereo pair, but this information is not accurate in shadow and low texture areas.
To address this, we propose LEAD-Matching (LiDAR-guided edge-aware dense matching). It starts by using accurate plane information extracted from ALS data to densify sparse ALS points. Three candidate heights are then obtained for each densified point to guide the dense matching in these problematic areas. Subsequently, detailed building information in the stereo pair is integrated to choose the final optimal height. If the optimal height obtained by LEAD-Matching points to corresponding pixels of different color, a likely building change is found. Test results on the Amersfoort and Assen data show a successful verification of unchanged buildings while changes are detected starting from 2 × 2 × 2 m³, as conventionally required for large-scale 3D mapping, with F1 scores of 0.8 and 0.9, respectively. To achieve the second objective, we extend LEAD-Matching to multiple stereo pairs, to improve the planimetric accuracy of building extraction in ALS data. E-LEAD-Matching integrates building boundaries of high planimetric accuracy from multiple stereo pairs into the ALS data. Using multiple stereo pairs, occlusions in single stereo pairs are compensated, while the accuracy of building boundaries is improved. Compared to using ALS alone, the planimetric accuracy of extracted buildings improves from 0.40 m to 0.22 m in Vaihingen, and from 0.48 m to 0.21 m in Amersfoort. This improved planimetric accuracy meets conventional requirements of large-scale mapping. Our methods enable us to integrate the beneficial aspects of ALS and ACIM to generate accurate and up-to-date large-scale 3D city models. We anticipate that our research will save both money and time in generating future up-to-date large-scale 3D city models.
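
The idea of deriving up to three candidate heights per densified point can be illustrated with a small sketch. It assumes each nearby roof or ground plane is given as z = a·x + b·y + c; the plane representation, the function names, and the example planes are assumptions for illustration and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Plane = Tuple[float, float, float]  # hypothetical (a, b, c) with z = a*x + b*y + c


def plane_height(plane: Plane, x: float, y: float) -> float:
    """Evaluate the plane equation at planimetric position (x, y)."""
    a, b, c = plane
    return a * x + b * y + c


def candidate_heights(planes: Sequence[Plane], x: float, y: float) -> List[float]:
    """Return up to three candidate heights at (x, y), lowest first."""
    return sorted(plane_height(p, x, y) for p in planes[:3])


# Hypothetical pixel near a roof edge: ground, flat roof, and sloped roof.
planes = [(0.0, 0.0, 0.0), (0.0, 0.0, 6.0), (0.5, 0.0, 3.0)]
print(candidate_heights(planes, 2.0, 0.0))
```

Dense matching would then only need to decide between these few heights instead of searching the full height range.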

Automatic shadow detection in urban very-high-resolution images using existing 3D models for free training

Journal article (2019) - Kaixuan Zhou, Roderik Lindenbergh, Ben Gorte

Up-to-date 3D city models are needed for many applications. Very-high-resolution (VHR) images with rich geometric and spectral information and a high update rate are increasingly applied for the purpose of updating 3D models. Shadow detection is the primary step for image interpretation, as shadow causes radiometric distortions. In addition, shadow itself is valuable geometric information. However, shadows are often complicated and environment-dependent. Supervised learning is considered to perform well in detecting shadows when training samples selected from these images are available. Unfortunately, manual labeling of images is expensive. Existing 3D models have been used to reconstruct shadows to provide free, computer-generated training samples, i.e., samples free from intensive manual labeling. However, accurate shadow reconstruction for large 3D models consisting of millions of triangles is either difficult or time-consuming. In addition, due to inaccuracy and incompleteness of the model, and different acquisition times of 3D models and images, some training samples are mislabeled, i.e., shadows are labeled as non-shadows and vice versa. We propose a ray-tracing approach with an effective KD tree construction to feasibly reconstruct accurate shadows for a large 3D model. An adaptive erosion approach is first provided to remove mislabeling effects near shadow boundaries. Next, a comparative study considering four classification methods, quadratic discriminant analysis (QDA) fusion, support vector machine (SVM), K nearest neighbors (KNN) and random forest (RF), is performed to select the best classification method with respect to capturing the complicated properties of shadows and robustness to mislabeling. The experiments are performed on Dutch Amersfoort data with around 20% mislabels and on the Toronto benchmark, where mislabels are simulated by inverting shadows to non-shadows.
RF gives the most robust and best results, with 95.38% overall accuracy (OA) and a kappa coefficient (KC) of 0.9 for Amersfoort, and around 96% OA and 0.92 KC for the Toronto benchmark when no more than 50% of shadows are inverted. QDA fusion and KNN are robust to mislabels, but their capability to capture complicated properties of shadows is worse than that of RF. SVM separates shadows and non-shadows well but is largely affected by mislabeled samples. It is shown that RF with free training samples from existing 3D models is an automatic, effective, and robust approach for shadow detection from VHR images.
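
For readers unfamiliar with the two quality measures quoted above, the sketch below computes overall accuracy (OA) and Cohen's kappa coefficient (KC) from a binary shadow/non-shadow confusion matrix. The function name and the example counts are hypothetical and do not reproduce the paper's numbers.

```python
from typing import Tuple


def accuracy_and_kappa(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Return (overall accuracy, Cohen's kappa) for a 2x2 confusion matrix.

    Assumes both classes occur, so that chance agreement is below 1.
    """
    n = tp + fn + fp + tn
    oa = (tp + tn) / n
    # Chance agreement: product of reference and predicted class marginals.
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (oa - pe) / (1.0 - pe)
    return oa, kappa


# Hypothetical balanced test set of 100 pixels with 10 errors.
oa, kc = accuracy_and_kappa(tp=45, fn=5, fp=5, tn=45)
print(oa, kc)
```

Kappa corrects OA for agreement expected by chance, which is why it is lower than OA on the same data.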

Building segmentation from airborne VHR images using Mask R-CNN

Journal article (2019) - K. Zhou, Y. Chen, I. Smal, R. Lindenbergh

Up-to-date 3D building models are important for many applications. Airborne very high resolution (VHR) images, often acquired annually, give an opportunity to create an up-to-date 3D model. Building segmentation is often the first and most important step. Convolutional neural networks (CNNs) attract much attention in interpreting VHR images, as they can learn very effective features for very complex scenes. This paper employs Mask R-CNN to address two problems in building segmentation: detecting buildings at different scales and segmenting buildings with accurate edges. Mask R-CNN starts from a feature pyramid network (FPN) to create semantically rich features at different scales. The FPN is integrated with a region proposal network (RPN) to generate objects of various scales with the corresponding optimal scale of features. The features with high and low levels of information are further used for better classification of small objects and for mask prediction of edges. The method is tested on the ISPRS benchmark dataset by comparing results with fully convolutional networks (FCN), which merge high and low level features by a skip-layer to create a single feature for semantic segmentation. The results show that Mask R-CNN outperforms FCN by around 15% in detecting objects, especially small objects. Moreover, Mask R-CNN has much better results in edge regions than FCN. The results also show that choosing the range of anchor scales in Mask R-CNN is a critical factor in segmenting objects of different scales. This paper provides an insight into how a good anchor scale for different datasets should be chosen. ...
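
To make the role of anchor scales concrete, the sketch below generates anchor box sizes from a list of scales and aspect ratios and picks the scale closest to a given object size. It follows a common convention (anchor area equals scale squared, ratio defined here as height over width); the function names, the scale list, and the selection rule are illustrative assumptions, not the paper's procedure.

```python
import math
from typing import List, Sequence, Tuple


def anchor_sizes(scales: Sequence[float], ratios: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (width, height) for every scale/ratio pair; area stays scale**2."""
    return [(s / math.sqrt(r), s * math.sqrt(r)) for s in scales for r in ratios]


def best_scale(object_side_px: float, scales: Sequence[float]) -> float:
    """Pick the scale closest to the object side length on a log scale."""
    return min(scales, key=lambda s: abs(math.log(s / object_side_px)))


# Hypothetical scales in pixels; a 40-pixel building is best covered by scale 32.
scales = [16, 32, 64, 128]
print(anchor_sizes([32], [0.25, 1.0, 4.0]))
print(best_scale(40, scales))
```

If the smallest buildings in a dataset are much smaller than the smallest anchor scale, few anchors overlap them well, which is one way to see why the scale range matters.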

A computationally cheap trick to determine shadow in a voxel model

Journal article (2018) - B. G.H. Gorte, K. Zhou, C. J. Van Der Sande, C. Valk

Representation of scenes on the Earth surface by using voxels is gaining attention because of its suitability for integrating heterogeneous data sources in simulations and quantitative models. Computation of shadows in such models is needed, for example, to obtain crop suitability of agricultural fields in the presence of trees and buildings, or to analyze urban heat island causes and effects. We present an efficient algorithm to compute which of the voxels in a dataset receive direct sunlight, given the solar azimuth and elevation angles. The algorithm can work with multiple (sparse and dense) voxel storage strategies. ...
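
The abstract is truncated before the algorithm is described, so the sketch below is only a naive baseline for the same task, not the paper's method: it marches a ray from each voxel toward the sun and reports the voxel as sunlit if no occupied voxel is hit before the ray leaves the grid. The axis convention (x east, y north, z up, azimuth clockwise from north), the fixed step size, and all names are assumptions.

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def sun_direction(azimuth_deg: float, elevation_deg: float) -> Tuple[float, float, float]:
    """Unit vector pointing toward the sun (x east, y north, z up)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.sin(az) * math.cos(el), math.cos(az) * math.cos(el), math.sin(el))


def is_sunlit(voxel: Voxel, occupied: Set[Voxel], shape: Voxel,
              direction: Tuple[float, float, float], step: float = 0.5) -> bool:
    """March from the voxel center toward the sun in fixed steps.

    A fixed step can skip voxels that the ray only clips at a corner;
    an exact voxel traversal would avoid that.
    """
    x, y, z = (c + 0.5 for c in voxel)
    dx, dy, dz = direction
    while True:
        x, y, z = x + dx * step, y + dy * step, z + dz * step
        cell = (math.floor(x), math.floor(y), math.floor(z))
        if not all(0 <= c < n for c, n in zip(cell, shape)):
            return True  # left the grid without hitting anything
        if cell != voxel and cell in occupied:
            return False  # blocked by another occupied voxel


# Hypothetical scene: a 3-voxel tower at x=0 and two ground voxels east of it.
occupied = {(0, 0, 0), (0, 0, 1), (0, 0, 2), (1, 0, 0), (2, 0, 0)}
west_sun = sun_direction(270.0, 45.0)
print(is_sunlit((2, 0, 0), occupied, (3, 1, 3), west_sun))
```

With the sun in the west at 45° elevation, the tower shades the ground voxels to its east; this per-voxel marching is the kind of cost a cheaper algorithm would aim to reduce.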

Extraction of building roof edges from LiDAR data to optimize the digital surface model for true orthophoto generation

Journal article (2018) - E. Widyaningrum, R. C. Lindenbergh, B. G.H. Gorte, K. Zhou

Various kinds of urban applications require true orthophotos. True orthophoto generation requires a DSM (Digital Surface Model) to project the photo orthogonally and minimize geometric distortion due to topographic variance. DSMs are often generated from airborne laser scan data. In urban scenes, DSM data may fail to deliver sharp and straight building roof edges. This will affect the quality of the resulting orthophotos. Therefore, it is necessary to incorporate good quality building outlines as breaklines during DSM interpolation. This study proposes a data-driven approach to construct building roof outlines from LiDAR point clouds. Given roof segments, roof boundary points are first extracted using a concave hull algorithm. Straight edges may be difficult to find in complex roof configurations. Therefore, two ingredients are combined: RANSAC corner point preselection and DBSCAN-based clustering of edge points. The method is demonstrated on an area of approximately 1.2 km² containing 42 buildings of different characteristics. A quality assessment shows that the proposed method is able to deliver 92% of building lines with acceptable geometric accuracy in comparison to a building line in the base map. ...
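
As background for the RANSAC ingredient, the sketch below shows a generic RANSAC line fit on 2D boundary points: repeatedly pick two points, count the points within a distance tolerance of the line through them, and keep the largest consensus set. It is not the paper's corner point preselection step; the function name, tolerance, iteration count, and test points are assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def ransac_line(points: List[Point], n_iter: int = 200, tol: float = 0.1,
                seed: int = 0) -> List[Point]:
    """Return the largest set of points lying within `tol` of a sampled line."""
    rng = random.Random(seed)  # seeded for reproducible results
    best: List[Point] = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Line through the two samples in implicit form a*x + b*y + c = 0.
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0.0:
            continue  # identical points define no line
        c = -(a * x1 + b * y1)
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) / norm <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best


# Hypothetical roof edge along y = 0 with two stray boundary points.
edge = [(float(x), 0.0) for x in range(10)] + [(3.0, 5.0), (7.0, -4.0)]
print(len(ransac_line(edge)))
```

On real roof boundaries the procedure would be repeated after removing each detected edge's inliers, so that one line is found per straight roof edge.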

3D building change detection between current VHR images and past LiDAR data

Journal article (2018) - K. Zhou, B. Gorte, R. Lindenbergh, E. Widyaningrum

Change detection is an essential step to locate the area where an old model should be updated. With high density and accuracy, LiDAR data is often used to create a 3D city model. However, updating LiDAR data at state or nation level often takes years. Very high resolution (VHR) images with a high updating rate are therefore an option for change detection. This paper provides a novel and efficient approach to derive pixel-based building change detection between past LiDAR and new VHR images. The proposed approach aims notably at reducing false alarms of changes near edges. For this purpose, LiDAR data is used to supervise the process of finding corresponding pixels in the stereo pair and to derive the changes directly. This paper proposes to derive three possible heights (and thus three DSMs) by exploiting planar segments from LiDAR data. Near edges, the up to three possible heights are transformed into discrete disparities. An optimal disparity is selected from a reasonable and computationally efficient range centered on them. If the optimal disparity is selected but the corresponding pixels found in the stereo pair still do not match, a change has been found. A Markov random field (MRF) with built-in edge awareness from images is designed to find the optimal disparity. By segmenting the pixels into plane and edge segments, the global optimization problem is split into many local ones, which makes the optimization very efficient. Using an optimization and a consecutive occlusion consistency check, the changes are derived from corresponding pixels having a high color difference. The algorithm is tested to find changes in an urban area in the city of Amersfoort, the Netherlands. The two different test cases show that the algorithm is indeed efficient. The optimized disparity images have sharp edges along those of the images, and false alarms of changes near or on edges and occlusions are largely reduced. ...
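
The step of turning candidate heights into a small set of discrete disparities can be sketched as follows. The sketch assumes idealized nadir stereo geometry, where the disparity of a point at height h is f·B / (H − h) with focal length f in pixels, baseline B, and flying height H. This geometry, the search radius, and all names and numbers are assumptions for illustration; the abstract does not state the actual camera model.

```python
from typing import Iterable, List


def height_to_disparity(h: float, flying_height: float, baseline: float,
                        focal_px: float) -> float:
    """Disparity in pixels for a point at height h (idealized nadir stereo)."""
    return focal_px * baseline / (flying_height - h)


def candidate_disparities(heights: Iterable[float], flying_height: float,
                          baseline: float, focal_px: float,
                          radius: int = 1) -> List[int]:
    """Union of small integer disparity ranges centered on each candidate height."""
    candidates = set()
    for h in heights:
        d = round(height_to_disparity(h, flying_height, baseline, focal_px))
        candidates.update(range(d - radius, d + radius + 1))
    return sorted(candidates)


# Hypothetical setup: ground at 0 m and a roof at 10 m seen from 1000 m.
print(candidate_disparities([0.0, 10.0], flying_height=1000.0,
                            baseline=300.0, focal_px=10000.0))
```

In this example only six disparities need to be evaluated per pixel, instead of every value between the ground and roof disparities.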

Building classification of VHR airborne stereo images using fully convolutional networks and free training samples

Journal article (2018) - Y. Chen, W. Gao, E. Widyaningrum, M. Zheng, Kaixuan Zhou

Semantic segmentation, especially of buildings, from very high resolution (VHR) airborne images is an important task in urban mapping applications. Nowadays, deep learning has improved significantly and is widely applied in computer vision. Fully Convolutional Networks (FCN) are among the most popular methods due to their good performance and high computational efficiency. However, the state-of-the-art results of deep nets depend on training on large-scale benchmark datasets. Unfortunately, the benchmarks of VHR images are limited and generalize poorly to other areas of interest. As existing high precision base maps are easily available and objects do not change dramatically in an urban area, the map information can be used to label images for training samples. Apart from object changes between maps and images due to time differences, the maps often cannot be matched perfectly with the images. In this study, the main mislabeling sources, such as relief displacement, different representation between the base map and the image, and occlusion areas in the image, are considered and addressed by utilizing stereo images. These free training samples are then fed to a pre-trained FCN. To obtain better results, we applied fine-tuning with different learning rates and with different layers frozen. We further improved the results by introducing atrous convolution. By using free training samples, we achieve a promising building classification with 85.6% overall accuracy and 83.77% F1 score, while the result from the ISPRS benchmark using manual labels has 92.02% overall accuracy and 84.06% F1 score, due to the building complexities in our study area. ...
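
Atrous (dilated) convolution enlarges the receptive field of a filter by spacing its taps apart without adding weights. The one-dimensional sketch below illustrates the operation only; the function name, signal, and kernel are hypothetical and unrelated to the network used in the paper.

```python
from typing import List, Sequence


def atrous_conv1d(signal: Sequence[float], kernel: Sequence[float],
                  rate: int = 1) -> List[float]:
    """Valid-mode 1D correlation with kernel taps spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate  # distance covered by the dilated kernel
    return [
        sum(w * signal[i + k * rate] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
print(atrous_conv1d(signal, kernel, rate=1))  # taps cover 3 samples
print(atrous_conv1d(signal, kernel, rate=2))  # same 3 weights cover 5 samples
```

With rate 2 the same three weights compare samples four positions apart, which is how dilation adds context without extra parameters or downsampling.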

Shadow detection from VHR aerial images in urban area by using 3D city models and a decision fusion approach

Conference paper (2017) - K. Zhou, B. Gorte

In VHR (very high resolution) aerial images, shadows indicating height information are valuable for validating or detecting changes on an existing 3D city model. In this paper, we propose a novel and fully automatic approach for shadow detection from VHR images. Compared to automatic thresholding, a supervised machine learning approach is expected to perform better on shadow detection, but it requires training samples to be obtained manually. The shadow image reconstructed from an existing 3D city model can provide free training samples with large variety. However, as the 3D model is often inaccurate, incomplete and outdated, a small portion of the training samples is mislabeled. Morphological erosion is applied to remove boundary pixels, which have a high probability of being mislabeled, from the reconstructed image. Moreover, quadratic discriminant analysis (QDA), which is resistant to mislabeling, is chosen. Further, two feature domains, RGB and the ratio of hue over intensity, are analyzed and found to have complementary effects on detecting different objects. Finally, a decision fusion approach is proposed to combine the results of the preliminary classifications from the two feature domains. The fuzzy membership is a confidence measure and determines the way a decision is made, while the memberships are weighted by an entropy measure to indicate their certainties. The experimental results on two cities in the Netherlands demonstrate that the proposed approach outperforms the two separate classifiers and two stacked-vector fusion approaches. ...
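
One simple way to weight memberships by their certainty is sketched below: each classifier's shadow membership is weighted by one minus its binary entropy, so a membership near 0.5 contributes almost nothing. This is an illustrative rule under stated assumptions, not the fusion scheme of the paper, whose details are not given in the abstract; all names and values are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-class membership p; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def fuse(p_rgb: float, p_ratio: float) -> float:
    """Certainty-weighted average of two shadow memberships."""
    w_rgb = 1.0 - binary_entropy(p_rgb)
    w_ratio = 1.0 - binary_entropy(p_ratio)
    total = w_rgb + w_ratio
    if total == 0.0:
        return 0.5  # both classifiers are maximally uncertain
    return (w_rgb * p_rgb + w_ratio * p_ratio) / total


# Hypothetical pixel: the RGB classifier is confident, the ratio classifier is not.
fused = fuse(0.9, 0.4)
print(fused, fused > 0.5)
```

A pixel would be labeled as shadow when the fused membership exceeds 0.5, so the confident classifier dominates the decision.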