Semantic segmentation of the AHN dataset with the Random Forest Classifier


Abstract

Three-dimensional (3D) city models are of great significance and in high demand. They can be used for various applications such as urban planning, visibility analysis, and estimating the solar irradiation and energy demand of buildings throughout the day. Nowadays, Light Detection And Ranging (LiDAR) sensors are among the most commonly used technologies for acquiring dense and reliable 3D point cloud datasets quickly and at relatively low cost. At the same time, increasing attention has lately been focused on the use of these 3D point cloud datasets for the reconstruction of 3D city models. The 3D representation of a scene in the form of a point cloud facilitates a variety of analysis tasks such as object recognition, segmentation, and classification, which are an important prerequisite for many building reconstruction methods. Thus, an accurate classifier that can automatically assign a semantic class label to each 3D point is of utmost importance, as it can significantly reduce the time and cost required to analyse 3D scenes. Training machine learning and deep learning algorithms to perform this task has been the focus of many recent scientific works that provide promising results and insights for future work.
This thesis attempted to create an accurate Random Forest Classifier for LiDAR point cloud datasets, using the Actueel Hoogtebestand Nederland 3 (AHN3) dataset as training data. The aim was to assist building reconstruction methods, and for this reason the classifier assigns only three semantic class labels to the points, namely ground, building, and other. Multiple experiments were conducted to test how large the training dataset needs to be, which features should be included, and what point density the input point cloud needs to have for the classifier to perform well. Tests were also carried out on the DALES dataset to evaluate the performance of the classifier on different datasets and environments. Moreover, the time and memory required to train and test the various Random Forest models, together with the evaluation metrics of the results, were recorded as benchmarks to provide insights and guide future work.
The classification approach consists of five major steps. First, the input point cloud is uniformly sampled, and spherical local neighbourhoods of the sampled points are computed at multiple scales. Height and eigen-based features are then extracted for each local neighbourhood, and the best subset of features is used to train the Random Forest Classifier. Finally, each point of the input point cloud is assigned the same class as its nearest neighbour in the sampled point cloud. A comparative analysis shows that the performance of our final Random Forest model is comparable to that of other available, more complex deep learning methods, especially when its simplicity and efficiency are taken into consideration.
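
To make the pipeline concrete, the sketch below illustrates the five steps with off-the-shelf tools (NumPy, SciPy, scikit-learn). It is not the thesis implementation: the random subsampling, the neighbourhood radii, the exact feature set, and the synthetic input cloud are placeholder assumptions chosen for brevity.

import numpy as np
from scipy.spatial import cKDTree
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for a labelled LiDAR tile (0 = ground, 1 = building, 2 = other).
rng = np.random.default_rng(0)
full_xyz = rng.uniform(0.0, 50.0, size=(5000, 3))
full_labels = rng.integers(0, 3, size=len(full_xyz))

def eigen_height_features(points, tree, radii):
    """Height and eigen-based features from spherical neighbourhoods at several scales."""
    per_scale = []
    for r in radii:
        feats = np.zeros((len(points), 4))
        for i, idx in enumerate(tree.query_ball_point(points, r)):
            nbrs = points[idx]
            if len(idx) < 3:                 # too few neighbours for a covariance matrix
                continue
            w = np.sort(np.linalg.eigvalsh(np.cov(nbrs.T)))[::-1]   # eigenvalues, descending
            w = np.clip(w, 1e-12, None)
            feats[i] = ((w[0] - w[1]) / w[0],                       # linearity
                        (w[1] - w[2]) / w[0],                       # planarity
                        w[2] / w[0],                                # sphericity
                        nbrs[:, 2].max() - nbrs[:, 2].min())        # height range
        per_scale.append(feats)
    return np.hstack(per_scale)

# Step 1: uniformly sample the input cloud (a random subsample stands in here).
sample_idx = rng.choice(len(full_xyz), size=len(full_xyz) // 10, replace=False)
sampled_xyz, sampled_labels = full_xyz[sample_idx], full_labels[sample_idx]

# Steps 2-3: multi-scale spherical neighbourhoods and feature extraction.
tree = cKDTree(sampled_xyz)
X = eigen_height_features(sampled_xyz, tree, radii=(1.0, 2.0, 4.0))

# Step 4: train the Random Forest (feature selection is omitted in this toy example).
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, sampled_labels)

# Step 5: each point of the full cloud inherits the class predicted for its
# nearest neighbour in the sampled cloud.
pred_sampled = clf.predict(X)
_, nearest = tree.query(full_xyz, k=1)
full_cloud_pred = pred_sampled[nearest]

In the thesis the same transfer step is applied to unseen test tiles: the test cloud is sampled, features are computed and classified, and the predictions are propagated back to the full-resolution points via the nearest-neighbour lookup.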