Machine Learning-based Classification of Different 3D Point Cloud Data of Railway Environments using Random Forest and DGCNN

Although monitoring and maintenance of railways are important to ensure safety and to avoid delays and financial losses, they are still mainly based on human inspection. The complexity of a railway, along with the large area it covers, makes manual monitoring difficult and time-consuming. The increasing availability of 3D acquisition technologies has made point clouds a widely used form of 3D data. Thus, identifying the key components of a railway and its environment from 3D point cloud data can be a first step towards automating this procedure. In recent years, machine learning has become the most popular subfield of artificial intelligence, used in a wide variety of applications, with its subfield of deep learning evolving rapidly. Although deep learning has been extensively researched on 2D data, applying it to 3D point clouds is challenging due to the irregular, unstructured and unordered nature of such data. To overcome these challenges, earlier approaches use projection or voxelization, while the latest methods work directly on raw point cloud data.

In the present research, two machine learning methods are implemented to classify 3D point cloud data of railway environments into seven categories: rails, sleepers, track bed, masts, overhead wires, trees and other. To this end, the ensemble method of Random Forest and the deep learning method of DGCNN (Dynamic Graph Convolutional Neural Network) are implemented. While Random Forest is a simple and versatile ensemble algorithm used for a wide range of applications, DGCNN is a deep learning method based on graph CNNs and on PointNet, the pioneering method for learning directly on raw point clouds. The methods are validated on three case studies, produced by structure-from-motion photogrammetry, airborne laser scanning and terrestrial laser scanning, and located in two different areas within the Netherlands. For each method, two scenarios are developed, classifying colored and uncolored point clouds, respectively. Finally, the two methods are combined for the first scenario, as a first attempt to further improve the results.
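The Random Forest setup described above can be illustrated with a minimal sketch. The synthetic data, feature layout (XYZ plus RGB, matching the colored scenario) and hyperparameters here are assumptions for illustration only; the thesis's actual feature engineering is not detailed in this abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# The seven target categories used in this research.
CLASSES = ["rails", "sleepers", "track bed", "masts",
           "overhead wires", "trees", "other"]

# Synthetic stand-in for a labelled point cloud: per-point XYZ + RGB features.
rng = np.random.default_rng(0)
n = 2100
X = rng.random((n, 6))
y = rng.integers(0, len(CLASSES), size=n)
# Shift each class's coordinates so the synthetic labels are learnable.
X[:, :3] += y[:, None] * 0.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Train the ensemble and report overall accuracy on held-out points.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"overall accuracy: {acc:.2f}")
```

In practice each point would carry hand-crafted geometric features (e.g. local height or planarity) rather than raw coordinates alone, and the uncolored scenario would simply drop the RGB columns.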

The obtained results show that, in this work, DGCNN performs better than Random Forest, and that both methods approximate state-of-the-art performance. Color information is important for improving both the overall accuracy of the models and the classification results of the individual classes. More specifically, Random Forest scenario 1, Random Forest scenario 2, DGCNN scenario 1, DGCNN scenario 2, and the combination of Random Forest and DGCNN for scenario 1 result in an overall accuracy of 88.65%, 83.39%, 89.19%, 88.20% and 90.57%, respectively. The corresponding per-class F1-scores for all methods and scenarios range between 43% and 94%. Both methods have difficulty generalizing to data from different sensor systems and different areas with very different point densities and missing data. Nonetheless, the individual methods are already very promising, and they achieve the required accuracy of more than 90% when combined.
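The abstract does not specify how the two classifiers' outputs are combined; a common and plausible scheme is a weighted average of per-point class probabilities, sketched below. The function name, the weight, and the toy probabilities are hypothetical.

```python
import numpy as np

def fuse_probabilities(p_rf, p_dgcnn, w=0.5):
    """Fuse per-point class probabilities from two classifiers by
    weighted averaging, then pick the most likely class per point.
    This is a hypothetical fusion rule, not necessarily the scheme
    used in the thesis."""
    return np.argmax(w * p_rf + (1 - w) * p_dgcnn, axis=1)

# Two points, three classes: the classifiers disagree on point 0.
p_rf = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
p_dg = np.array([[0.1, 0.8, 0.1], [0.1, 0.3, 0.6]])
labels = fuse_probabilities(p_rf, p_dg)
print(labels)  # → [1 2]
```

With equal weights, the more confident classifier dominates where the two disagree, which is one way such a combination can lift overall accuracy above either individual model.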