Landslide Detection using Random Forest Classifier
Abstract
Landslides are destructive and recurrent natural disasters that cause significant social and economic losses worldwide every year. These events can be induced by natural factors such as earthquakes and extreme rainfall, as well as by human intervention, including construction and mining. Historical databases with accurate locations of individual events are a primary resource for landslide studies on prediction, risk assessment, and mitigation. To increase the location accuracy of those past landslide events, and to optimize conventional mapping routines that are time-consuming and costly, this study aims to develop an automatic landslide detection method from free-of-charge optical satellite imagery (Sentinel-2) and a global Digital Elevation Model (ALOS World3D-30m DEM) using Object-based Image Analysis (OBIA) in combination with Machine Learning (ML). Existing works have successfully used earth-observation datasets to generate landslide databases. Most of them apply rule-based techniques with feature thresholds that are not global and therefore perform poorly when applied to regions other than those where the method was developed. This study presents a first attempt at an automatic method that generalizes to landslides occurring anywhere in the world without knowledge of their cause or triggering factor. To obtain a robust method that can deal with the complex characteristics of landslides (e.g. diversity of shapes and sizes, land cover, illumination, and spectral variability), we explored OBIA, an image processing technique that has demonstrated better performance than the pixel-based approach, especially when the target objects are larger than the cell resolution. The developed method consists of acquiring cloud-free images and determining suitable features for image segmentation and image classification.
For the image segmentation, we developed a two-step approach. The first step is an initial segmentation using k-means with the Red/Green Difference (RGD) as input feature to create homogeneous segments and isolate landslides from non-landslides. This first step leads to oversegmentation of non-landslide areas and, consequently, to an imbalanced dataset. The second step is a merging algorithm that uses the Normalized Difference Vegetation Index (NDVI) as input feature to merge homogeneous non-landslide segments and balance the dataset. These two stages require setting parameters, such as the number of clusters (K) and the NDVI thresholds, which were derived experimentally. Once the segments are created and the dataset is balanced, a non-parametric supervised classification using Random Forest (RF) is applied to identify landslide segments; the main advantage of this classifier is that it can deal with different statistical distributions of features and can handle imbalanced datasets. Using a 70%/30% split between training and testing sets, our method achieved a precision of 83%, a recall of 83%, and an F1-score of 83%. We found that topographic features have less influence than spectral ones; however, excluding them decreases the model performance by about 10%. Our method is built entirely with open-source technologies, which facilitates its application and reuse. For future work, we propose to use our method to detect new landslides and increase the number of training samples. Additionally, we recommend exploring a complementary approach to the merging algorithm to reduce the number of non-landslide segments, balance the dataset, and keep classification results accurate as more training images are added to the model.
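The two-step segmentation described above can be illustrated with a minimal sketch on synthetic data. It is not the implementation used in this study, and several points are assumptions made only for illustration: RGD is taken as the normalized difference (Red - Green) / (Red + Green), each k-means cluster is treated as one "segment" (a real OBIA workflow would work on spatially connected objects), the reflectance values, K = 3, and the NDVI threshold of 0.5 are hypothetical, and a plain one-dimensional Lloyd's iteration written in NumPy stands in for a library k-means. The Random Forest classification step is not shown.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with 0 where the denominator is 0."""
    a = a.astype(float)
    b = b.astype(float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd's k-means on a 1-D array; returns (labels, centers)."""
    # Deterministic start: initial centers taken at evenly spaced quantiles.
    centers = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(n_iter):
        # Assign every value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = values[labels == j].mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def merge_by_ndvi(labels, ndvi, threshold):
    """Relabel every cluster whose mean NDVI exceeds `threshold` as -1."""
    merged = labels.copy()
    for j in np.unique(labels):
        if ndvi[labels == j].mean() > threshold:
            merged[labels == j] = -1  # single merged non-landslide class
    return merged

# Synthetic 20x20 scene (hypothetical reflectances): vegetation background
# with a 5x5 bare-soil patch standing in for a landslide scar.
rng = np.random.default_rng(0)
shape = (20, 20)
red = np.full(shape, 0.05)
green = np.full(shape, 0.08)
nir = np.full(shape, 0.40)
scar = np.zeros(shape, dtype=bool)
scar[8:13, 8:13] = True
red[scar], green[scar], nir[scar] = 0.25, 0.18, 0.30
red, green, nir = (band + rng.normal(0, 0.002, shape) for band in (red, green, nir))

rgd = normalized_difference(red, green)   # assumed RGD definition
ndvi = normalized_difference(nir, red)    # standard NDVI definition

# Step 1: k-means on RGD. With K = 3 the vegetation splits into two clusters,
# a small-scale analogue of the oversegmentation of non-landslide areas.
labels, centers = kmeans_1d(rgd.ravel(), k=3)

# Step 2: merge clusters with high mean NDVI into one non-landslide class.
merged = merge_by_ndvi(labels, ndvi.ravel(), threshold=0.5).reshape(shape)
candidate = merged != -1  # pixels left as landslide candidates
```

In this toy scene the merging step collapses the two vegetation clusters into one class, so only the bare-soil cluster remains as a candidate. In the method described above, features computed per remaining segment would then be passed to the Random Forest classifier.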