Digital Soil Mapping based on PDFs of Cone Penetration Tests and Vibro Cores using Image Processing and Machine Learning

Abstract

Digital Soil Mapping (DSM) of soil types is a top priority in geotechnical project areas. These maps often inform decisions with significant consequences for costs and risks. Usually, these maps are generated by digital soil models that interpolate the soil types observed at known locations. In practice, conventional spatial interpolation techniques such as inverse distance weighting and kriging are still often used for DSM of soil types. However, conventional models are not well suited to predicting or interpolating soil types because they cannot handle categorical data properly. Moreover, their design does not allow the abundance of meaningful covariate information that is available nowadays to be incorporated. The flexibility of machine learning algorithms overcomes both problems, and these algorithms have become increasingly popular for DSM of soil properties in recent years. The results of machine learning techniques for DSM of soil properties are promising, and these techniques generally outperform conventional models. However, few studies have used machine learning for DSM of soil types, which therefore remains a relatively unexplored field. Moreover, at the time of writing, no studies use sequence models for DSM of soil properties or types. Hence, the author proposes a new method for DSM of soil types, namely a Long Short-Term Memory (LSTM) network. The intuition is that spatial correlation can be captured in sequences, which can improve soil type prediction.
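
The difficulty that conventional interpolation has with categorical data can be illustrated with a minimal sketch. It is not taken from the thesis: the soil codes, coordinates and function names below are hypothetical, and inverse distance weighting is applied naively to integer class codes purely to show the failure mode, next to a simple weighted vote for comparison.

```python
import math
from collections import defaultdict

# Hypothetical soil type codes; the numbering carries no physical meaning.
SOIL_CODES = {"clay": 0, "peat": 1, "sand": 2}


def idw_weights(target, points, power=2.0):
    """Normalized inverse distance weights of known points for a target location."""
    weights = []
    for x, y in points:
        distance = math.hypot(x - target[0], y - target[1])
        weights.append(1.0 / max(distance, 1e-12) ** power)
    total = sum(weights)
    return [w / total for w in weights]


def idw_numeric(target, points, labels):
    """Naive IDW on the integer codes: averages labels as if they were quantities."""
    weights = idw_weights(target, points)
    return sum(w * SOIL_CODES[label] for w, label in zip(weights, labels))


def idw_vote(target, points, labels):
    """Weighted vote: sums the weights per class and returns the heaviest class."""
    weights = idw_weights(target, points)
    score = defaultdict(float)
    for w, label in zip(weights, labels):
        score[label] += w
    return max(score, key=score.get)


# Two made-up observations: clay at x = 0 and sand at x = 10.
points = [(0.0, 0.0), (10.0, 0.0)]
labels = ["clay", "sand"]
target = (4.0, 0.0)

naive = idw_numeric(target, points, labels)  # a value between the codes 0 and 2
vote = idw_vote(target, points, labels)      # the class with the largest weight
```

In this toy case the averaged code rounds to the code assigned to peat, a soil type that neither neighbour contains, while the weighted vote returns clay. A classifier avoids the first problem by design, because it treats soil types as unordered labels.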

Real project data from a cable burial project are used to evaluate and compare the performance of the conventional interpolation methods triangulation and kriging, the machine learning models random forest and XGBoost, and the newly proposed deep learning model LSTM. The project data consist of 757 vibro cores (VCs), 718 cone penetration tests (CPTs), bathymetry data and sub-bottom profiler data. The geotechnical data, i.e. the VCs and CPTs, were received as separate PDF pages that must be digitized first. This thesis describes a simple yet precise way to extract these data from the PDFs. The VCs and CPTs are provided with a soil type interpretation and can be used directly for developing the models. The data are split into a training set to develop and train the models and a test set for evaluation. Ultimately, the best performing model is used to build a 3D stratigraphic soil model for the project area with associated prediction accuracies.
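
The abstract does not say how the training and test sets are drawn. As one possible approach, the sketch below holds out whole test locations so that samples from a single core never appear in both sets. The record layout, the `split_by_location` helper, the 20% test fraction and the sample values are all assumptions made for illustration.

```python
import random


def split_by_location(samples, test_fraction=0.2, seed=0):
    """Hold out whole locations so one core never appears in both sets."""
    locations = sorted({s["location"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(locations)
    n_test = max(1, round(len(locations) * test_fraction))
    test_ids = set(locations[:n_test])
    train = [s for s in samples if s["location"] not in test_ids]
    test = [s for s in samples if s["location"] in test_ids]
    return train, test


# Made-up records: 10 hypothetical vibro cores with three depth samples each.
samples = [
    {"location": f"VC-{i:03d}", "depth_m": d, "soil": "sand" if d < 1.0 else "clay"}
    for i in range(10)
    for d in (0.5, 1.5, 2.5)
]

train, test = split_by_location(samples)
```

Splitting by location rather than by individual sample is a design choice: neighbouring depth samples in one core are strongly correlated, so a purely random split could make the test score look better than the model's performance at unseen locations.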

All state-of-the-art techniques outperform the conventional models, especially in predicting minority classes. The best performing model is random forest, with an overall accuracy of 85.44%, which is comparable to the 85.11% achieved by XGBoost. The LSTM network achieved a slightly lower accuracy of 84.27%. The results show that LSTM is suitable for DSM of soil types and has considerable potential for improvement, as only a few of the model's possibilities have been examined.
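
Overall accuracy and minority class performance can diverge, which is why the two are reported separately above. The sketch below uses made-up labels, not thesis results, and per-class recall as one possible minority class measure; the abstract does not state which measure the thesis uses.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted soil type matches the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in support.items()}


# Made-up example: nine sand samples and one peat sample,
# scored against a model that always predicts the majority class.
y_true = ["sand"] * 9 + ["peat"]
y_pred = ["sand"] * 10

acc = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

Here the overall accuracy is 0.9 even though the recall for peat is 0.0, so a high overall accuracy alone says little about how well rare soil types are predicted.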

Files

Final_Thesis_SFOrdeman.pdf
(pdf | 8.36 Mb)
- Embargo expired in 28-02-2024