Machine learning for prediction of undrained shear strength from cone penetration test data

Abstract

Measurements of soil shear strength are almost indispensable in the design phase of geotechnical engineering. Many methods have been applied to estimate the shear strength of soil, including laboratory tests, in-situ tests and analytical methods. As an in-situ test method, the cone penetration test (CPT) is a powerful and cost-effective tool for investigating subsoil conditions. CPT data are usually complemented by laboratory test data for verification. Laboratory-based studies of subsoil, however, can be complex, tedious and expensive for large projects involving large amounts of data. New approaches for estimating soil shear strength are therefore in demand. Having demonstrated superior predictive ability for many material properties compared with traditional methods, machine learning methods have become increasingly popular and widely used. This thesis focuses on predicting the undrained shear strength of soil from cone penetration test data. The main objective of this master's thesis is to test how far machine learning can reduce the need for laboratory test data. The research starts with a literature review of the methods used to evaluate soil shear strength. Compared with machine learning methods, laboratory and in-situ test methods are more time-consuming, costly and labour-intensive, while analytical methods are considered to lack precision. Then the training dataset, which consists of 526 samples, is introduced. Each sample has four input variables obtained from the cone penetration test, namely the effective vertical stress (σ′v), the net cone tip resistance (qt − σv), the effective cone tip resistance (qt − u2) and the excess pore pressure (u2 − u0). The undrained shear strength obtained from laboratory tests is taken as the output variable.
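The four inputs listed above are simple differences of quantities that a CPT sounding with pore pressure measurement provides. The following is a minimal sketch of that arithmetic, not the thesis's own code: the `CptReading` container, the function name, the dictionary keys and the numerical values are hypothetical, and the effective vertical stress is assumed to be computed as σ′v = σv − u0.

```python
from dataclasses import dataclass


@dataclass
class CptReading:
    """Raw CPT quantities at one depth, all in kPa (hypothetical container)."""
    qt: float       # corrected cone tip resistance
    u2: float       # pore pressure measured behind the cone
    u0: float       # in-situ (hydrostatic) pore pressure
    sigma_v: float  # total vertical stress


def input_features(r: CptReading) -> dict:
    """Return the four model inputs named in the abstract."""
    return {
        # Assumption: effective vertical stress = total stress - in-situ pore pressure
        "sigma_v_eff": r.sigma_v - r.u0,
        "qt_minus_sigma_v": r.qt - r.sigma_v,  # net cone tip resistance
        "qt_minus_u2": r.qt - r.u2,            # effective cone tip resistance
        "u2_minus_u0": r.u2 - r.u0,            # excess pore pressure
    }


# Illustrative values only, not taken from the thesis dataset
example = input_features(CptReading(qt=800.0, u2=300.0, u0=100.0, sigma_v=180.0))
```

With these illustrative values, `example` holds 80, 620, 500 and 200 kPa for the four inputs in the order listed.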
Next, the training dataset is fed to five machine learning techniques, namely the artificial neural network, support vector machine, Gaussian process regression, random forest and XGBoost, to train models. The hyperparameters are tuned with k-fold and group k-fold cross-validation strategies in the validation process. After that, the testing dataset, which consists of 20 samples, is established. Cone penetration test data in close vicinity to the sample locations are processed by Gaussian process regression to obtain representative cone penetration test data at each sample location, which are taken as the inputs of the testing dataset. The undrained shear strengths of the samples are measured by consolidated-undrained shear tests and are taken as the outputs of the testing dataset. Finally, the five machine learning models are tested on the testing dataset. The cross-validation results, together with the prediction results of the models on the training and testing datasets, are evaluated and compared using various statistical metrics to show the relative performance of the models. XGBoost appears to be the most accurate of all the tested algorithms on this dataset. Gaussian process regression is chosen as the second option because of its ability to capture uncertainties. The robustness of these two models is then validated from a statistical point of view by applying Monte Carlo analysis. The importance of the input parameters in this study is evaluated by applying random forest for the sensitivity analysis. The results from random forest indicate that the excess pore pressure (u2 − u0) and the net cone tip resistance (cone tip resistance minus total vertical stress, qt − σv) are the most influential inputs for the undrained shear strength.
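Group k-fold cross-validation, mentioned above, differs from plain k-fold in that all samples sharing a group label are kept on the same side of every split, so a model is never validated on a group it has already seen during training. The sketch below illustrates the idea in plain Python. It is not the thesis's implementation: the abstract does not state which grouping variable was used, so the borehole labels here are an assumption, and the greedy balancing rule and the function name `group_k_fold` are chosen only for illustration.

```python
from collections import defaultdict


def group_k_fold(groups, k):
    """Yield (train_idx, val_idx) so that no group is split across the two sets."""
    # Collect the sample indices that belong to each group label.
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    # Greedy balancing: place the largest groups first,
    # each into the fold that currently holds the fewest samples.
    folds = [[] for _ in range(k)]
    for g in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[g])

    # Each fold serves once as the validation set; the rest form the training set.
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for j in range(k) if j != f for i in folds[j])
        yield train, val


# Hypothetical borehole labels for eight samples (assumed grouping variable)
boreholes = ["BH1", "BH1", "BH1", "BH2", "BH2", "BH3", "BH3", "BH4"]
splits = list(group_k_fold(boreholes, k=2))
```

In practice a library routine such as scikit-learn's `GroupKFold` would normally be used in place of a hand-written splitter; the version above only makes the constraint explicit.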