Evaluating the Performance of Multivariate Imputation by Chained Equations (MICE) when Predicting Missing Well-Log Data in Sedimentary Basins

Master Thesis (2023)
Author(s)

L.C. Baez Lozada (TU Delft - Civil Engineering & Geosciences)

Contributor(s)

Guillaume Rongier – Mentor (TU Delft - Applied Geology)

H.A. Abels – Graduation committee member (TU Delft - Applied Geology)

Masoud Soleymani Shishvan – Graduation committee member (TU Delft - Resource Engineering)

Faculty
Civil Engineering & Geosciences
Copyright
© 2023 Luis Carlos Baez Lozada
Publication Year
2023
Language
English
Graduation Date
24-08-2023
Awarding Institution
Delft University of Technology
Programme
Geo-Energy Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research evaluates the applicability of Multivariate Imputation by Chained Equations (MICE) for estimating missing well-log data across different sedimentary basins. MICE was combined with several machine learning techniques, namely XGBoost (XGB), Random Forest (RF), K-Nearest Neighbors regression (KNR), and Bayesian Ridge (BR), and its performance was tested with minimal user input on three data sets from distinct geological contexts and with different preprocessing conditions.
The main results indicate that the performance of MICE varied across data sets and well-logs, highlighting the complexity of imputing missing data in heterogeneous sedimentary basins. The number of iterations in MICE did not significantly affect model performance, whereas data quality, preprocessing, and geological complexity played crucial roles. The Force-200 data set, which underwent extensive preprocessing, showed better imputation performance than the Montney and Beetaloo data sets. Additionally, XGB often outperformed the other algorithms across different numbers of iterations.
The main conclusions of this study emphasize the need for more research to minimize user input and to develop more robust and flexible approaches to imputing missing data in well-logs. The study highlights the challenge of determining a single set of hyperparameters that is optimal for all well-logs, suggesting the need for more adaptable models or more advanced techniques such as deep learning. The research also points to the importance of refining preprocessing techniques, exploring further combinations of well-logs, and developing cross-validation approaches that effectively replicate real-world scenarios, in order to advance the application and reliability of MICE for imputing subsurface data with missing values.
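
To illustrate the kind of comparison the abstract describes, here is a minimal sketch rather than the thesis workflow. It assumes scikit-learn's experimental `IterativeImputer` as a MICE-style imputer and runs on synthetic columns standing in for well-logs; the function names, the 20% random masking, and all hyperparameters are illustrative assumptions. XGBoost is left out to keep the example free of extra dependencies; an XGBoost regressor could be passed as the estimator in the same way if that package is installed.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor


def make_synthetic_logs(n_samples=300, seed=0):
    """Build three correlated synthetic columns standing in for well-logs (not real data)."""
    rng = np.random.default_rng(seed)
    trend = np.linspace(0.0, 1.0, n_samples)
    log_a = trend + 0.05 * rng.standard_normal(n_samples)
    log_b = 2.0 * log_a + 0.05 * rng.standard_normal(n_samples)
    log_c = np.sin(3.0 * log_a) + 0.05 * rng.standard_normal(n_samples)
    return np.column_stack([log_a, log_b, log_c])


def mask_values(X, fraction=0.2, seed=1):
    """Hide a random fraction of entries; return the masked copy and the boolean mask."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < fraction
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def evaluate_imputers(X, X_missing, mask, max_iter=5):
    """Impute with several estimators and return the RMSE on the hidden entries."""
    estimators = {
        "BR": BayesianRidge(),
        "RF": RandomForestRegressor(n_estimators=20, random_state=0),
        "KNR": KNeighborsRegressor(n_neighbors=5),
    }
    scores = {}
    for name, estimator in estimators.items():
        imputer = IterativeImputer(estimator=estimator, max_iter=max_iter, random_state=0)
        X_imputed = imputer.fit_transform(X_missing)
        scores[name] = float(np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2)))
    return scores


if __name__ == "__main__":
    X = make_synthetic_logs()
    X_missing, mask = mask_values(X)
    for name, rmse in evaluate_imputers(X, X_missing, mask).items():
        print(f"{name}: RMSE on masked values = {rmse:.3f}")
```

Masking entries at random is the simplest possible validation scheme. Missing well-log data often occurs as whole intervals or entire logs, which is one reason the abstract calls for cross-validation approaches that better replicate real-world scenarios.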
