Machine learning-based in-situ detection of toxic petroleum hydrocarbons in groundwater

Journal Article (2025)
Author(s)

C Wu (TU Delft - Applied Mechanics, Wetsus, European Centre of Excellence for Sustainable Water Technology)

R. M. Wagterveld (Wetsus, European Centre of Excellence for Sustainable Water Technology)

L.C. Rietveld (TU Delft - Sanitary Engineering)

B.M. van Breukelen (TU Delft - Surface and Groundwater Hydrology)

Research Group
Sanitary Engineering
DOI related publication
https://doi.org/10.1016/j.jconhyd.2025.104771
Publication Year
2025
Language
English
Volume number
276
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Monitored natural attenuation is commonly used to manage petroleum hydrocarbon-contaminated groundwater, but it requires periodic, costly grab sampling. We propose a proof-of-concept machine learning (ML) framework for cost-effective, real-time groundwater monitoring that uses in-situ sensors (pH, dissolved oxygen, electrical conductivity, and redox potential) to detect benzene, ethylbenzene, and xylenes (BEX). The framework builds on established correlations between hydrocarbon concentrations and in-situ water quality parameters (iWQPs). Because field data were limited, we validated the framework using datasets from virtual wells within a simulated aquifer generated by our previously developed reactive transport model. In this application, we detected the spreading of pollution downstream of the established plume. The framework is a binary classification system that flags contamination at virtual downstream wells. We compared five ML classifiers (Logistic Regression, Random Forest, XGBoost, Multi-layer Perceptron, and Support Vector Classifier) for early warning when BEX reached or exceeded the regulatory threshold of 5 μg/L. The models were trained on virtual wells at and near the source zone and predicted contamination before BEX reached the threshold at downstream virtual wells. This reflects the spatial variability in flow and reaction dynamics that altered BEX-iWQP relationships. Scenario analyses revealed the ML models' sensitivity to aquifer properties, i.e., hydraulic conductivity, electrical conductivity, and electron acceptor availability. We also assessed the impact of sensor noise and seasonal fluctuations on iWQPs. Even moderate noise levels (10–20 %) can significantly affect model accuracy, particularly when the noise is introduced into the test data. We therefore recommend combining hardware stabilization with adaptive smoothing techniques. With these approaches, our proposed framework remains promising for providing early warnings of plume migration toward sensitive receptors.
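The labelling and noise-handling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the example readings, the uniform relative-noise model, and the trailing moving average (a simple stand-in for the "adaptive smoothing" the abstract mentions) are all assumptions made for illustration. Only the 5 μg/L threshold and the 10–20 % noise range come from the abstract.

```python
import random

BEX_THRESHOLD_UG_L = 5.0  # regulatory threshold quoted in the abstract


def label_exceedance(bex_ug_l, threshold=BEX_THRESHOLD_UG_L):
    """Binary target: 1 where BEX reaches or exceeds the threshold, else 0."""
    return [1 if c >= threshold else 0 for c in bex_ug_l]


def add_relative_noise(values, level, seed=0):
    """Perturb each reading by a uniform relative error in [-level, +level].

    Assumption: the abstract does not state the noise distribution.
    """
    rng = random.Random(seed)
    return [v * (1.0 + rng.uniform(-level, level)) for v in values]


def moving_average(values, window=3):
    """Trailing moving average; early points average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical BEX series (ug/L) at one virtual well, for illustration only.
bex = [0.5, 1.2, 3.8, 5.0, 7.4]
labels = label_exceedance(bex)

# Hypothetical dissolved-oxygen readings (mg/L) with 20 % relative noise,
# then smoothed before they would be passed to a classifier.
do_clean = [8.0, 7.5, 6.0, 3.0, 1.0]
do_noisy = add_relative_noise(do_clean, level=0.20)
do_smooth = moving_average(do_noisy, window=3)
```

In a fuller pipeline, the smoothed iWQP series would serve as features and `labels` as the target for any of the five classifiers named in the abstract. That step is omitted here because the abstract does not give model settings.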