Access to clean and safe drinking water is essential for public health and sustainable development. Drinking water treatment plants (DWTPs) ensure water quality, but fluctuating raw water characteristics, particularly turbidity, challenge efficient coagulation and dosing control. Traditional strategies such as jar tests and feedback control are limited by delayed results, which makes timely adjustments difficult. Short-term predictive tools based on machine learning (ML) offer a solution by forecasting water quality variations and enabling proactive control. This study develops several ML models for short-term turbidity prediction at the Lekkanaal DWTP, addressing three questions: (1) which parameters influence turbidity, (2) which feature combinations yield optimal predictions, and (3) how far in advance turbidity can be reliably forecast.
Historical water quality and hydrological data were collected from Waternet, KNMI, and Rijkswaterstaat and preprocessed to provide reliable model inputs. Candidate features were selected using Spearman correlation and Self-Organizing Maps (SOMs). Three regression models, AutoRegressive Integrated Moving Average (ARIMA), Random Forest (RF), and Long Short-Term Memory (LSTM), were trained for different forecast horizons, and feature importance was analyzed using greedy selection and visualization tools. An RF classifier was used to evaluate the feasibility of predicting peak turbidity events.
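As an illustration of the Spearman-based screening step, the following is a minimal sketch that ranks two series (averaging ties) and computes the Pearson correlation of the ranks. The function names and the toy values for turbidity, discharge, and EC are hypothetical and are not taken from the study's data or code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values, for illustration only.
turbidity = [2.1, 2.4, 3.0, 5.2, 9.8, 4.1]
discharge = [110, 130, 150, 240, 400, 210]
ec = [620, 600, 580, 510, 430, 540]

print(spearman(turbidity, discharge))  # monotonic increasing relation
print(spearman(turbidity, ec))         # monotonic decreasing relation
```

Because Spearman correlation depends only on ranks, it captures monotonic but nonlinear relations, which is one plausible reason to prefer it for screening hydrological variables.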
Results showed that turbidity was driven by hydrological and physicochemical factors. Upstream discharge and turbidity correlated strongly with local measurements, highlighting the Lek River as a primary contributor, while electrical conductivity (EC) and temperature showed negative correlations, reflecting dilution and seasonal sediment mobilization. SOMs confirmed that high turbidity coincided with northward flows from the Lek River into the Amsterdam-Rhine Canal.
Feature analysis indicated that univariate models using only recent sensor\_turbidity values outperformed multivariate models; additional features introduced noise. The last three hours of turbidity dominated predictions across ARIMA, RF, and LSTM.
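A univariate setup of this kind can be sketched as a sliding-window transformation of the turbidity series. The example below assumes hourly samples, so that three lags correspond to the last three hours; the sampling interval, the function name `make_supervised`, and the toy series are assumptions made for illustration and not details from the study.

```python
def make_supervised(series, n_lags=3, horizon=3):
    """Build (X, y) pairs for direct multi-step forecasting.

    Each row of X holds n_lags consecutive values ending at time t;
    the matching target is the value observed horizon steps after t.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1 : t + 1])
        y.append(series[t + horizon])
    return X, y


# Hypothetical hourly turbidity readings, for illustration only.
readings = [1, 2, 3, 4, 5, 6, 7, 8]
X, y = make_supervised(readings, n_lags=3, horizon=3)

for window, target in zip(X, y):
    print(window, "->", target)
```

The resulting rows could then be passed to any regressor; changing `horizon` yields a separate training set for each forecast horizon.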
All models provided reliable short-term forecasts, with RF outperforming ARIMA and LSTM for 3- and 6-hour horizons. Extreme peaks were systematically underestimated, and RF classification detected fewer than 16\% of peak events. Short-term forecasts up to six hours are feasible, but high-magnitude events remain challenging, emphasizing the need for enhanced monitoring and tailored strategies.
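To make the peak-detection figure concrete, the sketch below shows one way a detection rate could be computed, assuming it is defined as recall: the share of true peak time steps that the classifier also flags. The threshold-based peak definition, the function names, and the toy values are hypothetical; the abstract does not state how peaks or the detection rate were defined.

```python
def label_peaks(series, threshold):
    """Flag each value at or above the threshold as a peak (assumed definition)."""
    return [value >= threshold for value in series]


def peak_recall(actual_flags, predicted_flags):
    """Fraction of true peaks that were also flagged by the classifier.

    Returns None when the series contains no true peaks.
    """
    n_peaks = sum(actual_flags)
    if n_peaks == 0:
        return None
    hits = sum(1 for a, p in zip(actual_flags, predicted_flags) if a and p)
    return hits / n_peaks


# Hypothetical observed turbidity and classifier flags, for illustration only.
observed = [3, 4, 25, 30, 5, 22, 4]
classifier_flags = [False, False, False, True, False, False, False]

actual_flags = label_peaks(observed, threshold=20)
print(peak_recall(actual_flags, classifier_flags))
```

Because peak events are rare, a classifier can score well on overall accuracy while recall on the peak class stays low, which is why recall is a more informative measure for this task.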