The provision of safe drinking water is essential in every society since it determines people's health and well-being. Drinking Water Distribution Networks (DWDNs) are vital for this purpose but are susceptible to pathogen contamination and outbreaks due to cascading events after infrastructure failures, main repairs, human errors, or malicious attacks. When a contamination event occurs in the DWDN, protecting public health should be the top priority of every emergency response mechanism. Exposure to contaminated water poses significant health risks by introducing pathogens such as enterovirus, Campylobacter, and Cryptosporidium. For this reason, DWDNs are nowadays considered critical infrastructures, recognized by the USA's Presidential Policy Directive 21 and the European Union's Directive (EU) 2022/2557.
During contamination events in the DWDN, water utilities need to act quickly, make informed decisions, assess the threat, and effectively mitigate the event. The central objective of this thesis was to generate knowledge that helps address the growing challenge of waterborne pathogen contamination in DWDNs and to develop applications that enhance decision-making and immediate action in such emergencies. Tools and methodologies were developed and evaluated along two main pillars. The first pillar involves understanding the event based on historical knowledge: innovative approaches were developed and assessed for Artificial Intelligence-based information extraction and question answering from scientific publications, enabling rapid access to up-to-date pathogen characteristics, historical information on contamination events, and control actions. The second pillar focuses on predicting and managing a specific contamination event in real time: advanced modeling tools were created to simulate contamination events in DWDNs, providing realistic representations of hydraulics and water quality dynamics, predicted health impacts, and support for real-time decision-making during emergencies.
Chapter 2 describes the development of an Artificial Intelligence (AI)-based model that extracts specific pathogen information from the scientific literature. By leveraging Natural Language Processing (NLP) and Deep Learning (DL) techniques, the study evaluated the feasibility and performance of an Information Extraction model that extracts both qualitative and quantitative information from scientific publications about the waterborne pathogen Legionella. For the development of the model, a combination of supervised and rule-based techniques was adopted. The evaluation metrics showed satisfactory performance for the extraction of both qualitative and quantitative information, with overall F-scores of 85% and 95% for the supervised and rule-based techniques, respectively. The model was also compared with manual extraction by a human, returning similar results and indicating that the extracted information is of high quality. The results showed that the model can rapidly extract critical information from text documents and serve as a useful tool for water utilities, enabling faster and more informed decision-making during the early stages of a contamination event.
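To illustrate the rule-based half of such a pipeline, the sketch below extracts pathogen concentration mentions from a sentence with a regular expression. The pattern, units, and example sentence are illustrative assumptions, not the actual rules or data of the thesis model.

```python
import re

# Hypothetical rule for concentration mentions such as "1200 CFU/L".
# The recognized units (CFU, GC, MPN per mL/L/100 mL) are assumptions
# chosen for illustration, not the thesis model's rule set.
CONCENTRATION_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(CFU|GC|MPN)\s*/\s*(100\s*mL|mL|L)",
    re.IGNORECASE,
)

def extract_concentrations(text: str) -> list[tuple[str, str, str]]:
    """Return (value, unit, volume) tuples found in a piece of text."""
    return [m.groups() for m in CONCENTRATION_PATTERN.finditer(text)]

sentence = "Legionella pneumophila was detected at 1200 CFU/L in the hot water system."
print(extract_concentrations(sentence))  # [('1200', 'CFU', 'L')]
```

In practice, such rules would be combined with the supervised component, which handles the qualitative statements that do not follow a fixed surface pattern.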
Chapter 3 systematically assesses the performance of various open-source Large Language Models (LLMs), including Llama 2, Mistral, and Gemma (and their variations), in a question-answering task related to pathogen contamination events of drinking water. The evaluation metrics included Precision, Recall, F1 score, Automated Accuracy, and Empty Score. The model with the highest performance on a set of 23 questions using 188 scientific publications was then manually evaluated by a human (Human Evaluation). The results showed that all models performed reasonably well, with average F1 scores ranging from 81% to 87%. After considering all the evaluation metrics, the Llama 2 model was the most reliable, with an average Automated Accuracy of 86%. However, the hallucination effect of Llama 2 was evident. The Gemma model had a lower Automated Accuracy score but was less prone to hallucination. The Human Evaluation showed that the Llama 2 model delivered correct answers when the questions were clear and straightforward; however, when a question required further interpretation, the model often struggled. Overall, the study demonstrated that the use of LLMs in automated information extraction tasks shows great potential for time-critical applications, such as processing large volumes of (historical) data, making it feasible to provide historical information in near real-time during emergencies.
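For context on the metrics above, F1 in QA evaluations of this kind is typically computed from token overlap between a model answer and a reference answer. The sketch below assumes simple whitespace tokenization; it illustrates the general scoring idea, not the thesis's exact evaluation code.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer,
    in the style of common extractive-QA scoring."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example with partial overlap (hypothetical answers):
score = token_f1("0.5 mg/L free chlorine",
                 "a free chlorine residual of 0.5 mg/L")
```

Here precision is 1.0 (every predicted token appears in the reference) while recall is 4/7, giving an F1 of about 0.73, which shows why F1 rewards answers that are both complete and concise.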
Building on the response to a pathogen contamination event in the DWDN, Chapter 4 presents the BeWaRE benchmark testbed, a comprehensive model. This testbed went beyond the state of the art and integrated all current relevant knowledge on pathogen transport and fate, bulk and wall chlorine decay, fast and slow chlorine reactions with TOC, TOC degradation, stochastic water demands, hydraulic uncertainty, and individual consumption patterns to calculate pathogen exposure and infection risk following the steps of Quantitative Microbial Risk Assessment (QMRA). A large wastewater contamination at different locations in a chlorinated and a non-chlorinated network was simulated using three pathogens: Campylobacter, enterovirus, and Cryptosporidium. The results showed that in non-chlorinated DWDNs, the modeled wastewater contamination event led to an 11-46% infection risk in the total population, depending on the contamination location but irrespective of the selected pathogen (due to the high pathogen concentration). In chlorinated DWDNs, by contrast, the same scenarios resulted in lower infection risks for the pathogens that are susceptible to chlorine: 0.78-2.1% for Campylobacter and 7.8-26.6% for enterovirus. The enterovirus infection risk was higher than that of Campylobacter, despite the lower concentrations in the contamination source, because enterovirus is less susceptible to chlorine. While chlorination aids mitigation, large contaminations can still lead to infections due to chlorine resistance (in the case of Cryptosporidium) and chlorine depletion at the contamination point. Finally, the infection risk was influenced by the varying levels of pathogen susceptibility to chlorine and by the contamination location and duration, while the response window to reduce the health impact was short: 5-10 hours post-contamination in these scenarios.
The study provided a novel approach to assessing health risks, offering critical insights for water utilities to optimize their response during emergencies.
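The final QMRA step mentioned above turns an ingested dose into an infection probability via a dose-response model. The sketch below uses the standard exponential dose-response form; the parameter values are illustrative placeholders, not the fitted values used in the BeWaRE testbed (which, for some pathogens, would use other forms such as beta-Poisson).

```python
import math

def infection_risk(concentration_per_L: float, volume_L: float, r: float) -> float:
    """Exponential dose-response model: P_inf = 1 - exp(-r * dose),
    where dose = pathogen concentration * ingested volume and r is a
    pathogen-specific infectivity parameter."""
    dose = concentration_per_L * volume_L
    return 1.0 - math.exp(-r * dose)

# Illustrative values only: 10 organisms/L in tap water, 0.25 L consumed,
# and a placeholder infectivity parameter r = 0.02.
risk = infection_risk(concentration_per_L=10.0, volume_L=0.25, r=0.02)
```

In a full QMRA, this per-exposure probability would be evaluated per consumer and time step, driven by the modeled concentrations and individual consumption patterns, and then aggregated over the population.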
Chapter 5 further explores the added value of using modeling tools to support decision-making during emergencies in the DWDN. This was demonstrated through PathoINVEST, an analytical tool that utilizes the BeWaRE benchmark methodology presented in the previous chapter to support water utilities in modeling contamination events in the DWDN. A case study was conducted to compare a traditional approach (representing the status quo of current water utility practices) with a model-based approach (use of real-time modeling tools) during an emergency response to a contamination event in the DWDN. In this case study, the model-based approach proved more efficient than the traditional approach in identifying the source of contamination (1.3 versus 3.7 hours), requiring fewer samples (4 versus 11) and resulting in a lower infection risk by the time the source was identified (12% versus 20%). Moreover, the model-based approach was more effective in finding the best valves to close in the network (as mitigation measures), since it resulted in a 3 percentage-point infection risk reduction. However, some actions taken in the traditional approach, such as the rapid closure of valves (cutting the network in half and thus limiting further spreading) before the contamination source was identified, were critical in mitigating the contamination. Another key finding was the importance of an up-to-date overview of valve settings in the DWDN schematization for reliable source identification: any discrepancy between the actual network and the model can lead to inaccurate infection risk estimates when modeling tools support decision-making. Overall, this case study showed that integrating modeling tools into the current practices of water utilities provides a robust framework for improving water contamination management and decision-making processes, thus safeguarding public health during emergencies.
A concluding viewpoint is offered in Chapter 6, which considers whether the initial research questions from Chapter 1 were successfully answered. The implications of this research for water utilities are examined, describing how the proposed methodologies can be (and have been) used in real-world scenarios, facilitating faster decision-making and contributing to the effective mitigation of emergencies. Finally, the perspectives and directions for future research are discussed, emphasizing the role of AI and advancements in modeling tools. AI has shown significant potential in enhancing situational awareness and rapid information extraction during emergencies. Water utilities should explore the integration of AI into their standard operating procedures to further enhance emergency responses and routine management. Regarding the use of modeling tools during emergencies, future research should address key gaps, such as the complex dynamics when wastewater interacts with chlorine, the competition between chlorine-reducing agents, and the validity of hydraulic modeling assumptions such as perfect mixing. Accounting for cumulative health risks (multiple pathogens) and refining dose-response models to differentiate between infection and illness probabilities can provide insights for effectively managing risks to vulnerable populations. Moreover, incorporating metrics like Disability-Adjusted Life Years (DALYs) into modeling efforts could enable better communication of health impacts and evaluation of mitigation strategies. Finally, Digital Twins and real-time microbial sensors are identified as transformative technologies that can provide real-time insights into network dynamics. These advancements can shift water utility management from reactive approaches to proactive, data-driven strategies, significantly enhancing public health protection, operational efficiency, and resilience.