Automating Indicator Validation for Water Utility Benchmarking - A Data‐Driven Approach
P.S.P. Ramsundersingh (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Cynthia C. S. Liem – Mentor (TU Delft - Multimedia Computing)
V.N.S.R. Dwarka – Mentor (TU Delft - Numerical Analysis)
Tom Julian Viering – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Water flows through every aspect of life, yet the story of its delivery is only as reliable as the data that records it. In global benchmarking, such data is often uneven, incomplete, and rarely subjected to systematic validation, allowing anomalies to shape perceptions of performance before they are critically examined. This thesis addresses that gap by developing and evaluating a multi‐stage, data‐driven anomaly detection framework within the World Bank’s New International Benchmarking Network for Water and Sanitation Utilities (NewIBNET), situated at the intersection of data science, water governance, and digital ethics.
The framework weaves together four complementary layers – structural validation, rule‐based logical checks, peer comparison, and weighted prioritisation – transforming anomaly detection from a surface‐level cleaning task into a structured process of active quality assurance. Developed through an iterative, expert‐informed process, it is reproducible and adaptable, balancing statistical rigour with the contextual realities of the water sector so that each flag raised carries both analytical credibility and practical relevance.
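The four layers named above can be illustrated with a minimal sketch. It is not the thesis implementation: the field names, the single logical rule, the median/MAD peer test, the threshold `k`, and the layer weights are all assumptions chosen only to show how the stages could chain into one ranked review list.

```python
from statistics import median

# Hypothetical field names; the actual NewIBNET indicators are not given here.
REQUIRED = ("utility", "population_served", "water_produced_m3", "water_billed_m3")
# Assumed layer weights for prioritisation (illustrative values only).
WEIGHTS = {"structural": 4.0, "logical": 2.0, "peer": 1.0}


def _num(v):
    """True for real numbers; bool is excluded although it subclasses int."""
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def structural_flags(rec):
    """Layer 1: required fields are present and numeric where expected."""
    flags = []
    for field in REQUIRED:
        value = rec.get(field)
        if value is None:
            flags.append(("structural", f"{field} missing"))
        elif field != "utility" and not _num(value):
            flags.append(("structural", f"{field} not numeric"))
    return flags


def logical_flags(rec):
    """Layer 2: one assumed rule, billed volume cannot exceed produced volume."""
    produced, billed = rec.get("water_produced_m3"), rec.get("water_billed_m3")
    if _num(produced) and _num(billed) and billed > produced:
        return [("logical", "billed volume exceeds produced volume")]
    return []


def per_capita(rec):
    """Produced volume per person served, or None if it cannot be computed."""
    produced, pop = rec.get("water_produced_m3"), rec.get("population_served")
    return produced / pop if _num(produced) and _num(pop) and pop > 0 else None


def peer_flags(rec, records, k=3.0):
    """Layer 3: compare against the other utilities using median and MAD."""
    own = per_capita(rec)
    peers = [v for v in (per_capita(r) for r in records if r is not rec) if v is not None]
    if own is None or len(peers) < 3:
        return []
    med = median(peers)
    mad = median(abs(v - med) for v in peers)
    if mad > 0 and abs(own - med) > k * mad:
        return [("peer", "per-capita production far from peer median")]
    return []


def prioritise(records):
    """Layer 4: sum layer weights per record and rank flagged records."""
    ranked = []
    for rec in records:
        flags = structural_flags(rec) + logical_flags(rec) + peer_flags(rec, records)
        if flags:
            score = sum(WEIGHTS[layer] for layer, _ in flags)
            ranked.append((score, rec.get("utility"), [msg for _, msg in flags]))
    return sorted(ranked, key=lambda item: -item[0])


# Made-up example records, not taken from the NewIBNET dataset.
records = [
    {"utility": "A", "population_served": 1000, "water_produced_m3": 50000, "water_billed_m3": 40000},
    {"utility": "B", "population_served": 2000, "water_produced_m3": 104000, "water_billed_m3": 80000},
    {"utility": "C", "population_served": 1500, "water_produced_m3": 73500, "water_billed_m3": 60000},
    {"utility": "D", "population_served": 1000, "water_produced_m3": 51000, "water_billed_m3": 45000},
    {"utility": "E", "population_served": 1000, "water_produced_m3": 500000, "water_billed_m3": 600000},
    {"utility": "F", "population_served": None, "water_produced_m3": 30000, "water_billed_m3": 20000},
]

ranked = prioritise(records)
for score, utility, messages in ranked:
    print(score, utility, messages)
```

In this sketch a structural failure outweighs a logical or peer flag, so records that cannot be evaluated at all surface first. A real deployment would presumably set such weights with domain experts, in line with the expert-informed process described above.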
Applied to the 2022–2024 NewIBNET dataset, the framework is assessed through robustness checks, a national case study of Indonesian utilities, and an expert survey. Results show that it improves anomaly interpretability, limits the propagation of flawed data into comparative analyses, and reduces review time from 75 hours to under 2 minutes. The surveyed experts unanimously endorsed it for operational deployment.
By translating the principles of automated, ethically grounded validation into a scalable methodology, this work advances the state of practice in anomaly detection for data‐scarce sectors. In shifting from red flags to real solutions, it demonstrates how automated validation can turn detection into action, building trust where data meets water, and enabling more transparent, equitable decisions in global water governance.