Evaluating the generalizability and transferability of water distribution deterioration models

Journal Article (2023)
Author(s)

Shamsuddin Daulat (Norwegian University of Science and Technology (NTNU))

Marius Møller Rokstad (Norwegian University of Science and Technology (NTNU))

Stian Bruaset (SINTEF Industry)

J.G. Langeveld (TU Delft - Sanitary Engineering)

Franz Tscheikner-Gratl (Norwegian University of Science and Technology (NTNU))

Research Group
Sanitary Engineering
Copyright
© 2023 Shamsuddin Daulat, Marius Møller Rokstad, Stian Bruaset, J.G. Langeveld, F. Tscheikner-Gratl
DOI related publication
https://doi.org/10.1016/j.ress.2023.109611
Publication Year
2023
Language
English
Volume number
241
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Small utilities often lack the data needed to train machine learning models for pipe failure prediction, and hence cannot harness the predictive power of machine learning. This study evaluates the generalizability and transferability of a machine learning model to see whether small utilities can benefit from the data and models of other utilities. Using datasets from nine Norwegian utilities, we trained nine global models (by merging multiple datasets) and nine local models (each on a single utility's dataset) using random survival forest. Several pre-processing techniques, including ways to address left-truncated break data and break data scarcity, are also presented. The global models and three of the local models were tested on predicting pipe failures for utilities that were not included in their training datasets. The results indicate that the global models can predict other utilities' pipe failures with sufficient accuracy, while the local models have some limitations. However, if a representative utility with a sufficiently large (and information-rich) dataset is selected, its model can predict another utility's pipe breaks as accurately as the global models. Furthermore, survival curves for defined cohorts, used as proxies for uncertainty, and variable importance show that pipes with and without previous breaks behave extremely differently. With an understanding of the models' generalizability and transferability, small utilities can benefit from the data and models of other utilities.
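
The evaluation design described in the abstract, in which a global model is tested on a utility excluded from its training data, can be illustrated with a minimal sketch. It assumes a leave-one-utility-out scheme, in which each global model merges all datasets except the held-out one; the abstract does not state the exact merging scheme. The function name `leave_one_utility_out`, the toy utility names, and the `pipe_id` field are hypothetical. The random survival forest itself is omitted; only the train/test splitting is shown.

```python
from typing import Dict, List, Tuple

Row = dict  # one pipe record; the fields are hypothetical


def leave_one_utility_out(
    datasets: Dict[str, List[Row]],
) -> List[Tuple[str, List[Row], List[Row]]]:
    """Build one (held_out, train_rows, test_rows) split per utility.

    Assumption: each global model is trained on the merged data of all
    utilities except the one it is later tested on.
    """
    splits = []
    for held_out in datasets:
        # Merge the rows of every utility other than the held-out one.
        train = [
            row
            for name, rows in datasets.items()
            if name != held_out
            for row in rows
        ]
        splits.append((held_out, train, datasets[held_out]))
    return splits


# Toy data with three utilities instead of nine (illustrative only).
toy = {
    "A": [{"pipe_id": 1}, {"pipe_id": 2}],
    "B": [{"pipe_id": 3}],
    "C": [{"pipe_id": 4}, {"pipe_id": 5}, {"pipe_id": 6}],
}

splits = leave_one_utility_out(toy)
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Under the same assumption, a local model would correspond to training on a single utility's rows and testing on another utility's rows; a survival model would then be fitted on each training set and scored on the matching test set.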