When are AI models ready for deployment? Reassessing Google’s global AI flood forecasting system through the lens of responsible modelling
Kailong Li (Desert Research Institute - Las Vegas, University of New South Wales)
Saman Razavi (University of New South Wales, University of Saskatchewan)
Holger R. Maier (University of Adelaide)
Markus Hrachowitz (TU Delft - Surface and Groundwater Hydrology)
Ehsan Nabavi (Australian National University)
Natasha Harvey (Australian National University)
Khaled Akhtar (Alberta Environment and Protected Areas - Edmonton, University of Saskatchewan)
Fisaha Unduche (Province of Manitoba - Transportation and Infrastructure)
Abstract
AI models are being developed at a rapid pace, but when are they ready to be deployed in real-world operational settings? In this paper, we introduce a framework to support such assessments and apply it to Google’s recently released AI-based flood prediction system, which is claimed to achieve “reliability in predicting extreme riverine events” and provide “accurate and timely warnings” that are available “earlier and over larger and more impactful events in ungauged basins”. The system has been integrated into an operational early-warning platform producing open, real-time forecasts in more than 80 countries. While this development promises to usher in a new and exciting age in global flood forecasting, the supporting evidence relies heavily on several subjective choices, the implications of which have not been acknowledged or assessed. Here, we evaluate the consequences of these choices for claims of operational deployment readiness across four dimensions: predictive accuracy, forecast timeliness, the characterization of extreme events, and benchmarking against state-of-the-art models. Our assessment reveals that the system’s actual predictive accuracy is likely to be substantially lower than reported, particularly for extreme events, raising concerns about responsible modelling and publicity practices in high-stakes applications. Given its alarmingly high (>90%) rates of false positives and false negatives, the deployment of the Google AI model risks misinforming those who depend on its outputs for evacuation and preparedness decisions, particularly in the less-developed countries targeted by the enterprise. Beyond the immediate operational consequences, these outcomes, if left unaddressed, may erode public trust in AI within the hydrological sciences. We conclude by calling for greater transparency, accountability, and methodological rigor in the integration of AI into flood forecasting.
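To make the quoted error rates concrete, the sketch below shows one common way event-based false-positive and false-negative rates are derived from a forecast/observation contingency table. The definitions (false alarm ratio and miss rate) and the example numbers are illustrative assumptions for exposition, not the paper's exact verification methodology.

```python
def contingency_counts(forecast_events, observed_events):
    """Count hits, false alarms, and misses from sets of event identifiers."""
    forecast, observed = set(forecast_events), set(observed_events)
    hits = len(forecast & observed)          # warned and occurred
    false_alarms = len(forecast - observed)  # warned but did not occur
    misses = len(observed - forecast)        # occurred but no warning issued
    return hits, false_alarms, misses

def false_positive_rate(hits, false_alarms):
    """False alarm ratio: fraction of issued warnings that did not verify."""
    total = hits + false_alarms
    return false_alarms / total if total else 0.0

def false_negative_rate(hits, misses):
    """Miss rate: fraction of observed events for which no warning was issued."""
    total = hits + misses
    return misses / total if total else 0.0

# Hypothetical example: 25 warnings issued, only 2 verified; 20 events observed.
hits, fa, miss = contingency_counts(range(25), list(range(2)) + list(range(100, 118)))
print(false_positive_rate(hits, fa))   # 23/25 = 0.92
print(false_negative_rate(hits, miss)) # 18/20 = 0.90
```

Under these hypothetical counts, both rates exceed 0.9, which is the regime the abstract describes: most warnings do not verify, and most observed events are missed.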