A retrospective assessment of COVID-19 model performance in the USA

Journal Article (2022)
Author(s)

Kyle J. Colonna (Harvard University)

G. F. Nane (TU Delft - Applied Probability)

Ernani F. Choma (Harvard University)

R.M. Cooke (Resources for the Future, TU Delft - Applied Probability)

John S. Evans (Harvard University)

Research Group
Applied Probability
Copyright
© 2022 Kyle J. Colonna, G.F. Nane, Ernani F. Choma, R.M. Cooke, John S. Evans
DOI related publication
https://doi.org/10.1098/rsos.220021
Publication Year
2022
Language
English
Issue number
10
Volume number
9
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Coronavirus disease 2019 (COVID-19) forecasts from over 100 models are readily available. However, little published information exists regarding the performance of their uncertainty estimates (i.e. probabilistic performance). To evaluate their probabilistic performance, we employ the classical model (CM), an established method typically used to validate expert opinion. In this analysis, we assess both the predictive and probabilistic performance of COVID-19 forecasting models during 2021. We also compare the performance of aggregated forecasts (i.e. ensembles) based on equal and CM performance-based weights with an established ensemble from the Centers for Disease Control and Prevention (CDC). Our analysis of forecasts of COVID-19 mortality from 22 individual models and three ensembles across 49 states indicates that (i) good predictive performance does not imply good probabilistic performance, and vice versa; (ii) models often provide tight but inaccurate uncertainty estimates; (iii) most models perform worse than a naive baseline model; (iv) both the CDC and CM performance-weighted ensembles perform well; and (v) while the CDC ensemble was more informative, the CM ensemble was more statistically accurate across states. This study presents a worthwhile method for assessing the performance of probabilistic forecasts and can potentially improve both public health decision-making and COVID-19 modelling.
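
For readers who want a concrete sense of the two quantities the classical model combines, the Python sketch below computes a CM-style statistical accuracy (calibration) score and information score for quantile forecasts, and a performance weight from the two. This is a minimal illustrative sketch, not the authors' code: it assumes forecasts issued as 5%, 50% and 95% quantiles, a user-supplied intrinsic range for the background measure, and all function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

# Theoretical probability mass in each inter-quantile bin for a
# (5%, 50%, 95%) format: below q05, [q05, q50), [q50, q95), at/above q95.
P = np.array([0.05, 0.45, 0.45, 0.05])

def calibration_score(q05, q50, q95, x):
    """Statistical accuracy: p-value of the likelihood-ratio statistic
    2*N*KL(s || P), asymptotically chi-square with (#bins - 1) degrees
    of freedom if realizations x are drawn from the stated quantiles."""
    q05, q50, q95, x = map(np.asarray, (q05, q50, q95, x))
    counts = np.array([
        (x < q05).sum(),
        ((q05 <= x) & (x < q50)).sum(),
        ((q50 <= x) & (x < q95)).sum(),
        (q95 <= x).sum(),
    ])
    n = counts.sum()
    s = counts / n
    mask = s > 0  # empty bins contribute zero to the KL divergence
    kl = float(np.sum(s[mask] * np.log(s[mask] / P[mask])))
    return chi2.sf(2 * n * kl, df=len(P) - 1)

def information_score(q05, q50, q95, lo, hi):
    """Mean relative information of the forecasts with respect to a
    uniform background measure on [lo, hi]; requires lo < q05 and
    q95 < hi for every item so all bin widths are positive."""
    q05, q50, q95 = map(np.asarray, (q05, q50, q95))
    widths = np.stack([q05 - lo, q50 - q05, q95 - q50, hi - q95], axis=-1)
    r = widths / (hi - lo)  # background mass in each bin
    return float(np.mean(np.sum(P * np.log(P / r), axis=-1)))

def performance_weight(cal, info, cutoff=0.05):
    """Unnormalized CM weight: calibration times information, zeroed
    below a calibration cutoff."""
    return cal * info if cal >= cutoff else 0.0

# Toy example with hypothetical forecasts for ten items.
rng = np.random.default_rng(0)
truth = rng.normal(100.0, 10.0, size=10)
q50 = truth + rng.normal(0.0, 5.0, size=10)
q05, q95 = q50 - 15.0, q50 + 15.0
cal = calibration_score(q05, q50, q95, truth)
info = information_score(q05, q50, q95, lo=truth.min() - 30.0,
                         hi=truth.max() + 30.0)
print(cal, info, performance_weight(cal, info))
```

In a CM performance-weighted ensemble, each model's weight would be its (normalized) performance weight computed this way over the calibration items; an equal-weights ensemble simply sets all weights to 1/K for K models.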