Reporting delays: A widely neglected impact factor in COVID-19 forecasts

More Info


Epidemic forecasts are only as good as the accuracy of epidemic measurements. Is epidemic data, particularly COVID-19 epidemic data, clean, and devoid of noise? The complexity and variability inherent in data collection and reporting suggest otherwise. While we cannot evaluate the integrity of the COVID-19 epidemic data in a holistic fashion, we can assess the data for the presence of reporting delays. In our work, through the analysis of the first COVID-19 wave, we find substantial reporting delays in the published epidemic data. Motivated by the desire to enhance epidemic forecasts, we develop a statistical framework to detect, uncover, and remove reporting delays in the infectious, recovered, and deceased epidemic time series. Using our framework, we expose and analyze reporting delays in eight regions significantly affected by the first COVID-19 wave. Further, we demonstrate that removing reporting delays from epidemic data by using our statistical framework may decrease the error in epidemic forecasts. While our statistical framework can be used in combination with any epidemic forecast method that intakes infectious, recovered, and deceased data, to make a basic assessment, we employed the classical SIRD epidemic model. Our results indicate that the removal of reporting delays from the epidemic data may decrease the forecast error by up to 50%. We anticipate that our framework will be indispensable in the analysis of novel COVID-19 strains and other existing or novel infectious diseases.