Estimate the limit of predictability in short-term traffic forecasting
An entropy-based approach
Guopeng Li (TU Delft - Transport and Planning)
Victor Knoop (TU Delft - Transport and Planning)
J.W.C. van Lint (TU Delft - Transport and Planning)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Accurate short-term traffic forecasting is a cornerstone of Intelligent Transportation Systems. Over the past several decades, many models have been proposed to improve predictive accuracy. A key unsolved question is whether there is a theoretical bound on the accuracy with which traffic can be predicted, and whether that limit can be estimated directly from data. To answer this question, we use core concepts from information theory to derive the limit of predictability in short-term traffic forecasting. Theoretical analysis proves that the conditional differential entropy is a rigorous lower bound on the negative log-likelihood (NLL) of probabilistic models, and the continuous form of Fano's theorem further gives a loose lower bound on the mean squared error (MSE) of deterministic models. Based on the special properties of traffic dynamics, two assumptions are made when estimating the entropy metrics: cyclostationarity (traffic phenomena show strong periodicity) and localized spatial correlation (due to kinematic wave propagation). These assumptions allow the limit of predictability to be formulated as a function of longitudinal space and time of day, which identifies the most uncertain locations and periods solely from data. Experiments on univariate traffic accumulation forecasting and network-level speed forecasting show that the selected models, including some state-of-the-art deep learning models, do not outperform the estimated lower bounds but only approach them. The limit of predictability depends on time of day, network location, observation range, and prediction horizon. The results reveal that the stochastic nature of traffic dynamics and improper assumptions about the prior distribution of the output are the two major factors that restrict predictive performance. In summary, the proposed method estimates a trustworthy performance boundary for most traffic forecasting models. These conclusions can guide further studies in this domain.
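The two bounds named in the abstract can be illustrated with a minimal sketch. It is not the estimator used in the paper: it assumes a toy setting in which the target equals a known function of the input plus Gaussian noise, so that the conditional differential entropy has a closed form. The function names are hypothetical, entropies are in nats, and the MSE bound uses the standard estimation form of Fano's inequality, MSE >= exp(2h) / (2*pi*e).

```python
import math


def gaussian_conditional_entropy(noise_std: float) -> float:
    """Conditional differential entropy h(Y|X) in nats, assuming
    Y = f(X) + Gaussian noise with standard deviation noise_std."""
    return 0.5 * math.log(2 * math.pi * math.e * noise_std ** 2)


def mse_lower_bound(cond_entropy: float) -> float:
    """Lower bound on the MSE of any deterministic predictor,
    from the continuous (estimation) form of Fano's inequality."""
    return math.exp(2 * cond_entropy) / (2 * math.pi * math.e)


def expected_gaussian_nll(noise_std: float, model_std: float,
                          mean_bias: float = 0.0) -> float:
    """Expected NLL of a Gaussian predictive model whose mean is off by
    mean_bias and whose standard deviation is model_std, evaluated on
    data whose true noise standard deviation is noise_std."""
    return (0.5 * math.log(2 * math.pi * model_std ** 2)
            + (noise_std ** 2 + mean_bias ** 2) / (2 * model_std ** 2))


if __name__ == "__main__":
    noise_std = 2.0  # assumed irreducible noise level of the toy process
    h = gaussian_conditional_entropy(noise_std)
    print("conditional entropy (nats):", h)
    print("MSE lower bound:", mse_lower_bound(h))
    # Every candidate model has expected NLL at or above the entropy.
    for model_std in (1.5, 2.0, 3.0):
        print(model_std, expected_gaussian_nll(noise_std, model_std))
```

In this toy case the MSE bound equals the noise variance, and the expected NLL reaches the entropy only when the model's predictive distribution matches the true one. A misspecified spread or a biased mean gives a strictly larger NLL, which mirrors the abstract's point that improper distributional assumptions restrict performance.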