Relying on a traffic simulator is often necessary when multiple traffic conditions need to be replicated, either to predict specific quantities of interest or to assess more general network properties, such as efficiency or resilience, under different scenarios. Although such a simulation can deliver a high level of fidelity, its computational cost may be prohibitive in a real context, making the approach unsuitable for real-time applications in most cases. In this regard, the goal of the present work is twofold. First, we aim to replace the traffic simulator with a data-driven surrogate model that produces real-time traffic predictions that are still sufficiently accurate. Second, we want to determine the minimum amount of data, i.e., the smallest number of sensors deployed on the network, that still allows predictions within a predefined bound to be obtained. The effectiveness of the approach is evaluated on the full-scale urban network of Rapallo, Italy, which we simulate with AIMSUN NEXT for the morning peak hours, i.e., between 7:00 a.m. and 9:00 a.m. Multiple state-of-the-art ML algorithms are tested to assess their effectiveness as surrogate models for the considered problem.