Reinforcement Learning in Railway Delay Management

Abstract

Railway systems are affected by unexpected disturbances on a daily basis, causing delays to trains and passengers. Real-time traffic management is therefore necessary and is currently handled by human traffic controllers, who mainly focus on minimizing train delays. In the past two decades, extensive research has been done to improve railway traffic management by shifting the focus to passengers (generally called delay management), using optimization or heuristics. The existing optimization-based models are difficult to solve efficiently, while the existing heuristics are computationally efficient but rely on simplified passenger behaviours. In this paper, we propose an efficient method that incorporates complex passenger behaviours: a Reinforcement Learning (RL) framework for delay management at a network level for mainline railways, aiming to minimize passenger destination delays. In our method, passengers re-plan their travels either when they actually miss their transfers (i.e. reactive behaviour) or when they become aware of better path choices than their currently planned ones (i.e. proactive behaviour), whichever happens first. Re-planning is allowed to happen multiple times during a single passenger journey, as long as it is beneficial. We tested the proposed RL approach on a real-world railway network and compared it to three benchmarks for timetable rescheduling: the first-in-first-out (FIFO) dispatching rule that is widely adopted in practice, and the train-centric and passenger-centric optimization models (TcOM and PcOM). Results show that our RL approach obtains better rescheduling solutions (in terms of total passenger delays) than FIFO and the TcOM, and better computational efficiency (43 seconds on average after training) than the PcOM (4572 seconds on average).
With sufficient training, 74% of the trained RL agents can solve all of the other delay scenarios designed in our case study, implying good generalization performance of the proposed method.
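The reactive/proactive re-planning trigger described in the abstract can be sketched as follows. This is a minimal illustration with hypothetical names (`should_replan` and its parameters are not from the paper), assuming delays are comparable scalar quantities; the actual framework operates on full passenger paths within the RL environment.

```python
def should_replan(arrival_time, transfer_departure_time,
                  current_path_delay, best_alternative_delay):
    """Return the trigger for re-planning, or None if the current plan stands.

    A passenger re-plans either reactively (the planned transfer is actually
    missed) or proactively (a path with a smaller expected destination delay
    is known), whichever condition occurs first.
    """
    if arrival_time > transfer_departure_time:
        return "reactive"   # transfer missed: forced to choose a new path
    if best_alternative_delay < current_path_delay:
        return "proactive"  # a better path choice is known in advance
    return None             # keep the currently planned path

# Re-planning may fire repeatedly within one journey, as long as each
# re-plan is beneficial (reduces the passenger's destination delay).
```

In this sketch the check would be evaluated at each decision point of a passenger's journey, so a single journey can trigger multiple re-plans, matching the behaviour described above.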