Railway systems suffer from disturbances in operations, such as extended section running times caused by temporary speed restrictions and prolonged dwell times at stations due to unexpected passenger volumes. These disturbances cause deviations from the original timetable and negatively impact service reliability and passenger experience. Effective and timely rescheduling measures are crucial in reducing the impact of these disturbances. Existing timetable rescheduling models that rely on optimization-based methods often struggle with computational inefficiency, especially when dealing with scenarios involving a large number of train services. To address these challenges, we propose a learning-based timetable rescheduling framework that considers scalability in its formulation to reduce the growing computational burden associated with an increasing number of train services. The proposed framework decomposes the complex rescheduling problem into multiple subtasks, facilitating a systematic approach to managing extensive railway networks with numerous stations and train services. A high-level agent, functioning as a centralized traffic controller, is responsible for decomposing the overall deviation reduction task into subtasks at a low level and assigning them to individual train services with the primary objective of minimizing the time required to restore the original timetable. Low-level agents, acting as distributed train dispatchers, are tasked with rescheduling the timetables of their assigned trains. These low-level agents employ various dispatching strategies, facilitated by inter-train communication, to search for optimal rescheduling solutions while adhering to operational constraints such as minimum headway requirements. The low-level agents utilize an actor-critic architecture to generate continuous control decisions for dwell and running times, enabling them to learn and optimize their performance.
Knowledge-sharing mechanisms amongst the low-level agents enable faster and more robust learning. Furthermore, advanced exploration methods are integrated to enhance the efficiency of the agents' training process.
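To make the minimum-headway constraint concrete, the following is a minimal illustrative sketch (not the paper's implementation) of how a low-level agent's continuous departure-time decisions might be projected onto feasible values: each train's proposed departure is shifted later whenever it would follow the preceding train too closely. The function names and the headway value are assumptions for illustration only.

```python
# Illustrative sketch only: enforcing a minimum headway between consecutive
# train departures at a station. Names and values are assumed, not from the paper.

MIN_HEADWAY = 120.0  # seconds; assumed value for illustration


def feasible_departure(proposed_dep: float, prev_dep: float,
                       min_headway: float = MIN_HEADWAY) -> float:
    """Shift a proposed departure later if it violates the minimum headway
    behind the preceding train's departure from the same station."""
    return max(proposed_dep, prev_dep + min_headway)


def reschedule_departures(proposed: list[float],
                          min_headway: float = MIN_HEADWAY) -> list[float]:
    """Apply the headway constraint to a sequence of trains in running order."""
    result: list[float] = []
    prev = float("-inf")  # no predecessor before the first train
    for dep in proposed:
        dep = feasible_departure(dep, prev, min_headway)
        result.append(dep)
        prev = dep
    return result


# Example: the second train's proposed departure is only 60 s behind the first,
# so it is pushed back to respect the 120 s headway.
print(reschedule_departures([0.0, 60.0, 400.0]))  # [0.0, 120.0, 400.0]
```

In an actor-critic setting along these lines, such a projection step would let the actor output unconstrained continuous dwell and running times while the environment guarantees operational feasibility.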