Reinforcement Learning Based Real-time Railway Timetable Scheduling

Abstract

Railway timetable rescheduling is a challenging problem in both industry and academia. A feasible and reasonably good timetable must be computed within a limited time to reduce the negative impact of disturbances or disruptions. The problem is typically formulated as a mixed integer linear programming (MILP) problem, which is difficult to solve because of its integer variables. Many optimization-based studies have addressed this problem. The main advantage of optimization-based methods is that they are straightforward and easy to implement. Their main disadvantage is that most of them cannot meet the time requirements of large railway timetable rescheduling problems. Some studies instead apply reinforcement learning techniques to the problem, with which the time requirement can be met.

In this thesis, an algorithm that combines reinforcement learning and optimization is proposed to solve the railway timetable rescheduling problem. First, a reinforcement learning environment is constructed from the railway timetable rescheduling problem. Selecting the independent integer variables as the action ensures that the constraints involving the integer variables are satisfied. Next, a value-based reinforcement learning algorithm is used to determine the independent integer variables of the MILP problem. The complete solution for the integer variables is then derived from these independent integer variables. Once the integer variables are fixed, the MILP problem reduces to a linear programming problem, which can be solved efficiently.
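The idea of fixing the integer variables first and solving only the continuous part afterwards can be illustrated with a toy example. The sketch below is not the thesis implementation: the two trains, their times, the headway, and the helper `solve_timing` are all hypothetical, and a brute-force comparison of the two possible orders stands in for the value-based reinforcement learning agent. The binary variable `y` encodes which train enters a shared track section first. With `y` fixed, the remaining timing problem has only precedence constraints, so in this toy case its optimum is simply the earliest feasible entry time for each train; a realistic instance would pass the remaining linear program to an LP solver.

```python
def solve_timing(order, scheduled, earliest, headway):
    """Continuous subproblem for a fixed train order (toy version).

    Minimizing total delay under precedence constraints only is achieved
    by giving every train its earliest feasible entry time.
    """
    times = {}
    previous = None
    for train in order:
        # A train cannot enter before its scheduled or its disturbed earliest time.
        t = max(scheduled[train], earliest[train])
        # It must also respect the headway behind the preceding train.
        if previous is not None:
            t = max(t, times[previous] + headway)
        times[train] = t
        previous = train
    total_delay = sum(times[k] - scheduled[k] for k in order)
    return times, total_delay


# Hypothetical data: train A is disturbed and cannot enter before minute 5.
scheduled = {"A": 0, "B": 2}
earliest = {"A": 5, "B": 2}
headway = 3

# y = 1 means A enters first, y = 0 means B enters first (assumed encoding).
candidates = {1: ("A", "B"), 0: ("B", "A")}

# Stand-in for the learned policy: evaluate each action and keep the best one.
results = {
    y: solve_timing(order, scheduled, earliest, headway)
    for y, order in candidates.items()
}
best_y = min(results, key=lambda y: results[y][1])

print("best order variable y =", best_y)
print("entry times:", results[best_y][0])
print("total delay:", results[best_y][1])
```

In this toy instance, letting the undisturbed train B go first gives a smaller total delay than keeping the original order. Exhaustive enumeration only works here because there is a single binary variable; the number of orderings grows quickly with the number of trains, which is the motivation for choosing the integer variables with a learned policy instead.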

Several case studies are conducted in this thesis on part of the Dutch railway network, from Utrecht to 's-Hertogenbosch. The simulation results show that the proposed method substantially reduces the total delay of the system compared with the baseline. The reinforcement learning-based method also has a clear advantage in running time.