Learning to reschedule platforms

A graph neural network based deep reinforcement learning method for the train platforming and rescheduling problem

Journal Article (2026)
Author(s)

Hongxiang Zhang (Southwest Jiaotong University)

Andrea D’Ariano (University of Roma Tre)

Yongqiu Zhu (TU Delft - Transport, Mobility and Logistics)

Yaoxin Wu (Eindhoven University of Technology)

Liuyang Hu (Southwest Jiaotong University)

Gongyuan Lu (Southwest Jiaotong University)

Research Group
Transport, Mobility and Logistics
DOI related publication
https://doi.org/10.1016/j.trc.2025.105453
Publication Year
2026
Language
English
Volume number
183
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A train platforming schedule is the crucial plan for guiding trains through a railway station without spatial or temporal conflicts. When disturbances or disruptions delay trains' arrivals at a station, the Train Platforming and Rescheduling Problem (TPRP) arises, a prominent topic in railway traffic management. It focuses on allocating platforms and time slots to trains so as to reduce delays and ensure operational efficiency in the station. This paper introduces a novel graph neural network based deep reinforcement learning method for this problem, named Learning to Reschedule Platforms (L2RP). We formulate the solving process of the TPRP as a customized Markov decision process. Meanwhile, we integrate a microscopic discrete-event train operation simulation model to serve as the agent's exploration environment, which provides states, executes actions, and completes transitions. We then design a hybrid graph neural network based policy network to derive high-quality actions for each graph-encoded state. The policy network is trained with a reward function designed to minimize total knock-on train delays and platform changes. Experiments on real-world instances show that the proposed L2RP method produces high-quality solutions across a variety of scenarios within consistently short solving times.
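To make the abstract's setup concrete, the following is a minimal toy sketch of a discrete-event rescheduling environment with a reward that penalizes knock-on delays and platform changes. It is not the paper's model: the class names, the greedy baseline agent, the dwell-time rule, and the penalty weights (1.0 and 0.5) are all illustrative assumptions, and the graph encoding and GNN policy of L2RP are omitted entirely.

```python
# Illustrative sketch only: a toy MDP-style environment in the spirit of the
# L2RP setup described in the abstract. All names (TrainEvent, PlatformEnv)
# and the reward weights are hypothetical, not taken from the paper.
from dataclasses import dataclass

@dataclass
class TrainEvent:
    train_id: str
    planned_arrival: int      # minutes
    actual_arrival: int       # minutes (>= planned when the train is delayed)
    planned_platform: int

class PlatformEnv:
    """Toy discrete-event environment: trains arrive in time order, the agent
    assigns each one a platform, and a platform stays blocked for `dwell`
    minutes after a train occupies it."""
    def __init__(self, events, n_platforms=3, dwell=5):
        self.events = sorted(events, key=lambda e: e.actual_arrival)
        self.free_at = [0] * n_platforms   # earliest free time per platform
        self.dwell = dwell
        self.t = 0                         # index of the next train to place

    def state(self):
        # In L2RP the state is graph-encoded; here we return raw features.
        return self.events[self.t], list(self.free_at)

    def step(self, platform):
        ev = self.events[self.t]
        start = max(ev.actual_arrival, self.free_at[platform])
        knock_on = start - ev.actual_arrival          # delay from occupancy
        changed = int(platform != ev.planned_platform)
        self.free_at[platform] = start + self.dwell
        self.t += 1
        # Reward: penalize knock-on delay and platform changes (assumed weights).
        reward = -(1.0 * knock_on + 0.5 * changed)
        return reward, self.t == len(self.events)

# Usage: a greedy baseline that always picks the earliest-free platform.
env = PlatformEnv([TrainEvent("A", 0, 0, 0),
                   TrainEvent("B", 2, 2, 0)], n_platforms=2)
total, done = 0.0, False
while not done:
    _, free_at = env.state()
    r, done = env.step(free_at.index(min(free_at)))
    total += r
print(total)  # -0.5: no knock-on delay, at the cost of one platform change
```

In L2RP a learned policy would replace the greedy rule, trading off reward terms that such a myopic baseline cannot balance.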

Files

Taverne

File under embargo until 22-05-2026