This paper proposes Learning to Platforming (L2P), a novel deep reinforcement learning method based on graph neural networks, to solve the Train Platforming and Rescheduling Problem (TPRP). We formulate the solving process of TPRP as a customized Markov decision process (MDP), in which a graph structure represents the states of trains, routes, and berthing tracks from a microscopic perspective. We then design a hybrid graph neural network, named hAI-GNN, to learn informative node embeddings on the graph-encoded state. These embeddings are used to derive an effective action from the lightweight action space of the MDP, which is associated with the train that is the decision object in the current state. A discrete-event simulation model serves as the environment of the MDP and implements the state transition mechanism. The hAI-GNN-based policy network is trained with the Proximal Policy Optimization (PPO) algorithm, using a reward function designed to minimize total knock-on train delays and platform changes. Experiments on real-world instances show that the proposed L2P method obtains high-quality solutions for both small- and large-scale instances within very short solving times.
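
To make the training signal concrete, the following is a minimal sketch of the two ingredients named above: a reward that penalizes knock-on delays and platform changes, and the standard PPO clipped surrogate objective. It is an illustration under stated assumptions, not the exact formulation used in L2P. The function names, the per-step form of the reward, and the weights `w_delay` and `w_change` are hypothetical, and the clipping value of 0.2 is only a commonly used default.

```python
import math


def step_reward(knock_on_delay, platform_changes, w_delay=1.0, w_change=1.0):
    """Hypothetical per-step reward: a negative weighted sum of the two
    quantities the policy should minimize. The weights and the unit of
    `knock_on_delay` (e.g. delayed trains or minutes) are assumptions."""
    return -(w_delay * knock_on_delay + w_change * platform_changes)


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    For each sample, the probability ratio pi_new / pi_old is clipped to
    [1 - clip_eps, 1 + clip_eps], and the smaller of the clipped and
    unclipped terms is kept, which limits the size of each policy update.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In this sketch, an action that causes no knock-on delay and no platform change receives the maximum reward of zero, so maximizing the return corresponds to minimizing both quantities; in practice the advantages passed to `ppo_clipped_objective` would be estimated from such rewards collected in the simulation environment.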