Dynamic Target Time Management with Reinforcement Learning

A case study on Zurich short-haul regulated arrivals

This Master's thesis investigates possible improvements to the Target Time Management concept to optimize arrival flows for SWISS International Airlines. The aim is to improve operational performance relative to the model currently in use, and to demonstrate that Target Time Management constitutes a valuable system for improving operations dynamically. To leverage the dynamic nature of slot assignment, an environment model is created and used as the training basis for two Multi-Agent Reinforcement Learning algorithms: Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO). Both are tested against the baseline model currently used in operations at SWISS, which is based on Mixed-Integer Linear Programming (MILP). The algorithms' performance is measured in four domains: passenger connecting time, curfew performance, rotation delay, and fairness to other airlines. The algorithms were trained in a simulation environment based on statistical representations of the dynamics of EUROCONTROL's slot allocation system. When tested on new data, they outperformed the MILP implementation on passenger connecting time and rotation delay; curfew performance and fairness were comparable in magnitude, with the MILP slightly unfair towards SWISS and the RL approaches slightly unfair towards other airlines. PPO was then also tested on the real slot assignment environment hosted by EUROCONTROL and again compared to the MILP approach. Here, the improvement in critical passenger connecting time was 5.0 minutes for the MILP and 5.9 minutes for PPO. Rotation delay was improved by 0.9 minutes by the MILP and by 4.8 minutes by PPO. PPO also widened the spread of delays, making the highest delays higher and the lowest delays lower, a result that would require EUROCONTROL or Skyguide representatives to interpret with respect to fairness and safety. Curfew performance was optimal for both methods.
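To give a flavor of the training setup described above, the sketch below shows a minimal toy slot-allocation environment with the agent-environment loop that an RL algorithm such as SAC or PPO would train against. The class name, reward shaping (negative total rotation delay), and the stochastic acceptance of slot-shift requests are illustrative assumptions, not the thesis' actual environment model.

```python
import random


class SlotEnv:
    """Toy slot-allocation environment (illustrative assumption, not the thesis model)."""

    def __init__(self, n_flights=4, seed=0):
        self.rng = random.Random(seed)
        self.n_flights = n_flights
        self.reset()

    def reset(self):
        # State: each flight's current delay in minutes relative to schedule.
        self.delays = [self.rng.randint(0, 30) for _ in range(self.n_flights)]
        return list(self.delays)

    def step(self, actions):
        # actions: requested slot shift per flight, in minutes (negative = earlier).
        for i, shift in enumerate(actions):
            # The network accepts a requested shift only with some probability,
            # mimicking the stochastic response of the slot-allocation system.
            if self.rng.random() < 0.7:
                self.delays[i] = max(0, self.delays[i] + shift)
        # Reward: negative total rotation delay, so less delay means more reward.
        reward = -sum(self.delays)
        return list(self.delays), reward


env = SlotEnv()
state = env.reset()
# Greedy stand-in for a learned multi-agent policy: every flight requests
# an earlier slot; a real SAC/PPO agent would learn this mapping instead.
for _ in range(10):
    actions = [-5] * env.n_flights
    state, reward = env.step(actions)
```

In the thesis setting, the learned policies replace the greedy stand-in and are evaluated on connecting time, curfew, rotation delay, and fairness rather than a single scalar reward.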
In conclusion, it is shown that Reinforcement Learning techniques can support dynamic decision-making within Target Time Management, and that Target Time Management with a dynamic decision-making approach can improve operational performance compared to a static one.