Dynamic Target Time Management with Reinforcement Learning

A case study on Zurich short-haul regulated arrivals

This Master's thesis investigates possible improvements to the Target Time Management concept to optimize arrival flows for SWISS International Airlines. The aim is to improve operational performance relative to the model currently in use, and to demonstrate that Target Time Management constitutes a valuable system for improving operations dynamically. To leverage the dynamic nature of slot assignment, an environment model is created and used as the training basis for two Multi-Agent Reinforcement Learning algorithms: Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO). Both are tested against the baseline model currently used in operations at SWISS, which is based on Mixed-Integer Linear Programming (MILP). The algorithms' performance is measured in four domains: passenger connecting time, curfew performance, rotation delay, and fairness to other airlines. The algorithms were trained in a simulation environment based on statistical representations of the dynamics of EUROCONTROL's slot allocation system. When tested on new data, they outperformed the MILP implementation on passenger connecting time and rotation delay; curfew performance and fairness were comparable in magnitude, with the MILP slightly unfair towards SWISS and the RL approaches slightly unfair towards other airlines. PPO was then also tested on the real slot assignment environment hosted by EUROCONTROL and again compared to the MILP approach. Here, the improvement in critical passenger connecting time was 5.0 minutes for the MILP and 5.9 minutes for PPO. Rotation delay was improved by 0.9 minutes by the MILP and by 4.8 minutes by PPO. PPO also widened the spread of delays, making the highest delays higher and the lowest delays lower, a result that would require EUROCONTROL or Skyguide representatives to interpret with respect to fairness and safety. Curfew performance was optimal for both methods.
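To give a flavor of the training setup described above, the sketch below shows a minimal toy slot-allocation environment with the agent-environment loop that an RL algorithm such as SAC or PPO would train against. The class name, reward shaping (negative total rotation delay), and the stochastic acceptance of slot-shift requests are illustrative assumptions, not the thesis' actual environment model.

```python
import random


class SlotEnv:
    """Toy slot-allocation environment (illustrative assumption, not the thesis model)."""

    def __init__(self, n_flights=4, seed=0):
        self.rng = random.Random(seed)
        self.n_flights = n_flights
        self.reset()

    def reset(self):
        # State: each flight's current delay in minutes relative to schedule.
        self.delays = [self.rng.randint(0, 30) for _ in range(self.n_flights)]
        return list(self.delays)

    def step(self, actions):
        # actions: requested slot shift per flight, in minutes (negative = earlier).
        for i, shift in enumerate(actions):
            # The network accepts a requested shift only with some probability,
            # mimicking the stochastic response of the slot-allocation system.
            if self.rng.random() < 0.7:
                self.delays[i] = max(0, self.delays[i] + shift)
        # Reward: negative total rotation delay, so less delay means more reward.
        reward = -sum(self.delays)
        return list(self.delays), reward


env = SlotEnv()
state = env.reset()
# Greedy stand-in for a learned multi-agent policy: every flight requests
# an earlier slot; a real SAC/PPO agent would learn this mapping instead.
for _ in range(10):
    actions = [-5] * env.n_flights
    state, reward = env.step(actions)
```

In the thesis setting, the learned policies replace the greedy stand-in and are evaluated on connecting time, curfew, rotation delay, and fairness rather than a single scalar reward.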
In conclusion, it is shown that Reinforcement Learning techniques can support dynamic decision-making within Target Time Management, and that Target Time Management with a dynamic decision-making approach can improve operational performance compared to a static one.