E.H.Q. Oosthoek

A Deep Reinforcement Learning Framework for the Aircraft Recovery Problem: A Comparative Analysis of Proactive and Reactive Strategies focussing on the State-Space and Reward Formulations

Master thesis (2026) - E.H.Q. Oosthoek, M.J. Ribeiro
With demand for air travel rising, Airline Disruption Management (ADM) ensures successful schedule recovery in the event of disruptions. The Aircraft Recovery Problem (ARP), part of ADM, focusses solely on aircraft. Previous research has concentrated on exact optimisation, as well as simple-, meta-, or hybrid-heuristic solution methods. However, to prevent significant delays, resolution decisions must be made quickly. This need, combined with the rise of deep learning, has led to the emergence of deep reinforcement learning (DRL) as a viable solution strategy. Nevertheless, the performance of DRL has often remained limited to specific state-space formulations and reward designs.
To close this gap, the primary objective of this work is to further optimise a reinforcement learning (RL) formulation for the ARP while minimising disruption effects. It investigates and compares two models with alternative state-space formulations. First, we test a single, aircraft-centric, and continuous design. Second, we present a dual, sparse, flight-centric, and primarily binary formulation. The models are compared on computational efficiency, action distribution, and conflict resolution effectiveness across three DRL environments: proactive, reactive, and myopic, which are subject to different levels of stochastic state information. The state-space formulation is found to significantly influence computation time, a prominent issue for large action and state spaces. Furthermore, proactive environments are shown to result in better conflict resolution.
However, an unexpected negative learning trend revealed significant challenges with the model. This counterintuitive result was reinforced by notably higher performance during exploration than during exploitation, indicating the DRL agent’s inability to learn an optimal policy. Finally, sensitivity analyses of the reward and a hyperparameter underlined the high susceptibility of RL to minor parameter changes, stressing how challenging it is to implement DRL models for real-life applications. ...