Towards Explainable Automation for Air Traffic Control Using Deep Q-learning from Demonstrations and Reward Decomposition

Master Thesis (2021)
Author(s)

M.C. Hermans (TU Delft - Aerospace Engineering)

Contributor(s)

E. van Kampen – Mentor (TU Delft - Control & Simulation)

C. Borst – Mentor (TU Delft - Control & Simulation)

T.M. Monteiro Nunes – Mentor (TU Delft - Control & Simulation)

Faculty
Aerospace Engineering
Publication Year
2021
Language
English
Graduation Date
18-05-2021
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The current ATC system is seen as the most significant limitation to coping with increased air traffic density. Transitioning towards an ATC system with a high degree of automation is essential to cope with future airspace traffic demand. In recent studies, reinforcement learning has shown promising results in automating Conflict Detection and Resolution (CD&R) in Air Traffic Control. However, the acceptance of automation by Air Traffic Controllers (ATCos) remains a critical limiting factor to its implementation. This work explores how automation can be developed using Deep Q-Learning from Demonstrations (DQfD), aiming for a system that is transparent and conforms with the strategies applied by ATCos in order to increase acceptance of automation. Reward decomposition (RDX) is used to monitor learning and to understand what the agent has learned. This study focuses on two-aircraft conflicts, in which the state of the controlled and observed aircraft is represented by raw pixel data of the Solution Space Diagram. It was concluded that pre-training on demonstrations speeds up learning and can increase strategic conformance between the solutions provided by the RL agent and those of the demonstrator. Besides increasing conformance, the results also show that DQfD can improve its policy with respect to the suboptimal demonstrations used during training. Finally, RDX allowed the designer to examine the policy learned by the RL agent in more detail.
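To illustrate the reward-decomposition idea the abstract refers to, the sketch below shows a minimal tabular version: one Q-table is kept per reward component, the policy acts greedily on the sum of the component Q-values, and the per-component values remain available to explain why an action was preferred. This is an illustrative sketch only, not the thesis implementation (which uses deep Q-networks on Solution Space Diagram pixels); the component names "separation" and "efficiency" are assumptions for the example.

```python
import numpy as np

class DecomposedQ:
    """Tabular Q-learning with reward decomposition (RDX sketch).

    One Q-table per reward component; the agent acts greedily on the
    SUM of component Q-values, while each component's Q-values can be
    inspected separately to explain an action choice.
    """

    def __init__(self, n_states, n_actions, components, alpha=0.1, gamma=0.9):
        self.q = {c: np.zeros((n_states, n_actions)) for c in components}
        self.alpha, self.gamma = alpha, gamma

    def total_q(self, s):
        # Overall action values: element-wise sum over components.
        return sum(q[s] for q in self.q.values())

    def act(self, s):
        return int(np.argmax(self.total_q(s)))

    def update(self, s, a, rewards, s_next):
        # Every component bootstraps from the SAME greedy next action,
        # chosen on the summed Q-values, so the decomposition stays
        # consistent with the overall policy.
        a_next = self.act(s_next)
        for c, r in rewards.items():
            td = r + self.gamma * self.q[c][s_next, a_next] - self.q[c][s, a]
            self.q[c][s, a] += self.alpha * td

    def explain(self, s, a):
        # Per-component contribution to the chosen action's value.
        return {c: float(q[s, a]) for c, q in self.q.items()}

# Toy usage with two hypothetical reward components.
agent = DecomposedQ(n_states=2, n_actions=2,
                    components=["separation", "efficiency"])
agent.update(s=0, a=1,
             rewards={"separation": 1.0, "efficiency": -0.2},
             s_next=1)
print(agent.explain(0, 1))
```

After one update, `explain` shows that action 1 in state 0 is valued positively by the "separation" component and slightly negatively by "efficiency", which is exactly the kind of per-objective breakdown RDX uses to make a learned policy inspectable.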

Files

License info not available