Towards Explainable Automation for Air Traffic Control Using Deep Q-learning from Demonstrations and Reward Decomposition

Master Thesis (2021)
Author(s)

M.C. Hermans (TU Delft - Aerospace Engineering)

Contributor(s)

E. van Kampen – Mentor (TU Delft - Control & Simulation)

C. Borst – Mentor (TU Delft - Control & Simulation)

T.M. Monteiro Nunes – Mentor (TU Delft - Control & Simulation)

Faculty
Aerospace Engineering
Publication Year
2021
Language
English
Graduation Date
18-05-2021
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The current ATC system is seen as the most significant limitation to coping with increased air traffic density. Transitioning towards an ATC system with a high degree of automation is essential to cope with future airspace traffic demand. In recent studies, reinforcement learning has shown promising results in automating Conflict Detection and Resolution (CD&R) in Air Traffic Control. However, the acceptance of automation by Air Traffic Controllers (ATCos) remains a critical limiting factor to its implementation. This work explores how automation can be developed using Deep Q-Learning from Demonstrations (DQfD), aiming for a system that is transparent and conforms with the strategies applied by ATCos in order to increase acceptance of automation. Reward decomposition (RDX) is used to monitor learning and to understand what the agent has learned. This study focuses on two-aircraft conflicts, in which the state of the controlled and observed aircraft is represented by raw pixel data of the Solution Space Diagram. It was concluded that pre-training on demonstrations speeds up learning and can increase strategic conformance between the solutions provided by the RL agent and those of the demonstrator. Besides increasing conformance, the results also show that DQfD can improve its policy with respect to the suboptimal demonstrations used during training. Finally, RDX allowed the designer to examine the policy learned by the RL agent in more detail.
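To illustrate the reward-decomposition idea the abstract refers to, the sketch below shows a minimal tabular version: one Q-table is kept per reward component, the policy acts greedily on the sum of the component Q-values, and the per-component values remain available to explain why an action was preferred. This is an illustrative sketch only, not the thesis implementation (which uses deep Q-networks on Solution Space Diagram pixels); the component names "separation" and "efficiency" are assumptions for the example.

```python
import numpy as np

class DecomposedQ:
    """Tabular Q-learning with reward decomposition (RDX sketch).

    One Q-table per reward component; the agent acts greedily on the
    SUM of component Q-values, while each component's Q-values can be
    inspected separately to explain an action choice.
    """

    def __init__(self, n_states, n_actions, components, alpha=0.1, gamma=0.9):
        self.q = {c: np.zeros((n_states, n_actions)) for c in components}
        self.alpha, self.gamma = alpha, gamma

    def total_q(self, s):
        # Overall action values: element-wise sum over components.
        return sum(q[s] for q in self.q.values())

    def act(self, s):
        return int(np.argmax(self.total_q(s)))

    def update(self, s, a, rewards, s_next):
        # Every component bootstraps from the SAME greedy next action,
        # chosen on the summed Q-values, so the decomposition stays
        # consistent with the overall policy.
        a_next = self.act(s_next)
        for c, r in rewards.items():
            td = r + self.gamma * self.q[c][s_next, a_next] - self.q[c][s, a]
            self.q[c][s, a] += self.alpha * td

    def explain(self, s, a):
        # Per-component contribution to the chosen action's value.
        return {c: float(q[s, a]) for c, q in self.q.items()}

# Toy usage with two hypothetical reward components.
agent = DecomposedQ(n_states=2, n_actions=2,
                    components=["separation", "efficiency"])
agent.update(s=0, a=1,
             rewards={"separation": 1.0, "efficiency": -0.2},
             s_next=1)
print(agent.explain(0, 1))
```

After one update, `explain` shows that action 1 in state 0 is valued positively by the "separation" component and slightly negatively by "efficiency", which is exactly the kind of per-objective breakdown RDX uses to make a learned policy inspectable.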

Files

License info not available