Towards Explainable Automation for Air Traffic Control Using Deep Q-learning from Demonstrations and Reward Decomposition

Abstract

The current Air Traffic Control (ATC) system is seen as the most significant limitation to coping with increased air traffic density, and transitioning towards an ATC system with a high degree of automation is essential to meet future traffic demand. In recent studies, reinforcement learning has shown promising results in automating Conflict Detection and Resolution (CD&R) in ATC, but the acceptance of automation by Air Traffic Controllers (ATCos) remains a critical limiting factor to its implementation. This work explores how automation can be developed using Deep Q-Learning from Demonstrations (DQfD), an approach intended to be transparent and to conform to the strategies applied by ATCos, thereby increasing the acceptance of automation. Reward decomposition (RDX) is used to monitor learning and to understand what the agent has learned. The study focuses on two-aircraft conflicts, in which the state of the controlled and observed aircraft is represented by raw pixel data of the Solution Space Diagram. It was concluded that pre-training on demonstrations speeds up learning and can increase strategic conformance between the solutions provided by the RL agent and those of the demonstrator. In addition to increasing conformance, the results show that DQfD can improve its policy beyond the suboptimal demonstrations used during training. Finally, RDX allowed the designer to examine the policy learned by the RL agent in more detail.
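As a rough illustration of how the two techniques named in the abstract fit together, the sketch below shows a Q-network with one output head per reward component (reward decomposition: the total Q-value is the sum of the component Q-values, so each head explains part of an action choice) together with the large-margin supervised loss that DQfD applies to demonstration transitions. This is a minimal sketch, not the authors' implementation: the component names ("separation", "efficiency"), the network sizes, and the margin value are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DecomposedDQN(nn.Module):
        """Q-network with one head per reward component (reward decomposition).

        The total Q-value is the sum of the component Q-values, so the agent
        can act greedily on the sum while each head explains part of the choice.
        """

        def __init__(self, n_actions: int, components=("separation", "efficiency")):
            super().__init__()
            self.components = components
            # Convolutional trunk for the raw-pixel Solution Space Diagram input
            # (channel counts and layer sizes are illustrative assumptions).
            self.trunk = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.LazyLinear(256), nn.ReLU(),
            )
            # One linear output head per reward component.
            self.heads = nn.ModuleDict(
                {c: nn.Linear(256, n_actions) for c in components}
            )

        def forward(self, obs):
            z = self.trunk(obs)
            q_parts = {c: head(z) for c, head in self.heads.items()}
            # Total Q is the sum over component heads; q_parts is kept
            # so the designer can inspect why an action scored highly.
            q_total = torch.stack(list(q_parts.values())).sum(dim=0)
            return q_total, q_parts

    def large_margin_loss(q_total, demo_actions, margin: float = 0.8):
        """DQfD large-margin supervised loss on demonstration transitions:
        max_a [Q(s,a) + l(a_E, a)] - Q(s, a_E), where l(a_E, a) = margin
        for a != a_E and 0 otherwise. It pushes the demonstrated action's
        value above all alternatives by at least the margin.
        """
        l = torch.full_like(q_total, margin)
        l.scatter_(1, demo_actions.unsqueeze(1), 0.0)  # zero margin at a_E
        q_demo = q_total.gather(1, demo_actions.unsqueeze(1)).squeeze(1)
        return ((q_total + l).max(dim=1).values - q_demo).mean()

During pre-training on demonstrations, a term of this form is typically combined with the usual 1-step and n-step TD losses plus L2 regularisation, following the standard DQfD formulation; the exact weighting used in this work is not stated in the abstract.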