Exploring reinforcement learning methods for autonomous sequencing and spacing of aircraft

Master Thesis (2019)
Author(s)

B. Vonk (TU Delft - Aerospace Engineering)

Contributor(s)

JM Hoekstra – Mentor (TU Delft - Control & Simulation)

J Ellerbroek – Graduation committee member (TU Delft - Control & Simulation)

Faculty
Aerospace Engineering
Copyright
© 2019 Bart Vonk
Publication Year
2019
Language
English
Graduation Date
15-04-2019
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering | Control & Simulation
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Research on reinforcement learning algorithms for playing complex video games has brought forth controllers that surpass human performance. This paper explores the possibilities of applying these techniques to the sequencing and spacing of aircraft. Two experiments are performed. In the first, a single aircraft must learn to fly a 4D trajectory using only heading commands. Duelling Deep Q-Networks were applied and a successful policy was learned; however, learning is unstable and does not provide a suitable basis for extension to a multi-agent setting. In the second, a multi-agent experiment, aircraft have to sequence and space themselves for landing without a 4D constraint. A Bidirectional Communication Net was trained using Deep Deterministic Policy Gradients, first on a single traffic scenario and then on multiple traffic scenarios. Emergent strategies, such as holding patterns, were observed in the single-scenario training, but no optimal policy was found. Training on multiple traffic scenarios showed no coordination between the aircraft. Further analysis showed the importance of a proper reward function and exploration strategy, which were likely the reasons no optimal policy was found in the multi-agent setting.
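
For readers unfamiliar with the single-agent method named above, the following is a minimal sketch of a duelling Q-network in PyTorch. It illustrates only the architectural idea (a shared feature extractor splitting into value and advantage streams); the state dimension, the three-action heading set, and the layer sizes are illustrative assumptions, not the thesis' actual setup.

import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Duelling Q-network: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared feature extractor over the aircraft state vector.
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate streams for the state value V(s) and advantages A(s, a).
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v = self.value(h)       # shape: (batch, 1)
        a = self.advantage(h)   # shape: (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

# Hypothetical usage: a 6-element aircraft state and three discrete
# heading commands (turn left, hold heading, turn right).
net = DuelingDQN(state_dim=6, n_actions=3)
q_values = net(torch.randn(1, 6))
action = q_values.argmax(dim=1).item()

The greedy action above would normally be mixed with an exploration strategy (e.g. epsilon-greedy), which the abstract identifies as one of the likely failure points in the multi-agent experiment.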

Files

Thesis_final.pdf
(pdf | 7.18 MB)
License info not available