Exploring reinforcement learning methods for autonomous sequencing and spacing of aircraft

Master Thesis (2019)
Author(s)

B. Vonk (TU Delft - Aerospace Engineering)

Contributor(s)

JM Hoekstra – Mentor (TU Delft - Control & Simulation)

J Ellerbroek – Graduation committee member (TU Delft - Control & Simulation)

Faculty
Aerospace Engineering
Copyright
© 2019 Bart Vonk
Publication Year
2019
Language
English
Graduation Date
15-04-2019
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering | Control & Simulation
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Research on reinforcement learning algorithms for playing complex video games has brought forth controllers that surpass human performance. This paper explores the possibilities of applying these techniques to the sequencing and spacing of aircraft. Two experiments are performed. In the first, a single aircraft must learn to fly a 4D trajectory using only heading commands. Duelling Deep Q-Networks were applied and a successful policy was learned; however, learning is unstable and does not provide a suitable basis for extension to a multi-agent setting. In the second, a multi-agent experiment, aircraft have to sequence and space themselves for landing without a 4D constraint. A Bidirectional Communication Net was trained using Deep Deterministic Policy Gradients, first on a single traffic scenario and then on multiple traffic scenarios. Emergent strategies, such as holding patterns, were observed in the single-scenario training, but no optimal policy was found. Training on multiple traffic scenarios showed no coordination between the aircraft. Further analysis showed the importance of a proper reward function and exploration strategy, which were likely the reasons no optimal policy was found in the multi-agent setting.
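
For readers unfamiliar with the single-agent method named above, the following is a minimal sketch of a duelling Q-network in PyTorch. It illustrates only the architectural idea (a shared feature extractor splitting into value and advantage streams); the state dimension, the three-action heading set, and the layer sizes are illustrative assumptions, not the thesis' actual setup.

import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Duelling Q-network: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared feature extractor over the aircraft state vector.
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate streams for the state value V(s) and advantages A(s, a).
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, n_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.features(state)
        v = self.value(h)       # shape: (batch, 1)
        a = self.advantage(h)   # shape: (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

# Hypothetical usage: a 6-element aircraft state and three discrete
# heading commands (turn left, hold heading, turn right).
net = DuelingDQN(state_dim=6, n_actions=3)
q_values = net(torch.randn(1, 6))
action = q_values.argmax(dim=1).item()

The greedy action above would normally be mixed with an exploration strategy (e.g. epsilon-greedy), which the abstract identifies as one of the likely failure points in the multi-agent experiment.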

Files

Thesis_final.pdf
(pdf | 7.18 MB)
License info not available