Conflict Prioritization with Multi-Agent Deep Reinforcement Learning

Master Thesis (2022)
Author(s)

D.J.G. Cuppen (TU Delft - Aerospace Engineering)

Contributor(s)

J Ellerbroek – Mentor (TU Delft - Control & Simulation)

Jacco Hoekstra – Mentor (TU Delft - Control & Simulation)

M.J. Ribeiro – Mentor (TU Delft - Control & Simulation)

Faculty
Aerospace Engineering
Copyright
© 2022 Daan Cuppen
Publication Year
2022
Language
English
Graduation Date
04-07-2022
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

To facilitate an increase in air traffic volume and to allow for more flexibility in aircraft flight paths, a wealth of decentralized conflict resolution (CR) algorithms has been developed. The efficiency of such algorithms often deteriorates at high traffic densities. Several methods have tried to prioritize certain conflicts to alleviate part of the problems that arise at high traffic densities. However, manually establishing rules for prioritizing intruders is difficult because of the complex traffic patterns that emerge in multi-actor conflicts. Reinforcement Learning (RL) has demonstrated its ability to synthesize strategies while approximating the system dynamics. This research shows how RL can be employed to improve conflict prioritization in multi-actor conflicts. We employ the Proximal Policy Optimization (PPO) algorithm with an actor-critic network. The RL model decides on intruder selection based on the local observations of an aircraft. It was trained on a limited number of conflict geometries, in which it was able to significantly reduce the number of intrusions. A conflict prioritization strategy was then formulated based on the decisions taken by the RL model during training. We show that the efficacy of a conflict resolution algorithm that adopts a global solution, the solution space diagram (SSD) in this research, can be improved by utilizing this conflict prioritization strategy. Finally, these results were compared to the performance of a pairwise CR method, the Modified Voltage Potential (MVP). Even though MVP resulted in fewer intrusions than SSD with conflict prioritization, the prioritization strategy did reduce the gap between the two CR methods.
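The intruder-selection idea at the heart of the abstract can be illustrated with a simple hand-crafted baseline (a hypothetical sketch for intuition, not the thesis's learned PPO policy): given an ownship's local observation of nearby traffic, rank intruders by their predicted distance at the closest point of approach (CPA), breaking ties by which CPA occurs soonest. A learned prioritization would replace this fixed ranking rule with a policy trained on conflict geometries.

```python
import math
from dataclasses import dataclass

@dataclass
class Aircraft:
    """Planar state: position (x, y) and velocity (vx, vy)."""
    x: float
    y: float
    vx: float
    vy: float

def time_to_cpa(own: Aircraft, intr: Aircraft) -> float:
    """Time until the intruder is closest to the ownship, clamped to >= 0
    (a diverging or co-moving intruder is closest right now)."""
    dx, dy = intr.x - own.x, intr.y - own.y
    dvx, dvy = intr.vx - own.vx, intr.vy - own.vy
    v2 = dvx * dvx + dvy * dvy
    if v2 == 0.0:
        return 0.0  # zero relative velocity: separation never changes
    return max(0.0, -(dx * dvx + dy * dvy) / v2)

def distance_at_cpa(own: Aircraft, intr: Aircraft) -> float:
    """Predicted separation at the closest point of approach."""
    t = time_to_cpa(own, intr)
    dx = (intr.x + intr.vx * t) - (own.x + own.vx * t)
    dy = (intr.y + intr.vy * t) - (own.y + own.vy * t)
    return math.hypot(dx, dy)

def prioritize(own: Aircraft, intruders: list[Aircraft]) -> list[Aircraft]:
    """Rank intruders: smallest predicted CPA distance first, earliest CPA on ties."""
    return sorted(intruders, key=lambda a: (distance_at_cpa(own, a), time_to_cpa(own, a)))
```

For example, a head-on intruder on a collision course is ranked ahead of one flying a constant-separation parallel track, since its predicted CPA distance is zero.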

Files

Report_Daan_Cuppen.pdf
(PDF | 64.1 MB)
License info not available