Conflict Prioritization with Multi-Agent Deep Reinforcement Learning

Master Thesis (2022)
Author(s)

D.J.G. Cuppen (TU Delft - Aerospace Engineering)

Contributor(s)

J Ellerbroek – Mentor (TU Delft - Control & Simulation)

Jacco Hoekstra – Mentor (TU Delft - Control & Simulation)

M.J. Ribeiro – Mentor (TU Delft - Control & Simulation)

Faculty
Aerospace Engineering
Copyright
© 2022 Daan Cuppen
Publication Year
2022
Language
English
Graduation Date
04-07-2022
Awarding Institution
Delft University of Technology
Programme
Aerospace Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

To facilitate an increase in air traffic volume and to allow for more flexibility in aircraft flight paths, a wealth of decentralized conflict resolution (CR) algorithms has been developed. The efficiency of such algorithms often deteriorates at high traffic densities. Several methods have tried to prioritize certain conflicts to alleviate part of the problems that arise at high traffic densities. However, manually establishing rules for prioritizing intruders is difficult because of the complex traffic patterns that emerge in multi-actor conflicts. Reinforcement Learning (RL) has demonstrated its ability to synthesize strategies while approximating the system dynamics. This research shows how RL can be employed to improve conflict prioritization in multi-actor conflicts. We employ the Proximal Policy Optimization (PPO) algorithm with an actor-critic network. The RL model decides on intruder selection based on the local observations of an aircraft. It was trained on a limited number of conflict geometries, in which it was able to significantly reduce the number of intrusions. A conflict prioritization strategy was then formulated based on the decisions taken by the RL model during training. We show that the efficacy of a conflict resolution algorithm that adopts a global solution, the solution space diagram (SSD) in this research, can be improved by utilizing this conflict prioritization strategy. Finally, these results were compared to the performance of a pairwise CR method, the Modified Voltage Potential (MVP). Even though MVP resulted in fewer intrusions than SSD with conflict prioritization, the prioritization strategy did reduce the gap between the two CR methods.
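The intruder-selection idea at the heart of the abstract can be illustrated with a simple hand-crafted baseline (a hypothetical sketch for intuition, not the thesis's learned PPO policy): given an ownship's local observation of nearby traffic, rank intruders by their predicted distance at the closest point of approach (CPA), breaking ties by which CPA occurs soonest. A learned prioritization would replace this fixed ranking rule with a policy trained on conflict geometries.

```python
import math
from dataclasses import dataclass

@dataclass
class Aircraft:
    """Planar state: position (x, y) and velocity (vx, vy)."""
    x: float
    y: float
    vx: float
    vy: float

def time_to_cpa(own: Aircraft, intr: Aircraft) -> float:
    """Time until the intruder is closest to the ownship, clamped to >= 0
    (a diverging or co-moving intruder is closest right now)."""
    dx, dy = intr.x - own.x, intr.y - own.y
    dvx, dvy = intr.vx - own.vx, intr.vy - own.vy
    v2 = dvx * dvx + dvy * dvy
    if v2 == 0.0:
        return 0.0  # zero relative velocity: separation never changes
    return max(0.0, -(dx * dvx + dy * dvy) / v2)

def distance_at_cpa(own: Aircraft, intr: Aircraft) -> float:
    """Predicted separation at the closest point of approach."""
    t = time_to_cpa(own, intr)
    dx = (intr.x + intr.vx * t) - (own.x + own.vx * t)
    dy = (intr.y + intr.vy * t) - (own.y + own.vy * t)
    return math.hypot(dx, dy)

def prioritize(own: Aircraft, intruders: list[Aircraft]) -> list[Aircraft]:
    """Rank intruders: smallest predicted CPA distance first, earliest CPA on ties."""
    return sorted(intruders, key=lambda a: (distance_at_cpa(own, a), time_to_cpa(own, a)))
```

For example, a head-on intruder on a collision course is ranked ahead of one flying a constant-separation parallel track, since its predicted CPA distance is zero.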

Files

Report_Daan_Cuppen.pdf
(PDF | 64.1 MB)
License info not available