Comparing attention-based methods with long short-term memory for state encoding in reinforcement learning-based separation management
D. J. Groot (TU Delft - Operations & Environment)
J. Ellerbroek (TU Delft - Operations & Environment)
J. M. Hoekstra (TU Delft - Operations & Environment)
Abstract
Reinforcement learning (RL) has been studied extensively for conflict resolution and separation management in air traffic control, offering advantages over analytical methods. One key challenge of applying RL to this task is the construction of the input vector: because the number of agents in the airspace varies, methods that can handle a dynamic number of agents are required. Various approaches exist, for example selecting a fixed number of aircraft, or encoding the information with recurrent neural networks or attention. Multiple studies have shown promising results with these encoder methods; however, studies comparing them are limited, and the results remain inconclusive on which method works best. To address this issue, this paper compares different input encoding methods: three attention variants – scaled dot-product, additive and context-aware attention – and long short-term memory (LSTM) with three different sorting strategies. These methods are used as input encoders for models trained with the Soft Actor–Critic algorithm for separation management in high-traffic-density scenarios. Additive attention is found to be the most effective at increasing total safety and maximizing path efficiency, outperforming the commonly used scaled dot-product attention and LSTM. Additionally, the order of the input sequence is shown to significantly impact the performance of the LSTM-based input encoder. This contrasts with the attention methods, which are sequence-independent and therefore do not suffer from biases introduced by the order of the input sequence.
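The contrast drawn in the abstract – attention pooling is permutation-invariant over the set of surrounding aircraft, while an LSTM's output depends on the order in which aircraft are fed in – can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, feature dimensions, and weight shapes are illustrative assumptions, showing scaled dot-product and additive (Bahdanau-style) attention pooling a variable number of per-aircraft feature vectors into one fixed-size state encoding.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    w = np.exp(scores - scores.max())
    return w / w.sum()

def scaled_dot_product_pool(query, keys, values):
    """Scaled dot-product attention: scores from query-key similarity.

    query: (d,) ownship encoding; keys/values: (N, d) intruder encodings,
    where N (the number of surrounding aircraft) may vary per time step.
    Returns a fixed-size (d,) vector regardless of N.
    """
    d = keys.shape[-1]
    scores = keys @ query / np.sqrt(d)          # (N,)
    return softmax(scores) @ values             # (d,)

def additive_pool(query, keys, values, W_q, W_k, v):
    """Additive attention: scores from a small feed-forward layer.

    W_q, W_k: (h, d) projection matrices; v: (h,) scoring vector
    (all illustrative, randomly initialised here, learned in practice).
    """
    hidden = np.tanh(keys @ W_k.T + query @ W_q.T)   # (N, h)
    scores = hidden @ v                              # (N,)
    return softmax(scores) @ values                  # (d,)
```

Because the softmax weights are computed independently per aircraft and then summed, permuting the rows of `keys` and `values` together leaves the pooled encoding unchanged; an LSTM consuming the same rows one at a time has no such guarantee, which is why the sorting strategy matters for the LSTM encoder but not for the attention encoders.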