Subtask-masked curriculum learning for reinforcement learning with application to UAV maneuver decision-making

Journal Article (2023)
Author(s)

Yueqi Hou (Air Force Engineering University China)

Xiaolong Liang (Air Force Engineering University China)

Maolong Lv (Air Force Engineering University China)

Qisong Yang (TU Delft - Algorithmics)

Yang Li (TU Delft - Algorithmics)

Research Group
Algorithmics
Copyright
© 2023 Yueqi Hou, Xiaolong Liang, Maolong Lv, Q. Yang, Y. Li
DOI related publication
https://doi.org/10.1016/j.engappai.2023.106703
Publication Year
2023
Language
English
Volume number
125
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Learning Unmanned Aerial Vehicle (UAV) maneuver strategies with Reinforcement Learning (RL) remains challenging because the task provides only sparse rewards. In this paper, we propose Subtask-Masked curriculum learning for RL (SUBMAS-RL), an efficient RL paradigm that implements curriculum learning and knowledge transfer for UAV maneuver scenarios involving multiple missiles. First, this study introduces a novel concept, the subtask mask, which creates source tasks from a target task by masking a subset of its subtasks. Then, a subtask-masked curriculum generation method is proposed to produce a sequenced curriculum by alternately performing task generation and task sequencing. To establish efficient knowledge transfer and avoid negative transfer, this paper employs two transfer techniques, policy distillation and policy reuse, along with an explicit transfer condition that masks irrelevant knowledge. Experimental results demonstrate that our method achieves a 94.8% success rate in a UAV maneuver scenario where the direct use of reinforcement learning always fails. The proposed SUBMAS-RL framework is thus expected to learn effective policies in complex tasks with sparse rewards.
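The curriculum-generation idea in the abstract can be illustrated with a minimal sketch: source tasks are derived from a target task by masking subsets of its subtasks, and the resulting tasks are sequenced from easy to hard. All names here (`generate_curriculum`, the missile subtasks `m1`–`m3`) are illustrative assumptions, not identifiers from the paper.

```python
from itertools import combinations

def generate_curriculum(subtasks, keep_sizes):
    """Generate a sequenced curriculum by masking subtasks.

    A subtask mask hides part of the target task's subtasks (e.g.,
    individual incoming missiles), so each source task is a simpler
    version of the target. Tasks are then sequenced by the number of
    active subtasks, ending with the full (unmasked) target task.
    """
    curriculum = []
    for k in keep_sizes:                       # task generation
        for active in combinations(subtasks, k):
            curriculum.append(frozenset(active))
    curriculum.sort(key=len)                   # task sequencing: easy -> hard
    return curriculum

# Hypothetical target task: evade three missiles m1..m3.
subtasks = ("m1", "m2", "m3")
curriculum = generate_curriculum(subtasks, keep_sizes=(1, 2, 3))
```

In the paper's full pipeline, a policy would be trained on each task in this sequence, with policy distillation or policy reuse carrying knowledge forward; that training loop is omitted here.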

Files

1_s2.0_S0952197623008874_main.... (pdf | 2.35 Mb)
- Embargo expired in 01-01-2024
License info not available