HER-PDQN: A Reinforcement Learning Approach for UAV Navigation with Hybrid Action Spaces and Sparse Rewards

Conference Paper (2022)

Author(s)

Cheng Liu (TU Delft - Control & Simulation)

E. Van Kampen (TU Delft - Control & Simulation)

Research Group

Control & Simulation

DOI related publication

https://doi.org/10.2514/6.2022-0793

To reference this document use:

https://resolver.tudelft.nl/uuid:495ad310-4284-486f-bf67-c03a6effff90

Publication Year

2022

Language

English

ISBN (electronic)

978-1-62410-631-6

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Reinforcement learning (RL) equipped with neural networks has recently led to a wide range of successes in learning policies for unmanned aerial vehicle (UAV) navigation and control problems. The success of RL relies on two human-designed heuristics: an appropriate action space definition and reward function engineering. The fully continuous or fully discrete action spaces commonly used in optimal control and decision-making problems may lack control authority and remove the inherent problem structure, which can negatively affect learning performance. In addition, reward engineering requires considerable human effort and may lead to unwanted behavior. In this paper, we address these challenges by proposing a new off-policy RL algorithm called HER-PDQN, which combines Hindsight Experience Replay (HER) with Parameterized Deep Q-Networks (P-DQN). In simulation experiments, HER-PDQN is used to train an agent to complete a UAV navigation task in a 2-dimensional environment. The results indicate the effectiveness of the P-DQN algorithm in dealing with both the hybrid action space and sparse rewards. This paper can be considered the first attempt at applying RL in a sparse-reward setting for UAV navigation with hybrid action spaces.
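
To make the combination described in the abstract more concrete, the sketch below illustrates the general idea of hindsight relabeling applied to transitions that carry a hybrid action (a discrete action index plus a continuous parameter). It is not the authors' implementation: the `Transition` fields, the `sparse_reward` function, its tolerance `tol`, the 0/-1 reward values, and the "final achieved state as goal" relabeling strategy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Transition:
    """One step of experience with a hybrid action (hypothetical layout)."""
    state: Point          # assumed 2-D position of the UAV
    action_id: int        # discrete part of the hybrid action
    action_param: float   # continuous parameter attached to that action
    next_state: Point
    goal: Point
    reward: float


def sparse_reward(achieved: Point, goal: Point, tol: float = 0.1) -> float:
    """Assumed sparse reward: 0 when within tol of the goal, otherwise -1."""
    dist = ((achieved[0] - goal[0]) ** 2 + (achieved[1] - goal[1]) ** 2) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Return copies of the episode whose goal is the last achieved state.

    The hybrid action is left untouched; only the goal and the reward
    recomputed against that goal change.
    """
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# Usage: a two-step episode that never reaches its original goal.
episode = [
    Transition((0.0, 0.0), 0, 0.5, (1.0, 0.0), (5.0, 5.0), -1.0),
    Transition((1.0, 0.0), 1, 0.2, (2.0, 0.0), (5.0, 5.0), -1.0),
]
relabeled = relabel_with_final_goal(episode)
```

Under these assumptions, both the original and the relabeled transitions would be stored in the replay buffer, so that an off-policy learner such as P-DQN sees some successful outcomes even when the original goal is rarely reached.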

Files

6.2022_0793.pdf

(pdf | 1.36 Mb)

- Embargo expired on 01-07-2023

License info not available