HER-PDQN: A Reinforcement Learning Approach for UAV Navigation with Hybrid Action Spaces and Sparse Rewards

More Info
expand_more

Abstract

Reinforcement learning (RL) equipped with neural networks has recently led to a wide range of successes in learning policies for unmanned aerial vehicle (UAV) navigation and control problems. The success of RL relies on two human-designed heuristics: appropriate action space definition and reward function engineering. The commonly used fully continuous or fully discrete action spaces in optimal control and decision making problems may lack control authority and remove the inherent problem structure, which can negatively affect learning performance. Besides, reward engineering requires a lot of human effort and may lead to unwanted behavior. In this paper, we address these challenges by proposing a new off-policy RL algorithm called HER-PDQN which incorporates Hindsight Experience Replay (HER) with Parameterized Deep Q-Networks (P-DQN). In simulation experiments, HER-PDQN is used to train an agent to fulfill a UAV navigation task in a 2-dimensional environment. The results indicate the effectiveness of P-DQN algorithm in dealing both with the hybrid action space and sparse rewards. This paper can be considered as the first attempt at applying RL in sparse reward setting for UAV navigation with hybrid action spaces.