DRONE-RL
Dynamic reinforcement learning for online navigation of UAVs in evolving environments
Noor Khial (Qatar University)
Mhd Saria Allahham (Queen’s University)
Naram Mhaisen (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Loay Ismail (Qatar University)
Mohamed Mabrok (Qatar University)
Amr Mohamed (Qatar University)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Locating mobile targets in dynamic and cluttered environments, such as disaster zones or adversarial terrains, presents significant challenges due to unknown target mobility and changing environmental conditions. Unmanned Aerial Vehicles (UAVs), equipped with advanced sensing capabilities, offer a viable solution, but they require adaptive planning mechanisms to navigate non-stationary environments effectively. In this paper, we propose a hybrid learning framework for multi-target visitation that combines offline reinforcement learning (RL) and online convex optimization (OCO) to address these challenges. Specifically, we leverage Deep Deterministic Policy Gradient (DDPG) to pre-train multiple UAV navigation policies across representative scenarios. During deployment, an OCO-based policy selection mechanism adaptively selects the best pre-trained policy in real time, ensuring responsiveness to environmental changes without retraining. Experimental results demonstrate that our approach consistently adapts to varying levels of non-stationarity and clutter, outperforming benchmark methods in adaptability and mission success. Notably, the online learner exhibits asymptotically vanishing average regret under different levels of non-stationary behavior.
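To make the idea of online policy selection with vanishing average regret concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not say which OCO method is used, so the sketch substitutes Hedge (exponential weights), a standard online learner over a finite set of candidates. It assumes that each pre-trained policy reports a loss in [0, 1] at every round (full-information feedback). The function names, the synthetic loss sequence, and the step size are all hypothetical illustrations.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Hedge (exponential weights) over K candidate policies.

    loss_rounds: list of length-K lists, one per round, losses in [0, 1].
    eta: step size (assumed fixed in advance for this sketch).
    Returns (final weights, list of expected per-round losses).
    """
    k = len(loss_rounds[0])
    weights = [1.0 / k] * k  # start from the uniform distribution
    expected = []
    for losses in loss_rounds:
        # Expected loss of sampling a policy from the current weights.
        expected.append(sum(w * l for w, l in zip(weights, losses)))
        # Multiplicative update: down-weight policies that performed badly.
        scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        total = sum(scaled)
        weights = [s / total for s in scaled]
    return weights, expected


def average_regret(loss_rounds, expected):
    """Average regret against the best single policy in hindsight."""
    k = len(loss_rounds[0])
    best_fixed = min(sum(r[i] for r in loss_rounds) for i in range(k))
    return (sum(expected) - best_fixed) / len(loss_rounds)


if __name__ == "__main__":
    # Hypothetical environment: policy 0 is best early, policy 1 later.
    rng = random.Random(0)
    horizon, num_policies = 2000, 3
    rounds = []
    for t in range(horizon):
        good = 0 if t < horizon // 2 else 1
        rounds.append([
            0.2 * rng.random() if i == good else 0.5 + 0.5 * rng.random()
            for i in range(num_policies)
        ])
    eta = math.sqrt(8 * math.log(num_policies) / horizon)
    final_weights, exp_losses = hedge(rounds, eta)
    print("final weights:", [round(w, 3) for w in final_weights])
    print("average regret:", average_regret(rounds, exp_losses))
```

With this step size, the textbook guarantee for Hedge bounds the average regret against the best fixed policy by sqrt(ln K / (2T)), which goes to zero as the horizon T grows. This is a static-regret guarantee; the non-stationary setting described in the abstract may call for a different comparator or algorithm than the one sketched here.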
Files
File under embargo until 31-07-2026