DRONE-RL

Dynamic reinforcement learning for online navigation of UAVs in evolving environments

Journal Article (2026)
Author(s)

Noor Khial (Qatar University)

Mhd Saria Allahham (Queen’s University)

Naram Mhaisen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Loay Ismail (Qatar University)

Mohamed Mabrok (Qatar University)

Amr Mohamed (Qatar University)

Research Group
Networked Systems
DOI related publication
https://doi.org/10.1016/j.knosys.2025.115147 Final published version
Publication Year
2026
Language
English
Journal title
Knowledge-Based Systems
Volume number
334
Article number
115147
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Locating mobile targets in dynamic and cluttered environments, such as disaster zones or adversarial terrains, presents significant challenges due to unknown target mobility and changing environmental conditions. Unmanned Aerial Vehicles (UAVs), equipped with advanced sensing capabilities, offer a viable solution, but require adaptive planning mechanisms to navigate non-stationary environments effectively. In this paper, we propose a hybrid learning framework for multi-target visitation that combines offline reinforcement learning (RL) and online convex optimization (OCO) to address these challenges. Specifically, we leverage Deep Deterministic Policy Gradient (DDPG) to pre-train a set of UAV navigation policies across representative scenarios. During deployment, an OCO-based policy selection mechanism adaptively selects the best policy in real time, ensuring responsiveness to environmental changes without retraining. Experimental results demonstrate that our approach consistently adapts to varying levels of non-stationarity and clutter, outperforming benchmark methods in adaptability and mission success. Notably, the online learner exhibits asymptotically vanishing average regret under different levels of non-stationary behavior.
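
The abstract does not specify which OCO update the selection mechanism uses. As an illustration only, the idea of choosing among pre-trained policies online can be sketched with the standard exponential-weights (Hedge) update over the probability simplex. Everything named here is an assumption made for the sketch and is not taken from the paper: the function names `hedge_select` and `average_regret`, the step size `eta`, losses bounded in [0, 1], and the simplification that the loss of every policy is observed in each round.

```python
import numpy as np


def hedge_select(loss_rounds, eta, seed=0):
    """Exponential-weights (Hedge) selection over K pre-trained policies.

    loss_rounds: array of shape (T, K); entry [t, k] is the loss policy k
        would incur in round t, assumed to lie in [0, 1] (hypothetical input).
    eta: step size of the multiplicative update (assumed hyperparameter).
    Returns the sampled policy index per round and the final weights.
    """
    rng = np.random.default_rng(seed)
    losses = np.asarray(loss_rounds, dtype=float)
    num_rounds, num_policies = losses.shape
    weights = np.full(num_policies, 1.0 / num_policies)  # uniform start
    choices = np.empty(num_rounds, dtype=int)
    for t in range(num_rounds):
        # Sample which policy flies this round according to current weights.
        choices[t] = rng.choice(num_policies, p=weights)
        # Multiplicative update, then renormalize back onto the simplex.
        weights = weights * np.exp(-eta * losses[t])
        weights /= weights.sum()
    return choices, weights


def average_regret(loss_rounds, choices):
    """Per-round regret against the best single policy in hindsight."""
    losses = np.asarray(loss_rounds, dtype=float)
    incurred = losses[np.arange(len(choices)), choices].sum()
    best_fixed = losses.sum(axis=0).min()
    return (incurred - best_fixed) / len(choices)
```

With a step size on the order of sqrt(ln K / T), the expected average regret of this update against the best fixed policy shrinks as the number of rounds T grows, which is one way to read "asymptotically vanishing average regret". The sketch measures regret against a fixed comparator and does not model the non-stationary settings the paper evaluates.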

Files

Taverne

File under embargo until 31-07-2026