DRONE-RL

Dynamic reinforcement learning for online navigation of UAVs in evolving environments

Journal Article (2026)

Author(s)

Noor Khial (Qatar University)

Mhd Saria Allahham (Queen’s University)

Naram Mhaisen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Loay Ismail (Qatar University)

Mohamed Mabrok (Qatar University)

Amr Mohamed (Qatar University)

Research Group

Networked Systems

Path planning; Reinforcement learning; Online learning; Autonomous unmanned aerial vehicles; Through-wall target detection

DOI related publication

https://doi.org/10.1016/j.knosys.2025.115147 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:0e382935-0563-4048-9793-3db02acb91d9

More Info

Publication Year

2026

Language

English

Journal title

Knowledge-Based Systems

Volume number

334

Article number

115147

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Locating mobile targets in dynamic and cluttered environments, such as disaster zones or adversarial terrain, is challenging because target mobility is unknown and environmental conditions change over time. Unmanned Aerial Vehicles (UAVs) equipped with advanced sensing capabilities offer a viable solution, but they require adaptive planning mechanisms to navigate non-stationary environments effectively. In this paper, we propose a hybrid learning framework for multi-target visitation that combines offline reinforcement learning (RL) with online convex optimization (OCO). Specifically, we use Deep Deterministic Policy Gradient (DDPG) to pre-train a set of UAV navigation policies across representative scenarios. During deployment, an OCO-based policy selection mechanism adaptively selects the best policy in real time, ensuring responsiveness to environmental changes without retraining. Experimental results demonstrate that our approach consistently adapts to varying levels of non-stationarity and clutter, outperforming benchmark methods in adaptability and mission success. Notably, the online learner achieves asymptotically vanishing average regret under different levels of non-stationarity.
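The abstract does not specify which OCO algorithm drives policy selection, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes full-information feedback (the loss of every pre-trained policy is observed after each decision round) and losses normalized to [0, 1], and it uses the classic Hedge / exponentiated-gradient update, which keeps a probability distribution over the K pre-trained policies and down-weights each policy according to its loss. The function names, the loss model, and the synthetic switching environment are hypothetical. Average regret here means (1/T)(Σ_t p_t·ℓ_t − min_k Σ_t ℓ_{t,k}), measured against the best fixed policy in hindsight. With η = sqrt(8 ln K / T), this quantity is at most sqrt(ln K / (2T)), which shrinks toward zero as T grows. This is one concrete reading of "asymptotically vanishing average regret". The paper may use a different regret notion for non-stationary settings.

```python
import math
import random
from typing import List, Sequence


def hedge_policy_selection(losses: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Run Hedge (exponentiated gradient on the simplex) over K policies.

    Hypothetical setup: losses[t][k] is the loss in [0, 1] of pre-trained
    policy k at round t, observed for every policy after the round
    (full-information feedback). Returns the distribution used at each round.
    """
    k = len(losses[0])
    log_w = [0.0] * k  # log-weights, kept in log space for numerical stability
    history = []
    for loss_t in losses:
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        p = [wi / z for wi in w]  # normalized distribution over policies
        history.append(p)
        # Multiplicative update: each policy is penalized by its observed loss.
        log_w = [lw - eta * l for lw, l in zip(log_w, loss_t)]
    return history


def average_regret(losses: Sequence[Sequence[float]],
                   weights: Sequence[Sequence[float]]) -> float:
    """Average regret of the weighted mixture vs. the best fixed policy in hindsight."""
    t_total = len(losses)
    learner = sum(sum(p_k * l_k for p_k, l_k in zip(p, l))
                  for p, l in zip(weights, losses))
    k = len(losses[0])
    best_fixed = min(sum(l[j] for l in losses) for j in range(k))
    return (learner - best_fixed) / t_total


if __name__ == "__main__":
    random.seed(0)
    T, K = 2000, 3
    # Hypothetical non-stationary environment: the best policy switches every 500 rounds.
    losses = []
    for t in range(T):
        best = (t // 500) % K
        losses.append([random.uniform(0.0, 0.3) if j == best else random.uniform(0.4, 1.0)
                       for j in range(K)])
    eta = math.sqrt(8 * math.log(K) / T)  # standard tuning for a known horizon T
    weights = hedge_policy_selection(losses, eta)
    print(f"average regret: {average_regret(losses, weights):.4f}")
    print(f"theoretical bound: {math.sqrt(math.log(K) / (2 * T)):.4f}")
```

Hedge is used here only because it is a well-known OCO method on the probability simplex with a clean regret guarantee. Under bandit feedback, where only the selected policy's loss is observed, a variant such as EXP3 would be needed instead.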

Files

1-s2.0-S095070512502180X-main.... (pdf)

(pdf | 4.03 Mb)

Taverne

File under embargo until 31-07-2026