An online learning framework for UAV search mission in adversarial environments

Journal Article (2025)

Author(s)

Noor Khial (Qatar University)

N. Mhaisen (TU Delft - Networked Systems, Qatar University)

Mohamed Mabrok (Qatar University)

Amr Mohamed (Qatar University)

Research Group

Networked Systems

DOI related publication

https://doi.org/10.1016/j.eswa.2024.126136

Keywords: UAV; Online learning; Human-in-the-loop; Experts; Multi-armed bandits; Search mission

To reference this document use:

https://resolver.tudelft.nl/uuid:b0eb516e-881d-428b-81ec-392cb3868b72

Publication Year

2025

Language

English

Bibliographical Note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project (https://www.openaccess.nl/en/you-share-we-take-care). Otherwise, as indicated in the copyright section, the publisher is the copyright holder of this work and the author uses Dutch legislation to make this work public.

Volume number

267

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid evolution of Unmanned Aerial Vehicles (UAVs) has revolutionized target search operations in various fields, including military applications, search and rescue missions, and post-disaster management. This paper applies multi-armed bandit algorithms to UAV search missions. The UAV's mission is to locate a mobile target formation, under the assumption that the formation follows an unknown and potentially non-stationary probability distribution, by learning the formation's strategy over time. To achieve this, we formulate an optimization problem and solve it with the Exp3 algorithm (exponential-weight algorithm for exploration and exploitation). To enhance the learning process, we integrate environment observations as context, yielding a variant referred to as C-Exp3. However, C-Exp3 is not designed for scenarios in which the formation's strategy changes over time. We therefore propose AC-Exp3, an adaptive solution featuring a human-centric drift detection mechanism that detects changes in the formation's strategy and adjusts the learning process accordingly. Furthermore, we propose the Exp4 algorithm as a self-adjusting meta-learner to address changes in the formation's strategy. We evaluate the performance of C-Exp3, AC-Exp3, and Exp4 through a series of experiments focused on non-stationary environments. Our primary objective is to reach the unknown optimal-in-hindsight policy as time t approaches the horizon T, which reflects the UAV's capacity to learn the formation's strategy. AC-Exp3 demonstrates greater adaptability than C-Exp3, while Exp4 emerges as a robust performer, adapting swiftly to new strategies.
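The abstract names Exp3 and a context-aware variant, C-Exp3, without describing their internals. The sketch below shows the standard Exp3 update (exponential weights mixed with uniform exploration, updated with importance-weighted reward estimates) and one common way to add context: keeping a separate Exp3 learner for each discrete observation. This per-context construction is an assumption for illustration, not necessarily the authors' C-Exp3. The four search cells, the two contexts, the 80% "preferred cell" formation strategy, and gamma = 0.1 are all hypothetical.

```python
import math
import random

N_CELLS = 4  # hypothetical: number of search cells, i.e. bandit arms


class Exp3:
    """Standard Exp3 for K arms with rewards in [0, 1], weights kept in log space."""

    def __init__(self, n_arms, gamma, rng):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate in (0, 1]
        self.log_weights = [0.0] * n_arms
        self.rng = rng

    def probabilities(self):
        # Mix the exponential-weight distribution with uniform exploration.
        # Subtracting the max log-weight keeps math.exp from overflowing.
        top = max(self.log_weights)
        w = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n_arms for wi in w]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs

    def update(self, arm, reward, probs):
        # Importance-weighted estimate: unbiased for every arm, and nonzero
        # only for the arm that was actually played.
        estimate = reward / probs[arm]
        self.log_weights[arm] += self.gamma * estimate / self.n_arms


class ContextualExp3:
    """Hypothetical C-Exp3 stand-in: one independent Exp3 learner per context."""

    def __init__(self, n_arms, gamma, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.learners = {}

    def _learner(self, context):
        if context not in self.learners:
            self.learners[context] = Exp3(self.n_arms, self.gamma, self.rng)
        return self.learners[context]

    def select(self, context):
        return self._learner(context).select()

    def update(self, context, arm, reward, probs):
        self._learner(context).update(arm, reward, probs)


def target_cell(context, rng):
    # Hypothetical formation strategy: 80% of the time the target sits in a
    # context-dependent preferred cell, otherwise in a uniformly random cell.
    preferred = {0: 1, 1: 3}[context]
    if rng.random() < 0.8:
        return preferred
    return rng.randrange(N_CELLS)


def run(horizon=5000, seed=0):
    env_rng = random.Random(seed + 1)
    agent = ContextualExp3(N_CELLS, gamma=0.1, seed=seed)
    hits = 0.0
    for _ in range(horizon):
        context = env_rng.randrange(2)      # observation the UAV sees first
        arm, probs = agent.select(context)  # cell the UAV decides to search
        reward = 1.0 if arm == target_cell(context, env_rng) else 0.0
        agent.update(context, arm, reward, probs)
        hits += reward
    return agent, hits / horizon


if __name__ == "__main__":
    agent, hit_rate = run()
    print(f"average hit rate: {hit_rate:.2f}")
```

Under these toy settings each per-context learner should concentrate most of its probability on that context's preferred cell. Plain Exp3 adapts slowly once its weights have concentrated, which is the limitation the abstract attributes to C-Exp3 under a changing strategy; the human-centric drift detection of AC-Exp3 and the expert mixing of Exp4 are not modeled here.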

Files

1-s2.0-S0957417424030033-main.... (pdf)

(pdf | 3.73 Mb)

- Embargo expired on 23-06-2025

License info not available