An online learning framework for UAV search mission in adversarial environments
Noor Khial (Qatar University)
N. Mhaisen (TU Delft - Networked Systems, Qatar University)
Mohamed Mabrok (Qatar University)
Amr Mohamed (Qatar University)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The rapid evolution of Unmanned Aerial Vehicles (UAVs) has revolutionized target search operations in various fields, including military applications, search and rescue missions, and post-disaster management. This paper applies a multi-armed bandit algorithm to UAV search missions. The UAV's mission is to locate a mobile target formation whose position follows an unknown and potentially non-stationary probability distribution, by learning the formation's strategy over time. To achieve this, we formulate an optimization problem and solve it with the Exp3 algorithm (Exponential-weight algorithm for Exploration and Exploitation). To enhance the learning process, we integrate environment observations as context, resulting in a variant referred to as C-Exp3. However, C-Exp3 is not designed for scenarios where the target formation's strategy changes over time. We therefore propose AC-Exp3 as an adaptive solution, featuring a human-centric drift detection mechanism that detects changes in the formation's strategy and adjusts the learning process accordingly. Furthermore, we propose the Exp4 algorithm as a self-adjusting meta-learner to address changes in the formation's strategy. We evaluate the performance of C-Exp3, AC-Exp3, and Exp4 through a series of experiments with a focus on non-stationary environments. Our primary objective is to reach the unknown optimal-in-hindsight policy as the time t approaches the horizon T, thereby reflecting the UAV's capacity to learn the formation's strategy. AC-Exp3 demonstrates enhanced adaptability compared to C-Exp3. Meanwhile, Exp4 emerges as a robust performer, swiftly adapting to new strategies.
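For readers unfamiliar with Exp3, the following is a minimal sketch of the standard textbook algorithm, not the paper's C-Exp3, AC-Exp3, or Exp4 variants. It assumes, purely for illustration, that the search area is split into a fixed number of cells (the arms) and that the UAV receives a reward in [0, 1] for the cell it inspects. The names `exp3`, `reward_fn`, and `gamma` are hypothetical and do not come from the paper.

```python
import math
import random


def exp3(num_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Textbook Exp3 sketch. reward_fn(t, arm) must return a value in [0, 1].

    Returns the final sampling distribution and the total collected reward.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    total_reward = 0.0

    def distribution():
        # Mix the weight-proportional distribution with uniform exploration.
        total_w = sum(weights)
        return [(1 - gamma) * w / total_w + gamma / num_arms for w in weights]

    for t in range(horizon):
        probs = distribution()
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)

        # Rescale by the largest weight; this leaves the distribution
        # unchanged and only guards against floating-point overflow.
        largest = max(weights)
        weights = [w / largest for w in weights]

    return distribution(), total_reward


if __name__ == "__main__":
    # Hypothetical toy setting: the target always sits in cell 2 of 4.
    probs, total = exp3(4, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)
    print([round(p, 3) for p in probs], total)
```

In this toy run the sampling distribution concentrates on the rewarding cell while the `gamma` term keeps a small amount of exploration on the others. The continued exploration is what lets Exp3-style methods cope with adversarial reward sequences, though by itself it does not handle a strategy that drifts over time.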