An Online Learning Framework for UAV Target Search Missions in Non-Stationary Environments

Conference Paper (2024)

Author(s)

Noor Khial (Qatar University)

Naram Mhaisen (TU Delft - Networked Systems)

Mohamed Mabrok (Qatar University)

Amr Mohamed (Qatar University)

Research Group

Networked Systems

DOI related publication

https://doi.org/10.1109/CCECE59415.2024.10667171

UAV, Online Learning, Multi-Armed Bandits, Search Mission

To reference this document use:

https://resolver.tudelft.nl/uuid:7b296a9b-8a49-41ea-96f1-02b8e41e4cf7

Publication Year

2024

Language

English

Pages (from-to)

753-758

ISBN (print)

979-8-3503-7163-5

ISBN (electronic)

979-8-3503-7162-8

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The rapid evolution of Unmanned Aerial Vehicles (UAVs) has transformed target search operations in fields such as military applications, search and rescue, and post-disaster management. In this paper, we propose a multi-armed bandit algorithm for a UAV search mission in an unknown and adversarial setting. The UAV's objective is to locate a mobile target formation whose mobility is assumed to resemble adversarial behavior. To this end, we formulate an optimization problem and solve it with the Exp3 algorithm (exponential-weight algorithm for exploration and exploitation). The targets are assumed to move according to an unknown and potentially non-stationary probability distribution. To enhance the learning process, we integrate environmental observations as contextual information, yielding a variant called C-Exp3. Finally, we evaluate the performance of C-Exp3 in UAV search missions in adversarial environments. The UAV's primary objective is to converge towards an optimal policy as time t approaches the horizon T, reflecting its capacity to learn the formation's strategy.
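For readers unfamiliar with Exp3, the following is a minimal Python sketch of the standard algorithm applied to a toy search grid, where each arm is a cell the UAV can inspect. It is an illustration under stated assumptions, not the paper's implementation: the reward model (1 if the inspected cell holds the target, 0 otherwise), the target's movement pattern, the parameter values, and the names `Exp3`, `ContextualExp3`, and `simulate_search` are all hypothetical. The contextual wrapper simply keeps one independent Exp3 learner per discrete context, which is one common way to add context; the abstract does not specify how C-Exp3 does it.

```python
import math
import random


class Exp3:
    """Standard Exp3 for rewards in [0, 1]; weights are kept in log space."""

    def __init__(self, n_arms, gamma, rng=None):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate in (0, 1]
        self.log_weights = [0.0] * n_arms
        self.rng = rng or random.Random()

    def probabilities(self):
        # Mix the normalized exponential weights with uniform exploration.
        top = max(self.log_weights)
        exp_w = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(exp_w)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in exp_w]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: only the played arm is updated.
        self.log_weights[arm] += self.gamma * (reward / prob) / self.n_arms


class ContextualExp3:
    """Assumed contextual variant: one independent Exp3 per context."""

    def __init__(self, n_arms, gamma, rng=None):
        self.n_arms = n_arms
        self.gamma = gamma
        self.rng = rng or random.Random()
        self.learners = {}

    def _learner(self, context):
        if context not in self.learners:
            self.learners[context] = Exp3(self.n_arms, self.gamma, self.rng)
        return self.learners[context]

    def select(self, context):
        return self._learner(context).select()

    def update(self, context, arm, prob, reward):
        self._learner(context).update(arm, prob, reward)


def simulate_search(rounds=5000, n_cells=4, gamma=0.1, seed=0):
    """Toy mission: the target favors cell 2 (hypothetical movement model)."""
    rng = random.Random(seed)
    learner = Exp3(n_cells, gamma, rng)
    for _ in range(rounds):
        # Target sits in cell 2 with probability 0.8, else in a uniform cell.
        target = 2 if rng.random() < 0.8 else rng.randrange(n_cells)
        cell, prob = learner.select()
        reward = 1.0 if cell == target else 0.0
        learner.update(cell, prob, reward)
    return learner.probabilities()
```

Calling `simulate_search()` returns the learner's final selection probabilities over the cells. In this toy setup most of the probability mass should end up on the cell the target favors, while the `gamma / n_arms` term keeps every cell explored so the learner can react if the target's behavior changes.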

Files

An_Online_Learning_Framework_f... (pdf)

(pdf | 3.94 Mb)

- Embargo expired on 12-03-2025

License info not available