Mhd Saria Allahham | TU Delft Repository

DRONE-RL: Dynamic reinforcement learning for online navigation of UAVs in evolving environments

Journal article (2026) - Noor Khial, Mhd Saria Allahham, Naram Mhaisen, Loay Ismail, Mohamed Mabrok, Amr Mohamed

Locating mobile targets in dynamic and cluttered environments, such as disaster zones or adversarial terrain, is challenging because target mobility is unknown and environmental conditions change over time. Unmanned Aerial Vehicles (UAVs) equipped with advanced sensing capabilities offer a viable solution, but they require adaptive planning mechanisms to navigate non-stationary environments effectively. In this paper, we propose a hybrid learning framework for multi-target visitation that combines offline reinforcement learning (RL) and online convex optimization (OCO) to address these challenges. Specifically, we use Deep Deterministic Policy Gradient (DDPG) to pre-train a set of UAV navigation policies across representative scenarios. During deployment, an OCO-based policy selection mechanism adaptively selects the best-performing policy in real time, ensuring responsiveness to environmental changes without retraining. Experimental results demonstrate that our approach consistently adapts to varying levels of non-stationarity and clutter, outperforming benchmark methods in adaptability and mission success. Notably, the online learner achieves asymptotically vanishing average regret under different levels of non-stationarity. ...
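The abstract does not specify which OCO rule performs the policy selection. As a hedged illustration only, the sketch below assumes the classic Hedge (exponentially weighted average) algorithm over K pre-trained policies with full-information losses in [0, 1]. The per-policy losses stand in for mission costs and are hypothetical, as are the function names `hedge_policy_selection` and `average_regret`. With learning rate eta = sqrt(8 ln K / T), Hedge's cumulative regret against the best fixed policy is at most sqrt((T/2) ln K). Its average regret therefore decays as O(sqrt(ln K / T)), which gives one concrete sense of "asymptotically vanishing average regret". This bound is measured against the best *fixed* policy. Guarantees that track a changing best policy require variants of Hedge that are not shown here.

```python
import math
import random


def hedge_policy_selection(losses, eta):
    """Hedge (exponential weights) over K pre-trained policies.

    losses: T rows, each a list of K per-policy losses in [0, 1]
            (hypothetical stand-ins for the mission cost each policy incurs).
    eta:    learning rate.
    Returns (learner's expected loss in each round, final selection probabilities).
    """
    k = len(losses[0])
    probs = [1.0 / k] * k  # start uniform: no policy is preferred a priori
    learner_losses = []
    for round_losses in losses:
        # Expected loss when a policy is drawn from the current distribution.
        learner_losses.append(sum(p * l for p, l in zip(probs, round_losses)))
        # Full-information update: shrink each policy's weight by exp(-eta * loss).
        weights = [p * math.exp(-eta * l) for p, l in zip(probs, round_losses)]
        total = sum(weights)
        probs = [w / total for w in weights]  # renormalise to avoid underflow
    return learner_losses, probs


def average_regret(losses, learner_losses):
    """Average regret against the best single policy in hindsight."""
    k = len(losses[0])
    best_fixed = min(sum(row[i] for row in losses) for i in range(k))
    return (sum(learner_losses) - best_fixed) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    T, K = 2000, 4
    # Hypothetical losses: policy 2 happens to suit this environment best.
    losses = [[0.2 * rng.random() if i == 2 else 0.5 + 0.5 * rng.random()
               for i in range(K)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(K) / T)  # standard tuning for a known horizon T
    learner, probs = hedge_policy_selection(losses, eta)
    print("average regret:", average_regret(losses, learner))
    print("bound sqrt(ln K / (2T)):", math.sqrt(math.log(K) / (2 * T)))
    print("final selection probabilities:", probs)
```

Hedge needs no retraining of the underlying policies. It only reweights them as losses arrive, which matches the abstract's description of selecting among pre-trained policies at deployment time.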

On Designing Smart Agents for Service Provisioning in Blockchain-Powered Systems

Journal article (2022) - Naram Mhaisen, Mhd Saria Allahham, Amr Mohamed, Aiman Erbad, Mohsen Guizani

Service provisioning systems assign users to service providers according to allocation criteria that strike an optimal trade-off between users' Quality of Experience (QoE) and the operating cost incurred by providers. These systems have been leveraging Smart Contracts (SCs) to add trust and transparency to their criteria. However, fixed allocation criteria deployed in SCs do not necessarily sustain the best performance over time. Blockchain participants join and leave freely and their load varies with time, so the original allocation becomes sub-optimal. Furthermore, manually updating the criteria after every change in the blockchain jeopardizes the autonomous and independent execution promised by SCs. We therefore propose a set of lightweight agents for SCs that are capable of optimizing performance. We also propose online-learning SCs, empowered by a Deep Reinforcement Learning (DRL) agent, that leverage on-chain data to continuously self-tune their allocation criteria. We show that the proposed learning-assisted method achieves superior performance on the combinatorial multi-stage allocation problem while remaining executable in real time. We also compare the proposed approach with standard heuristics as well as planning methods. Results show a significant performance advantage over heuristics and better adaptability to the dynamic nature of blockchain networks. ...
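The sketch below makes the notion of an "allocation criterion" concrete with one hypothetical example. Users arrive one at a time. Each user is greedily assigned to the provider with spare capacity that minimises a weighted sum of two terms: a QoE proxy (latency that grows with the provider's current load) and the provider's per-user operating cost. The `Provider` fields, the linear congestion term, and the single `weight` knob are all assumptions made for illustration, not the paper's model. A fixed `weight` is the kind of hard-coded criterion the abstract argues becomes sub-optimal as load changes. In the proposed approach, a DRL agent would adjust such parameters online, and that learning step is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical service provider."""
    latency: float    # base latency a served user experiences (lower = better QoE)
    unit_cost: float  # operating cost per served user
    capacity: int     # maximum number of users this provider accepts


def allocate(num_users, providers, weight, congestion=0.1):
    """Greedy allocation under a weighted QoE/cost criterion.

    Each arriving user goes to the provider with spare capacity that minimises
        weight * (latency + congestion * load) + (1 - weight) * unit_cost.
    Returns one provider index per user, or -1 if every provider is full.
    """
    load = [0] * len(providers)
    assignment = []
    for _ in range(num_users):
        best, best_score = -1, float("inf")
        for i, p in enumerate(providers):
            if load[i] >= p.capacity:
                continue  # provider is full
            qoe_term = p.latency + congestion * load[i]  # latency rises with load
            score = weight * qoe_term + (1 - weight) * p.unit_cost
            if score < best_score:
                best, best_score = i, score
        if best >= 0:
            load[best] += 1
        assignment.append(best)
    return assignment
```

With `weight=1.0`, the rule favours the lowest-latency provider. With `weight=0.0`, it favours the cheapest one. Intermediate values trade the two objectives off, and the congestion term spreads users once a provider becomes loaded.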

Multi-Agent Reinforcement Learning for Network Selection and Resource Allocation in Heterogeneous Multi-RAT Networks

Journal article (2022) - Mhd Saria Allahham, Alaa Awad Abdellatif, Naram Mhaisen, Amr Mohamed, Aiman Erbad, Mohsen Guizani

Mobile devices continue to proliferate rapidly, alongside a boom in wireless applications. This motivates exploiting the wireless spectrum through multiple Radio Access Technologies (multi-RAT) and developing innovative network selection techniques to cope with such intensive demand while improving Quality of Service (QoS). We therefore propose a distributed framework for dynamic network selection at the edge level and resource allocation at the Radio Access Network (RAN) level, taking into consideration the diverse characteristics of applications. In particular, our framework employs a deep Multi-Agent Reinforcement Learning (DMARL) algorithm that aims to maximize the edge nodes' quality of experience while extending the nodes' battery lifetime and leveraging adaptive compression schemes. The framework enables data transfer from multi-RAT-capable edge nodes to the cloud in a cost- and energy-efficient manner while maintaining the QoS requirements of the different supported applications. Our results show that our solution outperforms state-of-the-art network selection techniques in terms of energy consumption, latency, and cost. ...
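The sketch below is a hedged illustration of the multi-agent setting. It uses independent, stateless epsilon-greedy Q-learners, a much simpler technique than the deep MARL algorithm described in the abstract. Each edge node picks a (RAT, compression ratio) pair. Nodes on the same RAT share its capacity equally, and the reward penalises a weighted sum of energy, latency, and monetary cost. Every name and parameter here is hypothetical, including `RAT`, `node_reward`, `train_independent_learners`, the CPU energy term `cpu_j_per_mb` for compression, and the equal-share rate model.

```python
import random
from dataclasses import dataclass


@dataclass
class RAT:
    """Hypothetical radio access technology shared by the edge nodes."""
    capacity_mbps: float  # uplink capacity split equally among nodes using it
    tx_power_w: float     # transmit power drawn while sending
    price_per_mb: float   # monetary cost per megabyte sent


def node_reward(data_mb, ratio, rat, n_sharing, cpu_j_per_mb=0.0,
                w_energy=1.0, w_latency=1.0, w_cost=1.0):
    """Negative weighted cost for one node sending data_mb compressed to ratio * data_mb."""
    sent_mb = data_mb * ratio
    rate_mbps = rat.capacity_mbps / n_sharing  # equal share of the RAT's capacity
    latency_s = 8.0 * sent_mb / rate_mbps      # megabytes -> megabits
    # Radio energy plus an assumed CPU cost for compressing away (1 - ratio) of the data.
    energy_j = rat.tx_power_w * latency_s + cpu_j_per_mb * data_mb * (1.0 - ratio)
    cost = rat.price_per_mb * sent_mb
    return -(w_energy * energy_j + w_latency * latency_s + w_cost * cost)


def train_independent_learners(n_nodes, rats, ratios, data_mb,
                               episodes=2000, lr=0.1, eps=0.1, seed=0):
    """Each node runs its own stateless Q-learner over (RAT index, ratio) actions."""
    rng = random.Random(seed)
    actions = [(r, c) for r in range(len(rats)) for c in ratios]
    q = [[0.0] * len(actions) for _ in range(n_nodes)]

    def greedy(node):
        return max(range(len(actions)), key=lambda a: q[node][a])

    for _ in range(episodes):
        # All nodes act simultaneously, epsilon-greedily on their own Q-values.
        picks = [rng.randrange(len(actions)) if rng.random() < eps else greedy(n)
                 for n in range(n_nodes)]
        # Count how many nodes chose each RAT, since they share its capacity.
        load = [0] * len(rats)
        for a in picks:
            load[actions[a][0]] += 1
        for n, a in enumerate(picks):
            rat_idx, ratio = actions[a]
            reward = node_reward(data_mb, ratio, rats[rat_idx], load[rat_idx])
            # Exponential moving average of the observed reward for this action.
            q[n][a] += lr * (reward - q[n][a])
    return [actions[greedy(n)] for n in range(n_nodes)]
```

Each learner treats the other nodes as part of its environment, so the shared-capacity term couples their rewards. Independent learners are a common baseline for this kind of problem, but they carry no general convergence guarantee, which is one motivation for the more structured multi-agent methods the abstract refers to.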