Multi-agent reinforcement learning for radar waveform design

Master Thesis (2024)

Author(s)

R. Gaghi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Elvin Isufi – Mentor (TU Delft - Multimedia Computing)

Francesco Fioranelli – Graduation committee member (TU Delft - Microwave Sensing, Signals & Systems)

Mario Alberto Coutiño Minguez – Mentor (TNO)

Faculty

Electrical Engineering, Mathematics and Computer Science

Reinforcement Learning, Radar, Multi-Agent Reinforcement Learning, Deep Learning, GNN, Waveform Design

To reference this document use:

https://resolver.tudelft.nl/uuid:c5f8d40b-0035-4ec5-8834-863e451d0c0f

Publication Year

2024

Language

English

Graduation Date

19-07-2024

Awarding Institution

Delft University of Technology

Programme

Computer Science | Multimedia Computing

Abstract

This thesis investigates the application of multi-agent reinforcement learning (MARL) to the optimization of radar waveforms. Radar technology is crucial in fields such as aviation, maritime navigation, and defense, but it faces challenges such as interference, clutter, and the need for high resolution and accuracy. Cognitive radar, which adapts to environmental changes in real time, offers a promising solution. This research explores the potential of MARL for optimizing radar waveforms and examines whether incorporating domain knowledge can enhance performance.

The radar waveform optimization problem is framed within the Decentralized Partially Observable Markov Decision Process (Dec-POMDP) framework, defining the radar environment, agents' observations and actions, and reward functions. The study experiments with different architectures, including decentralized actors with a centralized critic. The centralized critic, which has access to global state information, helps stabilize the learning process and mitigate the non-stationarity and credit assignment problems. Graph neural networks (GNNs) are proposed as the centralized critic to exploit the sparsity of graph data and thereby improve scalability.
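To make the "decentralized actors with a centralized critic" idea concrete, the following is a minimal sketch, not the thesis implementation. It assumes linear softmax actors, a linear critic, a shared scalar reward, and a global state formed by stacking all agents' observations. All names and sizes (`Actor`, `CentralCritic`, `train_step`, `N_AGENTS`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes, chosen only for illustration.
N_AGENTS, OBS_DIM, N_ACTIONS = 3, 4, 5
# Assumption: the global state is all local observations stacked together.
STATE_DIM = N_AGENTS * OBS_DIM


def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


class Actor:
    """Decentralized actor: linear softmax policy over one agent's local observation."""

    def __init__(self):
        self.W = np.zeros((N_ACTIONS, OBS_DIM))

    def probs(self, obs):
        return softmax(self.W @ obs)

    def act(self, obs):
        return int(rng.choice(N_ACTIONS, p=self.probs(obs)))

    def update(self, obs, action, advantage, lr=0.01):
        # For a linear softmax policy, d log pi(a|o) / dW = (onehot(a) - pi) outer obs.
        grad_logits = -self.probs(obs)
        grad_logits[action] += 1.0
        self.W += lr * advantage * np.outer(grad_logits, obs)


class CentralCritic:
    """Centralized critic: linear value estimate V(s) over the global state."""

    def __init__(self):
        self.w = np.zeros(STATE_DIM)

    def value(self, state):
        return float(self.w @ state)

    def update(self, state, td_error, lr=0.01):
        self.w += lr * td_error * state


def train_step(actors, critic, obs, actions, reward, next_obs, gamma=0.99):
    """One actor-critic update from a single shared-reward transition."""
    state, next_state = obs.ravel(), next_obs.ravel()
    # The critic sees the global state; its TD error is the shared learning signal.
    td_error = reward + gamma * critic.value(next_state) - critic.value(state)
    critic.update(state, td_error)
    # Each actor updates using only its own observation and action.
    for actor, o, a in zip(actors, obs, actions):
        actor.update(o, a, td_error)
    return td_error


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()
obs = rng.normal(size=(N_AGENTS, OBS_DIM))
next_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
actions = [actor.act(o) for actor, o in zip(actors, obs)]
td = train_step(actors, critic, obs, actions, reward=1.0, next_obs=next_obs)
```

At execution time each actor needs only its local observation. The critic, and therefore the global state, is used during training only. Replacing the linear critic with a GNN over an agent graph would follow the same interface, but that step is not shown here.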

The proposed models are trained and tested in a radar-tracking scenario and evaluated in terms of Pareto optimality and optimization time. The results show that both the Independent Actor-Critic (IAC) and Independent Actor with Centralized Critic (IACC) models outperform traditional methods in terms of probability of detection, waveform duration, and optimization speed. The findings highlight the effectiveness of MARL approaches in optimizing radar waveforms, emphasizing the benefits of centralized critics for robustness and coordination. However, the choice of architecture significantly impacts performance, and while GNNs offer potential scalability advantages, their integration of domain knowledge did not yield significant improvements in this study. This research lays a foundation for future exploration of MARL and GNNs in radar waveform optimization.
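As an illustration of what evaluating Pareto optimality involves, the sketch below filters a set of candidate waveforms down to its non-dominated subset. It assumes two objectives taken from the abstract, probability of detection (higher is better) and waveform duration (lower is better). The candidate values are invented for the example and are not results from the thesis.

```python
def dominates(a, b):
    """Return True if candidate a Pareto-dominates candidate b.

    Each candidate is (probability_of_detection, waveform_duration):
    higher detection probability and shorter duration are better.
    """
    pd_a, dur_a = a
    pd_b, dur_b = b
    no_worse = pd_a >= pd_b and dur_a <= dur_b
    strictly_better = pd_a > pd_b or dur_a < dur_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (detection probability, duration) pairs, for illustration only.
candidates = [(0.90, 4.0), (0.95, 6.0), (0.85, 5.0), (0.95, 7.0), (0.70, 2.0)]
front = pareto_front(candidates)
print(front)  # [(0.9, 4.0), (0.95, 6.0), (0.7, 2.0)]
```

Here `(0.85, 5.0)` is dropped because `(0.90, 4.0)` detects better with a shorter waveform, and `(0.95, 7.0)` is dropped because `(0.95, 6.0)` matches its detection probability with a shorter duration.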

Files

MARL_for_Waveform_Design.pdf

(pdf | 4.92 MB)

License info not available