Multi-agent reinforcement learning for radar waveform design

Master Thesis (2024)
Author(s)

R. Gaghi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Elvin Isufi – Mentor (TU Delft - Multimedia Computing)

Francesco Fioranelli – Graduation committee member (TU Delft - Microwave Sensing, Signals & Systems)

Mario Alberto Coutiño Minguez – Mentor (TNO)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2024
Language
English
Graduation Date
19-07-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science | Multimedia Computing
Sponsors
TNO
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis investigates the application of multi-agent reinforcement learning (MARL) to the optimization of radar waveforms. Radar technology is crucial in fields such as aviation, maritime navigation, and defense, but it faces challenges such as interference, clutter, and the need for high resolution and accuracy. Cognitive radar, which adapts to environmental changes in real time, offers a promising solution. This research explores the potential of MARL for radar waveform optimization and examines whether incorporating domain knowledge can improve performance.

The radar waveform optimization problem is formulated as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), which defines the radar environment, the agents' observations and actions, and the reward functions. The study compares several architectures, including decentralized actors with a centralized critic. Because the centralized critic has access to global state information, it helps stabilize learning and mitigates the non-stationarity and credit-assignment problems of multi-agent training. Graph neural networks (GNNs) are proposed as the centralized critic to exploit the sparsity of graph-structured data and thereby improve scalability.
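To make the actor/critic split concrete, the following is a minimal, hypothetical NumPy sketch of an independent-actor, centralized-critic (IACC) update in which the critic is a one-layer GNN. It is not the thesis's implementation: the sizes, the interaction graph `ADJ`, the linear softmax actors, the mean-neighbour message passing, and the `toy_reward` function are all illustrative assumptions. Each actor acts only on its own local observation, while the critic sees every agent's observation and passes messages only along graph edges, which is where sparsity would pay off at larger scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; all values are illustrative assumptions.
N_AGENTS, OBS_DIM, N_ACTIONS, HIDDEN = 3, 4, 5, 8

# Hypothetical interaction graph between agents (1 = agents influence each other).
ADJ = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)


def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class Actor:
    """Decentralized actor: linear softmax policy over its own local observation."""

    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(N_ACTIONS, OBS_DIM))

    def probs(self, obs):
        return softmax(self.W @ obs)

    def act(self, obs):
        return int(rng.choice(N_ACTIONS, p=self.probs(obs)))

    def update(self, obs, action, advantage, lr=0.05):
        # Policy-gradient step: d log pi(a|o) / d logits = one_hot(a) - pi(.|o)
        grad_logits = -self.probs(obs)
        grad_logits[action] += 1.0
        self.W += lr * advantage * np.outer(grad_logits, obs)


class GNNCritic:
    """Centralized critic: one round of mean-neighbour message passing, then mean pooling."""

    def __init__(self):
        self.W_self = rng.normal(scale=0.1, size=(HIDDEN, OBS_DIM))
        self.W_nbr = rng.normal(scale=0.1, size=(HIDDEN, OBS_DIM))
        self.w_out = rng.normal(scale=0.1, size=HIDDEN)

    def features(self, obs_all):
        deg = ADJ.sum(axis=1, keepdims=True)
        nbr_mean = (ADJ @ obs_all) / np.maximum(deg, 1.0)  # only graph edges carry messages
        h = np.tanh(obs_all @ self.W_self.T + nbr_mean @ self.W_nbr.T)  # (N_AGENTS, HIDDEN)
        return h.mean(axis=0)  # permutation-invariant readout over agents

    def value(self, obs_all):
        return float(self.w_out @ self.features(obs_all))

    def update(self, obs_all, td_error, lr=0.05):
        # Semi-gradient TD(0) on the output layer only, to keep the sketch short.
        self.w_out += lr * td_error * self.features(obs_all)


def iacc_step(actors, critic, obs_all, reward_fn, next_obs_all, gamma=0.99):
    """One IACC update: each actor acts on its local view; the critic sees all views."""
    actions = [actor.act(obs) for actor, obs in zip(actors, obs_all)]
    reward = reward_fn(actions)  # shared team reward
    td_error = reward + gamma * critic.value(next_obs_all) - critic.value(obs_all)
    critic.update(obs_all, td_error)
    for actor, obs, action in zip(actors, obs_all, actions):
        actor.update(obs, action, td_error)  # centralized TD error used as the advantage
    return actions, td_error


def toy_reward(actions):
    # Hypothetical reward: fraction of distinct choices (e.g. fewer shared sub-bands).
    return len(set(actions)) / N_AGENTS


actors = [Actor() for _ in range(N_AGENTS)]
critic = GNNCritic()
obs = rng.normal(size=(N_AGENTS, OBS_DIM))
for _ in range(200):
    next_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
    iacc_step(actors, critic, obs, toy_reward, next_obs)
    obs = next_obs
```

An IAC variant of this sketch would instead give each actor its own critic that sees only that agent's local observation; replacing `GNNCritic` with a plain dense network over the concatenated observations would give a centralized critic without the graph structure.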

The proposed models are trained and tested in a radar-tracking scenario and evaluated in terms of Pareto optimality and optimization time. The results show that both the Independent Actor-Critic (IAC) and the Independent Actor with Centralized Critic (IACC) models outperform traditional methods in probability of detection, waveform duration, and optimization speed. These findings highlight the effectiveness of MARL approaches for radar waveform optimization and the benefit of centralized critics for robustness and coordination. However, the choice of architecture strongly affects performance, and although GNNs offer potential scalability advantages, incorporating domain knowledge through them did not yield significant improvements in this study. This research lays a foundation for future work on MARL and GNNs in radar waveform optimization.
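Pareto optimality here means trading off competing objectives such as probability of detection (higher is better) and waveform duration (shorter is better). As a minimal sketch, assuming each candidate waveform is summarized by a hypothetical `(Pd, duration)` pair, the non-dominated set can be extracted as follows; the candidate numbers are made up for illustration and are not results from the thesis.

```python
from typing import List, Tuple

# (probability_of_detection, waveform_duration); duration units are arbitrary here.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    pd_a, dur_a = a
    pd_b, dur_b = b
    no_worse = pd_a >= pd_b and dur_a <= dur_b
    strictly_better = pd_a > pd_b or dur_a < dur_b
    return no_worse and strictly_better


def pareto_front(points: List[Candidate]) -> List[Candidate]:
    """Keep the points not dominated by any other point (O(n^2), fine for small sets)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate waveforms, not data from the thesis.
candidates = [(0.90, 20.0), (0.85, 10.0), (0.80, 15.0), (0.95, 40.0), (0.70, 10.0)]
print(pareto_front(candidates))
# [(0.9, 20.0), (0.85, 10.0), (0.95, 40.0)]
```

Here `(0.80, 15.0)` is dropped because `(0.85, 10.0)` detects better with a shorter waveform, and `(0.70, 10.0)` is dropped because `(0.85, 10.0)` detects better at equal duration. One method can be said to outperform another in this two-objective sense when its front offers higher detection probability at equal or shorter durations.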
