Efficient Communication in Robust Multi-agent Reinforcement Learning
Trading Observational Robustness for Fewer Communications
Abstract
Reinforcement learning, and deep reinforcement learning in particular, has advanced considerably over the last decade, and multi-agent reinforcement learning has made similar strides. Systems of cooperative autonomous robots are increasingly common, and multi-agent reinforcement learning is a natural way to train them. However, the curse of dimensionality remains a problem, both for the computational cost of the learning algorithm and for the bandwidth of the communication channels. This research focuses on relieving the load on the communication channels by reducing the number of communications, which is possible because it is usually unnecessary for every agent to communicate with every other agent at every time step.
To do this, we build on two ideas by Daniel Jarne Ornia. The first reduces communications in a multi-agent reinforcement learning system by treating them as an event-triggered control problem: so-called robustness surrogates play a role analogous to a Lyapunov function and determine whether a communication can be skipped without degrading performance by more than a given tolerance. The second increases the observational robustness of a policy by means of lexicographic reinforcement learning.
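To make the event-triggered idea concrete, the sketch below shows one way such a trigger could look: an agent broadcasts only when a learned robustness surrogate suggests that reusing its stale observation could cost more performance than the tolerance allows. The function `surrogate`, the deviation measure, and the threshold rule are illustrative assumptions rather than the exact construction used in the thesis.

```python
import numpy as np

def should_communicate(current_obs, last_sent_obs, surrogate, epsilon):
    """Event-triggered rule (illustrative): broadcast only when the
    robustness surrogate indicates that reusing the stale observation
    could degrade performance by more than the tolerance epsilon."""
    # Deviation between what this agent currently observes and what the
    # other agents last received from it.
    deviation = np.linalg.norm(np.asarray(current_obs) - np.asarray(last_sent_obs))
    # The surrogate maps an observation deviation to an upper bound on the
    # induced performance loss, playing a Lyapunov-like role.
    predicted_loss = surrogate(deviation)
    return predicted_loss > epsilon

# Hypothetical usage inside one environment step for a single agent:
# if should_communicate(obs_t, last_broadcast, surrogate_fn, epsilon=0.05):
#     broadcast(obs_t)          # send, and remember what was sent
#     last_broadcast = obs_t
# else:
#     pass                      # skip this communication round
```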
We aim to combine these ideas and trade the additional observational robustness for fewer communications. We also test whether the additional observational robustness helps mitigate the sim-to-real gap. We implement the method for the multi-agent deep deterministic policy gradient (MADDPG) algorithm and evaluate it on a variant of the predator-prey domain in increasingly realistic simulations.
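As a high-level illustration of the lexicographic idea (a sketch under our own assumptions, not the exact algorithm used in the thesis), a secondary robustness objective, for instance a penalty on how much the policy changes under small observation perturbations, is only pursued while the primary return objective stays within a slack of its best observed value. One way to realise this is with an adaptive multiplier, as below; all names and the update rule are illustrative.

```python
import torch

def lexicographic_loss(primary_loss, secondary_loss, best_primary, slack, lam, lam_lr=0.01):
    """Sketch of a lexicographic weighting: optimise the secondary
    (robustness) loss only while the primary (return) loss stays within
    `slack` of the best value observed so far."""
    # Amount by which the primary objective currently violates its tolerance band.
    violation = (primary_loss - (best_primary + slack)).clamp(min=0.0)
    # Dual-style multiplier: grows while the constraint is violated, shifting
    # emphasis back towards the primary objective before robustness is pursued.
    lam = lam + lam_lr * float(violation.detach())
    total_loss = (1.0 + lam) * primary_loss + secondary_loss
    return total_loss, lam
```

In the combined method, the robustness margin obtained this way is what the robustness surrogates can then convert into skipped communications.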
We find that combining the robust policy with the robustness-surrogate method does enable the agents to achieve the same return while communicating less. Unfortunately, our results also show that the observational robustness obtained through lexicographic reinforcement learning does not help mitigate the sim-to-real gap.