Efficient Communication in Robust Multi-agent Reinforcement Learning

Trading Observational Robustness for Fewer Communications

Abstract

Reinforcement learning, and deep reinforcement learning in particular, has made many advances in the last decade, and similarly great strides have been made in multi-agent reinforcement learning. Systems of cooperative autonomous robots are increasingly common, and multi-agent reinforcement learning can serve as a method for training them. However, the curse of dimensionality remains a problem, both for the computational cost of the learning algorithm and for the bandwidth of the communication channels. This research focuses mainly on the problem of overloaded communication channels, which we address by reducing the number of communications. This is possible because it is usually unnecessary for every agent to communicate with every other agent constantly.

To do this, we build on two ideas by Daniel Jarne Ornia. The first is to reduce communications in a multi-agent reinforcement learning system by treating it as an event-triggered control problem. This method uses so-called robustness surrogates, which play a role analogous to a Lyapunov function, to determine whether a communication can be skipped without decreasing performance by more than some tolerance; a minimal sketch of this trigger condition is given below. The second is a method to increase the observational robustness of a policy using lexicographic reinforcement learning.
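
As a minimal sketch of the event-triggered idea, an agent could compare its current observation with the last one it broadcast and only communicate when the deviation exceeds the margin certified by the robustness surrogate. This sketch assumes, for illustration, that the surrogate returns the observation deviation the policy can tolerate at a given state for a chosen performance tolerance; the names and interface below are hypothetical, not the exact formulation used in this work.

    import numpy as np

    def should_communicate(current_obs, last_sent_obs, robustness_surrogate, tolerance):
        # Deviation between the agent's current observation and the one
        # its teammates last received.
        deviation = np.linalg.norm(current_obs - last_sent_obs)
        # Skip the broadcast while the stale observation stays inside the
        # margin the robustness surrogate certifies for this tolerance.
        return deviation > robustness_surrogate(current_obs, tolerance)

    # Hypothetical use in an agent's step loop:
    # if should_communicate(obs, last_broadcast, surrogate, tol):
    #     broadcast(obs)
    #     last_broadcast = obs

In this way, communications are triggered by events (the deviation growing too large) rather than occurring at every time step.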

We aim to combine these ideas and trade the additional observational robustness for a reduction in communications. We also test whether this additional observational robustness can help mitigate the sim-to-real gap. We implement the method for the multi-agent deep deterministic policy gradient (MADDPG) algorithm and evaluate it on a variant of the predator-prey domain in increasingly realistic simulations.

We found that combining the robust policy with the robustness surrogates method does enable the agents to achieve the same return while communicating less. Unfortunately, our research shows that the observational robustness obtained with lexicographic reinforcement learning does not help mitigate the sim-to-real gap.