Title: Efficient Communication in Robust Multi-agent Reinforcement Learning: Trading Observational Robustness for Fewer Communications
Author: de Gooijer, Jessica (TU Delft Mechanical, Maritime and Materials Engineering)
Contributors: Mazo, M. (mentor); Alonso Mora, J. (graduation committee); Jarne Ornia, D. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Mechanical Engineering | Systems and Control
Date: 2023-08-28

Abstract: Reinforcement learning, especially deep reinforcement learning, has made many advances in the last decade, and similarly great strides have been made in multi-agent reinforcement learning. Systems of cooperative autonomous robots are increasingly being deployed, and multi-agent reinforcement learning can serve as a method for training them. However, the curse of dimensionality remains a problem, both for the computational cost of the learning algorithm and for the bandwidth of communication channels. This research focuses on relieving overloaded communication channels by reducing the number of communications, which is possible because it is usually unnecessary for every agent to communicate with every other agent constantly. To do this, we build on two ideas by Daniel Jarne Ornia. The first is to reduce communications in a multi-agent reinforcement learning system by treating it as an event-triggered control problem: so-called robustness surrogates, which play a role analogous to a Lyapunov function, determine whether a communication can be skipped without decreasing performance by more than some tolerance. The second is a method to increase the observational robustness of a policy using lexicographic reinforcement learning. We combine these ideas and trade the additional observational robustness for fewer communications; we also test whether the additional observational robustness can help mitigate the sim-to-real gap. We implement this method for the multi-agent deep deterministic policy gradient algorithm and evaluate it on a variant of the predator-prey domain in increasingly realistic simulations. We find that combining the robust policy with the robustness-surrogates method does enable the agents to achieve the same return while communicating less. However, the observational robustness obtained through lexicographic reinforcement learning does not mitigate the sim-to-real gap.

Subject: Reinforcement Learning; Observational Robustness; Communication; Multi-agent
To reference this document use: http://resolver.tudelft.nl/uuid:157ea3e2-59a6-47fd-948d-328daa2bfca3
Bibliographical note: Double degree in Systems and Control and Robotics at Delft University of Technology
GitHub repository: https://github.com/J-deGooijer/Efficient-Communication-in-Robust-Multi-agent-Reinforcement-Learning
Part of collection: Student theses
Document type: master thesis
Rights: © 2023 Jessica de Gooijer
Files: Thesis_Jessica_de_Gooijer.pdf (PDF, 3.19 MB)
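
Below is a minimal Python sketch of the event-triggered communication rule described in the abstract: an agent broadcasts its observation only when a robustness surrogate suggests that teammates acting on the stale value would lose more than a set tolerance in performance. This is an illustration only, not the thesis implementation (see the GitHub repository above for the actual code); the Agent class, the norm-based robustness_surrogate stand-in, and the tolerance value are all hypothetical assumptions made for exposition.

    import numpy as np

    def robustness_surrogate(obs):
        # Illustrative stand-in: the thesis uses a learned surrogate that bounds
        # the return lost when agents act on a perturbed (stale) observation;
        # here a simple norm keeps the sketch self-contained.
        return float(np.linalg.norm(obs))

    class Agent:
        def __init__(self, tolerance):
            self.tolerance = tolerance   # allowed performance-loss budget
            self.last_sent = None        # observation last broadcast to teammates

        def step(self, obs):
            # Returns True if the agent broadcasts `obs` this step.
            if self.last_sent is None:
                trigger = True           # always communicate on the first step
            else:
                # Event trigger: stay silent while the surrogate gap implies the
                # stale observation costs at most `tolerance` in performance.
                gap = abs(robustness_surrogate(obs)
                          - robustness_surrogate(self.last_sent))
                trigger = gap > self.tolerance
            if trigger:
                self.last_sent = obs.copy()
            return trigger

    agent = Agent(tolerance=0.5)
    rng = np.random.default_rng(0)
    sent = sum(agent.step(rng.normal(size=4)) for _ in range(100))
    print(f"communicated on {sent} of 100 steps")

Under this kind of trigger, a larger tolerance means fewer broadcasts, which matches the trade-off the abstract describes: a more observationally robust policy tolerates staler observations and therefore admits a larger tolerance.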