Multi-Agent Reinforcement Learning using Centralized Critics in Collaborative Environments
Abstract
Agents trained with single-agent reinforcement learning methods such as self-play can perform well in multi-agent settings, even in fully cooperative environments. However, training multiple agents together with single-agent self-play usually yields poor results, because each agent must learn its task while its teammates are also learning, which makes the environment non-stationary from each agent's perspective. Training models to reach optimal behaviour in such situations therefore becomes a challenging, if not impossible, problem. One possible solution is a centralized training process in which the policies of all agents are evaluated by a centralized critic that has access to the observations and actions of every agent in the environment. With this approach, the learning problem becomes stationary from each agent's perspective, and the agents learn much as they would with a single-agent algorithm in a setting where only one agent needs to be trained. In this paper, we test whether a multi-agent reinforcement learning algorithm with centralized critics, as opposed to a single-agent one, produces agents that generalize better to new partners in a collaborative environment such as Overcooked, where coordination is critical for good performance. The results show similar performance for the two algorithms when evaluated through self-play, and slightly better or slightly worse results, depending on the map, when the agents are paired with a human model representing a mediocre partner. Thus, the centralized-critic algorithm used in this study did not train agents that generalize better to new partners. However, the training metrics clearly indicate that the centralized-critic method learns and converges twice as fast as its single-agent counterpart.
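To make the centralized-critic idea concrete, the following is a minimal sketch of decentralized actors paired with a critic that conditions on the joint observations and actions of all agents. It assumes a PyTorch-style setup; the class names, network sizes, and dimensions are illustrative and are not taken from the paper or from the algorithm implementation used in the study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecentralizedActor(nn.Module):
    """Per-agent policy: conditions only on the agent's own observation."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # action logits


class CentralizedCritic(nn.Module):
    """Critic used only during training: sees the observations and actions of every agent."""
    def __init__(self, obs_dim, n_actions, n_agents, hidden=64):
        super().__init__()
        in_dim = n_agents * (obs_dim + n_actions)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_actions_onehot):
        # Concatenate joint observations and one-hot joint actions into one input.
        return self.net(torch.cat([joint_obs, joint_actions_onehot], dim=-1))


# Toy usage with made-up dimensions (illustrative only).
n_agents, obs_dim, n_actions = 2, 8, 6
actors = [DecentralizedActor(obs_dim, n_actions) for _ in range(n_agents)]
critic = CentralizedCritic(obs_dim, n_actions, n_agents)

obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]              # per-agent observations
logits = [actor(o) for actor, o in zip(actors, obs)]                  # decentralized action logits
actions = [torch.distributions.Categorical(logits=l).sample() for l in logits]
onehots = [F.one_hot(a, n_actions).float() for a in actions]
value = critic(torch.cat(obs, dim=-1), torch.cat(onehots, dim=-1))    # centralized value estimate
```

At execution time only the actors are needed, each acting on its own observation; the centralized critic is used purely to provide a stationary training signal, which is the property the abstract refers to.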