Multi-Agent Reinforcement Learning using Centralized Critics in Collaborative Environments
Abstract
Agents trained with single-agent reinforcement learning methods such as self-play can perform well in multi-agent settings, even in fully cooperative environments. However, training multiple agents together with single-agent self-play usually yields poor results, because each agent must learn its task while its teammates are also learning, which makes the environment non-stationary from each agent's perspective. Training models to reach optimal behaviour in such situations therefore becomes a challenging, if not impossible, problem. One possible solution is a centralized training process in which the policies of all agents are evaluated by a centralized critic that has access to the observations and actions of every agent in the environment. With this approach, the learning problem becomes stationary from each agent's perspective, and the agents learn much as they would with a single-agent algorithm in a setting where only one agent needs to be trained. In this paper, we test whether a multi-agent reinforcement learning algorithm with centralized critics, as opposed to a single-agent one, produces agents that generalize better to new partners in a collaborative environment such as Overcooked, where coordination is critical for good performance. The results show similar performance for the two algorithms when evaluated through self-play, and slightly better or slightly worse results, depending on the map, when the agents are paired with a human model representing a mediocre partner. Thus, the centralized-critic algorithm used in this study did not train agents that generalize better to new partners. However, the training metrics clearly indicate that the centralized-critic method learns and converges twice as fast as its single-agent counterpart.
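To make the centralized-critic idea concrete, the following is a minimal sketch of decentralized actors paired with a critic that conditions on the joint observations and actions of all agents. It assumes a PyTorch-style setup; the class names, network sizes, and dimensions are illustrative and are not taken from the paper or from the algorithm implementation used in the study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecentralizedActor(nn.Module):
    """Per-agent policy: conditions only on the agent's own observation."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # action logits


class CentralizedCritic(nn.Module):
    """Critic used only during training: sees the observations and actions of every agent."""
    def __init__(self, obs_dim, n_actions, n_agents, hidden=64):
        super().__init__()
        in_dim = n_agents * (obs_dim + n_actions)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs, joint_actions_onehot):
        # Concatenate joint observations and one-hot joint actions into one input.
        return self.net(torch.cat([joint_obs, joint_actions_onehot], dim=-1))


# Toy usage with made-up dimensions (illustrative only).
n_agents, obs_dim, n_actions = 2, 8, 6
actors = [DecentralizedActor(obs_dim, n_actions) for _ in range(n_agents)]
critic = CentralizedCritic(obs_dim, n_actions, n_agents)

obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]              # per-agent observations
logits = [actor(o) for actor, o in zip(actors, obs)]                  # decentralized action logits
actions = [torch.distributions.Categorical(logits=l).sample() for l in logits]
onehots = [F.one_hot(a, n_actions).float() for a in actions]
value = critic(torch.cat(obs, dim=-1), torch.cat(onehots, dim=-1))    # centralized value estimate
```

At execution time only the actors are needed, each acting on its own observation; the centralized critic is used purely to provide a stationary training signal, which is the property the abstract refers to.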