Probabilistic recursive reasoning for multi-agent reinforcement learning

Poster (2019)

Author(s)

Ying Wen (University College London)

Yaodong Yang (University College London)

Rui Luo (University College London)

Jun Wang (University College London)

Wei Pan (TU Delft - Mechanical Engineering)

Research Group

Robust Robot Systems

To reference this document use

https://resolver.tudelft.nl/uuid:3aeb90a8-e115-4074-b62c-6d8d664f7178

Publication Year

2019

Language

English

Event

7th International Conference on Learning Representations, ICLR 2019 (2019-05-06 - 2019-05-09), New Orleans, United States

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Humans are capable of attributing latent mental contents, such as beliefs or intentions, to others. This social skill is critical in everyday life for reasoning about the potential consequences of one's behaviors and planning ahead. It is known that humans use this reasoning ability recursively, i.e., by considering what others believe about their own beliefs. In this paper, we start from level-1 recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policy, to which each agent finds the best response and then improves its own policy. We develop decentralized-training-decentralized-execution algorithms, PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenario when there is only one Nash equilibrium. Our methods are tested on both a matrix game and a differential game, each of which has a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about what the opponents believe about what the agent believes. We expect our work to contribute a new idea for opponent modeling to the multi-agent reinforcement learning community.
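The level-1 idea in the abstract (an agent evaluates its own actions by anticipating how the opponent would react to them) can be illustrated in a one-step, two-player matrix game. The following is a simplified sketch, not the paper's PR2-Q or PR2-Actor-Critic. It assumes the opponent's conditional policy rho(a_j | a_i) is a Boltzmann (softmax) response to a known, hypothetical payoff table `q_opp` rather than a distribution learned by variational Bayes. All function names and payoff values are illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable Boltzmann distribution over a list of values."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def opponent_conditional_policy(q_opp: Matrix, temperature: float = 1.0) -> Matrix:
    """rho[a_i][a_j]: ASSUMED probability that the opponent plays a_j after
    agent i plays a_i, modeled here as a soft best response to q_opp."""
    return [softmax(row, temperature) for row in q_opp]


def level1_values(q_self: Matrix, rho: Matrix) -> List[float]:
    """Value of each own action a_i, marginalizing over the opponent's
    reaction: V(a_i) = sum_j rho(a_j | a_i) * Q(a_i, a_j)."""
    return [sum(p * q for p, q in zip(rho_row, q_row))
            for rho_row, q_row in zip(rho, q_self)]


def level0_values(q_self: Matrix) -> List[float]:
    """Baseline that ignores the opponent's reaction (uniform opponent)."""
    return [sum(row) / len(row) for row in q_self]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


# Hypothetical 2x2 payoffs: rows = agent i's action, columns = opponent's action.
q_self = [[4.0, 0.0], [1.5, 1.5]]  # agent i's payoff
q_opp = [[0.0, 5.0], [1.0, 1.0]]   # opponent's payoff (assumed known)

rho = opponent_conditional_policy(q_opp)
v1 = level1_values(q_self, rho)
v0 = level0_values(q_self)
print("level-0 choice:", argmax(v0), "level-1 choice:", argmax(v1))
# prints "level-0 choice: 0 level-1 choice: 1"
```

With these made-up payoffs, the baseline that treats the opponent as uniform prefers action 0. Accounting for the opponent's likely reaction to action 0, which drives agent i's payoff toward 0, shifts the best response to action 1.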

Files

C95f37cb_f64e_437d_95fc_95da0a... (pdf)

(pdf | 2.46 MB)

License info not available