Alternating Maximization with Behavioral Cloning

Conference Paper (2020)
Author(s)

Aleksander Czechowski (TU Delft - Interactive Intelligence)

F.A. Oliehoek (TU Delft - Interactive Intelligence)

Research Group
Interactive Intelligence
Copyright
© 2020 A.T. Czechowski, F.A. Oliehoek
More Info
Publication Year
2020
Language
English
Pages (from-to)
370-371
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The key difficulty of cooperative, decentralized planning lies in making accurate predictions about the behavior of one’s teammates. In this paper we introduce Alternating maximization with Behavioural Cloning (ABC), a trainable online decentralized planning algorithm based on Monte Carlo Tree Search (MCTS) combined with models of teammates learned from previous episodic runs. Our algorithm relies on the idea of alternating maximization, where agents adapt their models one at a time in a round-robin manner. Under the assumption of perfect policy cloning, and with a sufficient number of Monte Carlo samples, successive iterations of our method are guaranteed to improve the joint policy and to eventually converge.
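
The round-robin idea in the abstract can be illustrated with a minimal sketch. This is not the ABC algorithm from the paper: exact best responses on a small random payoff table stand in for MCTS planning and for learned teammate models, and every name used here (`alternating_maximization`, `payoff`, `N_AGENTS`, `N_ACTIONS`) is an assumption made for illustration only. It shows why updating one agent at a time while the others stay fixed cannot decrease the shared team reward, and why the process stops at a joint action that no single agent can improve.

```python
import itertools
import random


def alternating_maximization(payoff, n_agents, n_actions, start, max_rounds=100):
    """Round-robin best response on a shared team payoff table.

    payoff maps a joint action (tuple of ints) to a team reward.
    Illustrative stand-in only: exact lookups replace MCTS and learned models.
    """
    joint = list(start)
    history = [payoff[tuple(joint)]]
    for _ in range(max_rounds):
        changed = False
        for i in range(n_agents):  # one agent adapts at a time
            def value(a, i=i):
                # Team reward if agent i plays a and all others stay fixed.
                candidate = joint[:i] + [a] + joint[i + 1:]
                return payoff[tuple(candidate)]

            best = max(range(n_actions), key=value)
            # Switch only on strict improvement, so the reward never drops.
            if value(best) > payoff[tuple(joint)]:
                joint[i] = best
                changed = True
            history.append(payoff[tuple(joint)])
        if not changed:  # a full round without changes: converged
            break
    return tuple(joint), history


# Hypothetical toy problem: 3 agents, 4 actions each, random team rewards.
rng = random.Random(0)
N_AGENTS, N_ACTIONS = 3, 4
payoff = {
    ja: rng.random()
    for ja in itertools.product(range(N_ACTIONS), repeat=N_AGENTS)
}
joint, history = alternating_maximization(
    payoff, N_AGENTS, N_ACTIONS, start=(0, 0, 0)
)
```

In this toy setting the recorded `history` is non-decreasing and the returned joint action is optimal with respect to any single agent's deviation, which is in general a local rather than a global optimum. The guarantee stated in the abstract relies on perfect policy cloning and sufficient Monte Carlo samples; the sketch sidesteps both by using exact table lookups.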

Files

Bnaic2020proceedings02.pdf
(pdf | 0.622 Mb)
License info not available