Alternating Maximization with Behavioral Cloning

Conference Paper (2020)

Author(s)

Aleksander Czechowski (TU Delft - Interactive Intelligence)

F.A. Oliehoek (TU Delft - Interactive Intelligence)

Research Group

Interactive Intelligence

To reference this document use:

https://resolver.tudelft.nl/uuid:8b2e3368-bfbd-495e-9f00-c7419d75e60b

Publication Year

2020

Language

English

Pages (from-to)

370-371

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The key difficulty of cooperative, decentralized planning lies in making accurate predictions about the behavior of one's teammates. In this paper we introduce Alternating maximization with Behavioral Cloning (ABC), a trainable online decentralized planning algorithm based on Monte Carlo Tree Search (MCTS), combined with models of teammates learned from previous episodic runs. Our algorithm relies on the idea of alternating maximization, where agents adapt their models one at a time in a round-robin manner. Under the assumption of perfect policy cloning, and with a sufficient number of Monte Carlo samples, successive iterations of our method are guaranteed to improve joint policies and eventually converge.
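
The round-robin idea in the abstract can be illustrated on a toy problem. The sketch below is not the authors' ABC implementation: it assumes a hypothetical two-agent cooperative matrix game (the `PAYOFF` table is invented for illustration), replaces MCTS with exact enumeration of actions, and models "perfect policy cloning" by letting each agent read its teammate's current action directly. All function names are assumptions.

```python
from typing import List, Tuple

# Hypothetical shared team payoff: PAYOFF[a0][a1] is the reward both agents
# receive when agent 0 plays a0 and agent 1 plays a1.
PAYOFF = [
    [5, 0, 0],
    [0, 7, 0],
    [0, 0, 10],
]


def team_value(joint: List[int]) -> int:
    """Team reward of a joint (deterministic) policy."""
    return PAYOFF[joint[0]][joint[1]]


def best_response(agent: int, joint: List[int]) -> int:
    """Best action for `agent` while the teammate's action stays fixed.

    Exact enumeration stands in for the MCTS planner here (an assumption).
    Ties are broken in favor of the current action, so the team value
    never decreases and the outer loop terminates.
    """
    def value(action: int) -> int:
        trial = list(joint)
        trial[agent] = action
        return team_value(trial)

    n_actions = len(PAYOFF) if agent == 0 else len(PAYOFF[0])
    return max(range(n_actions), key=lambda a: (value(a), a == joint[agent]))


def alternating_maximization(
    start: List[int], max_rounds: int = 100
) -> Tuple[List[int], List[int]]:
    """Update one agent at a time, round-robin, until no agent changes."""
    joint = list(start)
    history = [team_value(joint)]
    for _ in range(max_rounds):
        changed = False
        for agent in (0, 1):
            new_action = best_response(agent, joint)
            if new_action != joint[agent]:
                joint[agent] = new_action
                changed = True
            history.append(team_value(joint))
        if not changed:
            break
    return joint, history


if __name__ == "__main__":
    joint, history = alternating_maximization([1, 2])
    print(joint, history)
```

In this toy setting the recorded team values never decrease from one update to the next, which mirrors the improvement property stated in the abstract. The fixed point reached depends on the starting policy: starting from `[0, 0]` the loop stops immediately at a payoff of 5, so convergence here means reaching a point where no single agent can improve alone, not necessarily the global optimum.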

Files

Bnaic2020proceedings02.pdf

(pdf | 0.622 Mb)

License info not available