Alternating Maximization with Behavioral Cloning
Aleksander Czechowski (TU Delft - Interactive Intelligence)
F.A. Oliehoek (TU Delft - Interactive Intelligence)
More Info
expand_more
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
The key difficulty of cooperative, decentralized planning lies in making accurate predictions about the behavior of one’s teammates. In this paper we introduce a planning method of Alternating maximization with Behavioural Cloning (ABC) – a trainable online decentralized planning algorithm based on Monte Carlo Tree Search (MCTS), combined with models of teammates learned from previous episodic runs. Our algorithm relies on the idea of alternating maximization, where agents adapt their models one at a time in round-robin manner. Under the assumption of perfect policy cloning, and with a sufficient amount of Monte Carlo samples, successive iterations of our method are guaranteed to improve joint policies, and eventually converge.