Constraint Propagation and Reverse Multi-Agent Learning

Abstract

The development of multi-agent reinforcement learning has largely been driven by the question of how to design learning algorithms that reach a particular notion of optimal strategies, e.g. Nash equilibria. The set of optimal strategies is not known before the learning algorithm is executed; however, we can often immediately identify a set of clearly undesirable outcomes. We therefore propose to consider a dual problem: given a collection of agent algorithms and a collection of unwanted strategy profiles, can one identify a set of starting strategies that invariably lead to those unwanted profiles? This leads us to study the algorithmic problem of backpropagating the constraints that define the forbidden region through the learning dynamics, viewed through the lens of set-valued maps and interval arithmetic.
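
To make the idea of propagating a forbidden region backwards concrete, the following is a minimal sketch under strong simplifying assumptions, not the method developed in the paper. It assumes a single scalar strategy x in [0, 1], a hypothetical update rule x -> clip(x + eta(2x - 1), 0, 1), and a uniform grid of cells; the names `Interval`, `step`, `touching`, and `backpropagate` are illustrative only. Each cell is pushed through the update with interval arithmetic, and a cell is marked as leading to the forbidden region when its interval image touches only cells that are already marked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with the few operations this sketch needs."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, c):
        a, b = c * self.lo, c * self.hi
        return Interval(min(a, b), max(a, b))

    def clip(self, lo, hi):
        return Interval(min(max(self.lo, lo), hi), min(max(self.hi, lo), hi))

    def subset_of(self, other):
        return other.lo <= self.lo and self.hi <= other.hi


def step(x, eta=0.1):
    """Interval image of the hypothetical update x -> clip(x + eta*(2x - 1), 0, 1).

    Written as (1 + 2*eta)*x - eta so that x occurs only once, which keeps
    the interval enclosure tight. Rounding is not directed outward here; a
    rigorous implementation would need that.
    """
    return (x.scale(1 + 2 * eta) + Interval(-eta, -eta)).clip(0.0, 1.0)


def touching(cells, img):
    """Indices of all cells that intersect img (closed test, so conservative)."""
    return [j for j, c in enumerate(cells) if c.hi >= img.lo and c.lo <= img.hi]


def backpropagate(forbidden, n_cells=100, n_steps=5, eta=0.1):
    """Indices of cells whose every point enters `forbidden` within n_steps."""
    cells = [Interval(i / n_cells, (i + 1) / n_cells) for i in range(n_cells)]
    bad = {i for i, c in enumerate(cells) if c.subset_of(forbidden)}
    for _ in range(n_steps):
        new_bad = set(bad)
        for i, c in enumerate(cells):
            if i in bad:
                continue
            # Mark the cell only if its whole image lands in marked cells.
            if all(j in bad for j in touching(cells, step(c, eta))):
                new_bad.add(i)
        bad = new_bad
    return bad


if __name__ == "__main__":
    bad = backpropagate(Interval(0.9, 1.0))
    print("lowest starting strategy certified to reach the forbidden region:",
          min(bad) / 100)
```

Because a cell is marked only when its entire image is covered by marked cells, the result is an inner approximation: every starting strategy it contains reaches the forbidden region, but some starting strategies that do so may be missed. A finer grid or more backward steps tightens the approximation at a higher computational cost.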