Multi-agent reinforcement learning via distributed MPC as a function approximator
S.H. Mallick (TU Delft - Team Bart De Schutter)
Filippo Airaldi (TU Delft - Team Azita Dabiri)
Azita Dabiri (TU Delft - Team Azita Dabiri)
B. de Schutter (TU Delft - Delft Center for Systems and Control)
Abstract
This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. This paper is the first to extend that idea to the multi-agent setting. We propose the use of a distributed MPC scheme as a function approximator, with a structure that allows for distributed learning and deployment. We then show that Q-learning updates can be performed in a distributed manner without introducing nonstationarity, by reconstructing a centralized learning update. The effectiveness of the approach is demonstrated on a numerical example.
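To make the MPC-as-function-approximator idea concrete, the following is a minimal single-agent sketch, not the paper's method: the action-value function Q(s, a; θ) is taken as the cost of a horizon-1 MPC for a hypothetical scalar system x⁺ = Ax + Bu, with the quadratic stage and terminal weights θ = (q, r, p) as the learnable parameters, updated by semi-gradient Q-learning. All system values, step sizes, and the stage cost are illustrative assumptions; the paper's actual scheme is distributed across agents and handles polytopic constraints.

```python
import numpy as np

# Illustrative scalar linear system x+ = A*x + B*u (assumed, not from the paper).
A, B = 0.9, 0.5
gamma, alpha = 0.95, 1e-3

# Learnable MPC parameters theta = (q, r, p): stage-cost weights and terminal weight.
theta = np.array([1.0, 1.0, 1.0])

def q_value(s, a, th):
    # Horizon-1 MPC cost: stage cost plus parametrized terminal cost of the successor state.
    q, r, p = th
    return q * s**2 + r * a**2 + p * (A * s + B * a)**2

def q_grad(s, a, th):
    # Q is linear in theta, so its gradient is the vector of basis terms.
    return np.array([s**2, a**2, (A * s + B * a)**2])

def greedy_action(s, th):
    # Analytic minimizer of q_value over the action (quadratic in a).
    q, r, p = th
    return -p * A * B * s / (r + p * B**2)

rng = np.random.default_rng(0)
s = 1.0
for _ in range(5000):
    a = greedy_action(s, theta) + 0.1 * rng.standard_normal()     # exploratory action
    cost = s**2 + 0.1 * a**2                                      # "true" stage cost (assumed)
    s_next = A * s + B * a
    # Q-learning target: observed cost plus discounted greedy value at the successor state.
    target = cost + gamma * q_value(s_next, greedy_action(s_next, theta), theta)
    delta = q_value(s, a, theta) - target                         # temporal-difference error
    theta -= alpha * delta * q_grad(s, a, theta)                  # semi-gradient update
    theta = np.maximum(theta, 1e-6)                               # keep weights positive
    s = s_next if abs(s_next) > 1e-3 else rng.uniform(-1.0, 1.0)  # occasional reset
```

In the multi-agent setting the abstract describes, each agent holds local parameters of this kind, the distributed MPC scheme couples them, and the centralized Q-learning update is reconstructed from local computations so that learning remains stationary.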