Multi-agent reinforcement learning via distributed MPC as a function approximator


Journal Article (2024)

Author(s)

Samuel Mallick (TU Delft - Team Bart De Schutter)

Filippo Airaldi (TU Delft - Team Azita Dabiri)

Azita Dabiri (TU Delft - Team Azita Dabiri)

B. De Schutter (TU Delft - Delft Center for Systems and Control)

Research Group

Team Azita Dabiri

DOI related publication

https://doi.org/10.1016/j.automatica.2024.111803

Keywords

Multi-agent reinforcement learning; Networked systems; ADMM; Distributed model predictive control

To reference this document use:

https://resolver.tudelft.nl/uuid:f4c0dd71-a572-4ab8-95d5-c9ec0f033647

Publication Year

2024

Language

English

Bibliographical Note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Volume number

167

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. The current paper is the first work to extend this idea to the multi-agent setting. We propose the use of a distributed MPC scheme as a function approximator, with a structure allowing for distributed learning and deployment. We then show that Q-learning updates can be performed distributively without introducing nonstationarity, by reconstructing a centralized learning update. The effectiveness of the approach is demonstrated on a numerical example.
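The idea of reconstructing a centralized Q-learning update from local quantities can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, purely for illustration, that the global temporal-difference (TD) error is the sum of per-agent TD errors and that each agent's parameters affect only its own term of the Q-function. The function names (`average_consensus`, `distributed_q_update`, `centralized_q_update`), the Metropolis consensus weights, and all numbers are hypothetical. In the MPC setting the local gradients would come from the local MPC problems; here they are placeholder values.

```python
def average_consensus(values, neighbors, iterations=200):
    """Agents repeatedly mix values with their neighbors (Metropolis weights).

    On a connected undirected graph every entry converges to the network
    average of the initial values.
    """
    x = list(values)
    for _ in range(iterations):
        updated = []
        for i in range(len(x)):
            acc = x[i]
            for j in neighbors[i]:
                w = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
                acc += w * (x[j] - x[i])
            updated.append(acc)
        x = updated
    return x


def centralized_q_update(thetas, grads, local_td, alpha):
    """Reference update: one learner sees the global TD error directly."""
    delta = sum(local_td)  # assumed additive structure of the TD error
    return [[t + alpha * delta * g for t, g in zip(th, gr)]
            for th, gr in zip(thetas, grads)]


def distributed_q_update(thetas, grads, local_td, neighbors, alpha):
    """Each agent rebuilds the global TD error via consensus, then updates
    only its own parameters with its own (placeholder) gradient."""
    n = len(thetas)
    averages = average_consensus(local_td, neighbors)
    new_thetas = []
    for i in range(n):
        delta_i = n * averages[i]  # agent i's estimate of the global TD error
        new_thetas.append([t + alpha * delta_i * g
                           for t, g in zip(thetas[i], grads[i])])
    return new_thetas


# Hypothetical three-agent line network: 0 - 1 - 2.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
thetas = [[0.5, -0.2], [1.0, 0.3], [-0.4, 0.8]]   # local parameters
grads = [[1.0, 0.5], [-0.3, 2.0], [0.7, -1.1]]    # placeholder local gradients
local_td = [0.4, -1.2, 0.5]                        # placeholder local TD errors

central = centralized_q_update(thetas, grads, local_td, alpha=0.1)
distributed = distributed_q_update(thetas, grads, local_td, neighbors, alpha=0.1)
```

Under these assumptions the two updates agree up to the consensus tolerance, so every agent effectively applies the same centralized update rather than learning against other agents' changing policies.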

Files

1-s2.0-S0005109824002978-main.... (pdf)

(pdf | 1 Mb)

- Embargo expired on 22-12-2024

License info not available