Distributed Actor-Critic Algorithms for Multiagent Reinforcement Learning Over Directed Graphs

Journal Article (2023)

Author(s)

Pengcheng Dai (Southeast University)

Wenwu Yu (Southeast University)

He Wang (Southeast University)

Simone Baldi (TU Delft - Team Bart De Schutter, Southeast University)

Research Group

Team Bart De Schutter

DOI related publication

https://doi.org/10.1109/TNNLS.2021.3139138

Keywords

Topology; Convergence; Directed graph; Approximation algorithms; Protocols; Q-learning; Directed graphs; Distributed actor-critic (AC) algorithm; Function approximation; Multiagent reinforcement learning (MARL); Push-sum protocol

To reference this document use:

https://resolver.tudelft.nl/uuid:14b1af75-bf0f-42d6-b2cb-33232afacd4b

Publication Year

2023

Language

English

Issue number

10

Volume number

34

Pages (from-to)

7210-7221

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This article studies actor-critic (AC) methods for cooperative multiagent reinforcement learning (MARL) over directed graphs. The goal of the agents in MARL is to maximize the globally averaged return in a distributed way, i.e., each agent can only exchange information with its neighboring agents. AC methods proposed in the literature require the communication graphs to be undirected and the weight matrices to be doubly stochastic (more precisely, the weight matrices are row stochastic and their expectations are column stochastic). Unlike these methods, we propose a distributed AC algorithm for MARL over a directed graph with fixed topology that only requires the weight matrix to be row stochastic. We then study MARL over directed graphs (possibly not connected) with changing topologies and propose a different distributed AC algorithm, based on the push-sum protocol, that only requires the weight matrices to be column stochastic. Convergence of the proposed algorithms is proven for linear function approximation of the action value function. Simulations are presented to demonstrate the effectiveness of the proposed algorithms.
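
The push-sum protocol named in the abstract can be illustrated with a minimal sketch of average consensus over a directed graph. This is not the article's AC algorithm: it only shows the underlying averaging step, on a small fixed graph rather than a changing topology. The graph, the initial values, the equal-split weighting rule, and the function names `column_stochastic_weights` and `push_sum` are all assumptions made for illustration.

```python
def column_stochastic_weights(out_neighbors):
    """Build a weight matrix W where W[i][j] is the share node j sends to node i.

    Assumption for this sketch: each node splits its mass equally among
    itself and its out-neighbors, so every column of W sums to 1.
    """
    n = len(out_neighbors)
    W = [[0.0] * n for _ in range(n)]
    for j, targets in enumerate(out_neighbors):
        recipients = [j] + list(targets)
        share = 1.0 / len(recipients)
        for i in recipients:
            W[i][j] = share
    return W


def push_sum(values, W, num_iters=200):
    """Run push-sum and return each node's estimate of the average."""
    n = len(values)
    x = list(values)   # running numerators, initialized to the local values
    w = [1.0] * n      # running denominators, initialized to 1
    for _ in range(num_iters):
        # Each node sums the shares pushed to it by its in-neighbors.
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        w = [sum(W[i][j] * w[j] for j in range(n)) for i in range(n)]
    # The ratio corrects the bias that column-stochastic mixing introduces.
    return [x[i] / w[i] for i in range(n)]


# Hypothetical strongly connected directed graph:
# edges 0->1, 0->2, 1->2, 2->3, 3->0.
out_neighbors = [[1, 2], [2], [3], [0]]
values = [1.0, 2.0, 3.0, 6.0]

W = column_stochastic_weights(out_neighbors)
estimates = push_sum(values, W)
print(estimates)
```

Each node only needs to know its own out-degree to build its column of `W`, which is why column stochasticity is a natural requirement on directed graphs. On this example the per-node ratios approach the true average of the initial values, 3.0, even though `W` is not row stochastic.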

Files

Distributed_ActorCritic_Algori... (pdf, 1.74 MB)

Embargo expired on 11-07-2022

License info not available