Decentralized Reinforcement Learning of robot behaviors

Journal Article (2018)

Author(s)

David L. Leottau (Universidad de Santiago de Chile)

Javier Ruiz-del-Solar (Universidad de Santiago de Chile)

R. Babuska (TU Delft - Learning & Autonomous Control, Czech Technical University)

Research Group

Learning & Autonomous Control

DOI related publication

https://doi.org/10.1016/j.artint.2017.12.001

Keywords

Multi-agent systems, Reinforcement learning, Decentralized control, Autonomous robots, Distributed artificial intelligence

To reference this document use:

https://resolver.tudelft.nl/uuid:ca8f4bdd-643f-4d3f-83af-52195921fec6

Publication Year

2018

Language

English

Volume number

256

Pages (from-to)

130-159

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems that involve multi-dimensional action spaces. With this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In addition to the methodology itself, three specific multi-agent DRL approaches are considered: DRL-Independent, DRL-Cooperative-Adaptive (DRL-CA), and DRL-Lenient. These approaches are validated and analyzed in an extensive empirical study on four problems: 3D Mountain Car, SCARA Real-Time Trajectory Generation, Ball-Dribbling in humanoid soccer robotics, and Ball-Pushing using differential drive robots. The experimental validation provides evidence that DRL implementations achieve better performance and shorter learning times than their centralized counterparts, while using fewer computational resources. The DRL-Lenient and DRL-CA algorithms achieve the best final performance on all four tested problems, outperforming their DRL-Independent counterparts. Furthermore, the benefits of DRL-Lenient and DRL-CA become more noticeable as the problem complexity increases and the centralized scheme becomes intractable given the available computational resources and training time.
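As an illustration of the decentralized idea summarized in the abstract, the following minimal sketch assigns one independent tabular Q-learner to each action dimension, with every learner observing the same joint state and receiving the same reward. It is not the authors' implementation: the toy grid task, the names `step`, `IndependentQAgent`, `train`, and `greedy_rollout`, and all hyperparameter values are assumptions chosen for illustration, and plain epsilon-greedy Q-learning stands in for whatever learning rule the paper uses.

```python
import random

N = 5                  # side length of a hypothetical toy grid (assumption)
MOVES = (-1, 0, 1)     # action set of each single action dimension
GOAL = (N - 1, N - 1)


def step(state, joint_action):
    """Move along both axes, clip to the grid, return (next_state, reward, done)."""
    x = min(max(state[0] + joint_action[0], 0), N - 1)
    y = min(max(state[1] + joint_action[1], 0), N - 1)
    nxt = (x, y)
    done = nxt == GOAL
    return nxt, (0.0 if done else -1.0), done


class IndependentQAgent:
    """Tabular Q-learner that controls one action dimension only."""

    def __init__(self, alpha=0.5, gamma=0.95, epsilon=0.1, rng=None):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        self.q = {}  # joint state -> list of len(MOVES) action values

    def values(self, state):
        return self.q.setdefault(state, [0.0] * len(MOVES))

    def act(self, state, greedy=False):
        # Epsilon-greedy over this agent's own dimension.
        if not greedy and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(MOVES))
        v = self.values(state)
        return max(range(len(MOVES)), key=v.__getitem__)

    def update(self, s, a, r, s2, done):
        # Standard Q-learning update; the other agent is treated as part
        # of the environment.
        target = r if done else r + self.gamma * max(self.values(s2))
        v = self.values(s)
        v[a] += self.alpha * (target - v[a])


def train(episodes=500, max_steps=50, seed=0):
    rng = random.Random(seed)
    agents = [IndependentQAgent(rng=rng) for _ in range(2)]  # one per dimension
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(max_steps):
            idx = [ag.act(s) for ag in agents]
            s2, r, done = step(s, tuple(MOVES[i] for i in idx))
            for ag, i in zip(agents, idx):
                ag.update(s, i, r, s2, done)  # shared state and shared reward
            s = s2
            if done:
                break
    return agents


def greedy_rollout(agents, max_steps=50):
    """Return the number of greedy steps needed to reach GOAL, or None."""
    s = (0, 0)
    for t in range(1, max_steps + 1):
        idx = [ag.act(s, greedy=True) for ag in agents]
        s, _, done = step(s, tuple(MOVES[i] for i in idx))
        if done:
            return t
    return None
```

In this sketch each agent stores 3 values per state, whereas a centralized learner over the joint action space would store 3 × 3 = 9. That gap grows exponentially with the number of action dimensions, which is one intuition behind the resource savings reported in the abstract. Calling `greedy_rollout(train())` runs the learned joint policy from the start cell.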

Files

MAS_DRL.pdf

(pdf | 6.87 MB)

Embargo expired on 22-12-2019