J.D. Willemsen
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
1
Multi-agent robotic systems could benefit from reinforcement learning algorithms that are able to learn behaviours in a small number of trials, a property known as sample efficiency. This research investigates the use of learned world models to create more sample-efficient algorithms. We present a novel multi-agent model-based reinforcement learning algorithm, Multi-Agent Model-Based Policy Optimization (MAMBPO), which uses the Centralized Learning for Decentralized Execution (CLDE) framework, and demonstrate state-of-the-art sample efficiency on a number of benchmark domains. CLDE algorithms allow a group of agents to act in a fully decentralized manner after training, a desirable property for many systems comprising multiple robots. Current CLDE algorithms such as Multi-Agent Soft Actor-Critic (MASAC) suffer from limited sample efficiency, often taking many thousands of trials before learning desirable behaviours. This makes them impractical for learning in real-world robotic tasks. MAMBPO uses a learned world model to improve sample efficiency compared to its model-free counterparts. We demonstrate on two simulated multi-agent robotics tasks that MAMBPO reaches performance similar to MASAC while requiring up to 3.7 times fewer samples for learning. In doing so, we take an important step towards making real-life learning for multi-agent robotic systems possible.
Bachelor thesis
(2017)
-
M.B. Büller, A.P.K. Claes, J.D. Willemsen, M.J. Drenth, M.A. Griffioen, M.A. Griffioen, D.J. Groot, J.T. Janowski, C. Marchena Esteban, P.J.A. Post, R. Wedemeijer, R. Vos, B.C.P. Jongbloed