Learning state representation for deep actor-critic control


Abstract

Deep Neural Networks (DNNs) can be used as function approximators in Reinforcement Learning (RL). One advantage of DNNs is that they can cope with large input dimensions. Instead of relying on feature engineering to lower the input dimension, DNNs can extract the features from raw observations. The drawback of this end-to-end learning is that it usually requires a large amount of data, which is not always available for real-world control applications. In this paper, a new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task. The ML-DDPG algorithm uses a concept we call predictive priors to learn a model network, which is subsequently used to pre-train the first layer of the actor and critic networks. Simulation results show that ML-DDPG can learn reasonable continuous control policies from high-dimensional observations that also contain task-irrelevant information. Furthermore, in some cases, this approach significantly improves the final performance in comparison to end-to-end learning.
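
To make the pre-training idea concrete, the sketch below shows one way a learned observation-to-state mapping could be copied into the first layer of the actor and critic networks. This is an illustrative sketch only, not the authors' implementation: the layer sizes, prediction targets, and network layout are assumptions, and the model-network training loop is omitted.

```python
# Hedged sketch: pre-training the first layer of DDPG actor/critic networks
# with an encoder learned by a predictive model network. All dimensions and
# architectural details here are hypothetical.
import torch
import torch.nn as nn

OBS_DIM, STATE_DIM, ACT_DIM = 30, 10, 4  # hypothetical dimensions


class ModelNetwork(nn.Module):
    """Maps raw observations to a compact state and predicts next state / reward."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, STATE_DIM)                     # observation -> state
        self.predictor = nn.Linear(STATE_DIM + ACT_DIM, STATE_DIM + 1)   # -> next state, reward

    def forward(self, obs, act):
        s = torch.relu(self.encoder(obs))
        out = self.predictor(torch.cat([s, act], dim=-1))
        return out[..., :STATE_DIM], out[..., -1]


class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, STATE_DIM)  # first layer, to be pre-trained
        self.head = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                  nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs):
        return self.head(torch.relu(self.encoder(obs)))


class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, STATE_DIM)  # first layer, to be pre-trained
        self.head = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, obs, act):
        s = torch.relu(self.encoder(obs))
        return self.head(torch.cat([s, act], dim=-1))


model, actor, critic = ModelNetwork(), Actor(), Critic()
# ... train `model` on transitions (obs, act, next_obs, reward) with a prediction loss ...

# Copy the learned observation-to-state mapping into the actor and critic first layers
# before running the usual DDPG training loop.
actor.encoder.load_state_dict(model.encoder.state_dict())
critic.encoder.load_state_dict(model.encoder.state_dict())
```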