Consolidated Deep Actor Critic Networks

Abstract

The works [Mnih et al. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.] and [Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.] demonstrated the power of combining deep neural networks with Watkins' Q-learning. They introduce deep Q networks (DQN) that learn to associate high-dimensional inputs with Q values in order to produce discrete actions, allowing the system to learn complex strategies and play Atari games such as Breakout and Space Invaders. Although powerful, the system is limited to discrete actions. If we wish to control more complex systems, such as robots, we need the ability to output multidimensional continuous actions. In this paper we investigate how to combine deep neural networks with actor-critic models, which can output multidimensional continuous actions. We name this class of systems deep actor critic networks (DACN), following the DQN naming convention. We derive and experiment with four methods to update the actor. We then consolidate the actor and critic networks into one unified network, which we name consolidated deep actor critic networks (C-DACN). We hypothesize that consolidating the actor and critic networks may lead to faster convergence. We test the system in two environments: Acrobot (an underactuated double pendulum) and Bounce (a continuous-action Atari Breakout look-alike).
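The abstract does not spell out the consolidated architecture. As a rough sketch, assuming C-DACN shares a common feature trunk between an actor head (continuous action) and a critic head (Q value), a minimal PyTorch version might look like the following; all names here (`ConsolidatedActorCritic`, `obs_dim`, `hidden`, etc.) are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class ConsolidatedActorCritic(nn.Module):
    """Hypothetical consolidated network: one shared trunk feeding two heads.
    The actor head emits a multidimensional continuous action; the critic
    head estimates Q(s, a) for that action. Not the paper's exact design."""

    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # Shared feature layers, reused by both the actor and the critic.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Actor head: tanh keeps each action dimension in [-1, 1].
        self.actor = nn.Sequential(nn.Linear(hidden, action_dim), nn.Tanh())
        # Critic head: Q value conditioned on shared features and the action.
        self.critic = nn.Linear(hidden + action_dim, 1)

    def forward(self, obs: torch.Tensor):
        features = self.trunk(obs)
        action = self.actor(features)
        q_value = self.critic(torch.cat([features, action], dim=-1))
        return action, q_value


if __name__ == "__main__":
    # Example with a 6-dimensional state (e.g. an Acrobot-like observation).
    net = ConsolidatedActorCritic(obs_dim=6, action_dim=1)
    action, q = net(torch.randn(1, 6))
    print(action.shape, q.shape)  # torch.Size([1, 1]) torch.Size([1, 1])
```

One plausible reading of the consolidation hypothesis: with a shared trunk, gradients from the critic's temporal-difference loss also shape the features the actor uses, which could speed up convergence relative to training two disjoint networks.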
