Policy derivation methods for critic-only reinforcement learning in continuous action spaces

Conference Paper (2016)
Author(s)

Eduard Alibekov (Czech Technical University)

Jiří Kubalík (Czech Technical University)

Robert Babuška (TU Delft - OLD Intelligent Control & Robotics, Czech Technical University)

Research Group
OLD Intelligent Control & Robotics
DOI
https://doi.org/10.1016/j.ifacol.2016.07.127
Publication Year
2016
Language
English
Volume number
49 (5)
Pages (from-to)
285-290

Abstract

State-of-the-art critic-only reinforcement learning methods can deal with small, discrete action spaces. The most common approach to real-world problems with continuous actions is therefore to discretize the action space. In this paper, a method is proposed to derive a continuous-action policy based on a value function that has been computed for discrete actions using any known algorithm, such as value iteration. Several variants of the policy-derivation algorithm are introduced and compared on two continuous state-action benchmarks: double pendulum swing-up and 3D mountain car.
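The core idea, deriving a continuous action from a value function learned over discretized actions, can be illustrated with a minimal sketch. The code below is a hypothetical one-step-lookahead variant, not necessarily one of the paper's algorithms: it assumes a deterministic model f(x, u), a reward function r(x, u), a discount factor gamma, and a fitted value function V; it evaluates the Bellman right-hand side on a fine grid of candidate actions and refines the best grid action by parabolic interpolation to obtain a sub-grid, continuous action.

```python
import numpy as np

def derive_continuous_action(x, V, f, r, gamma, u_min, u_max, n_grid=21):
    """Hypothetical policy-derivation sketch: one-step lookahead over a
    discrete action grid, refined by parabolic (sub-grid) interpolation.

    V : callable, state -> value (e.g., computed by value iteration)
    f : callable, (state, action) -> next state (deterministic model)
    r : callable, (state, action) -> reward
    """
    us = np.linspace(u_min, u_max, n_grid)
    # Bellman right-hand side for every candidate action on the grid.
    q = np.array([r(x, u) + gamma * V(f(x, u)) for u in us])
    i = int(np.argmax(q))
    if i == 0 or i == n_grid - 1:
        return us[i]  # maximum at the grid boundary: return it directly
    # Fit a parabola through the best grid point and its two neighbors,
    # and return the action where that parabola peaks.
    denom = q[i - 1] - 2.0 * q[i] + q[i + 1]
    if denom == 0.0:
        return us[i]
    offset = 0.5 * (q[i - 1] - q[i + 1]) / denom
    h = us[1] - us[0]
    return us[i] + float(np.clip(offset, -1.0, 1.0)) * h
```

With a very fine grid this reduces to plain action-space discretization; the interpolation step is what yields actions between grid points. The variants actually introduced and compared in the paper are described in the full text behind the DOI above.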

Metadata-only record. There are no files for this record.