Policy derivation methods for critic-only reinforcement learning in continuous action spaces

Conference Paper (2016)
Author(s)

Eduard Alibekov (Czech Technical University)

Jiří Kubalík (Czech Technical University)

Robert Babuška (TU Delft - OLD Intelligent Control & Robotics, Czech Technical University)

Research Group
OLD Intelligent Control & Robotics
DOI
https://doi.org/10.1016/j.ifacol.2016.07.127
Publication Year
2016
Language
English
Volume number
49 (5)
Pages (from-to)
285-290

Abstract

State-of-the-art critic-only reinforcement learning methods can deal with small, discrete action spaces. The most common approach to real-world problems with continuous actions is therefore to discretize the action space. In this paper, a method is proposed to derive a continuous-action policy based on a value function that has been computed for discrete actions using any known algorithm, such as value iteration. Several variants of the policy-derivation algorithm are introduced and compared on two continuous state-action benchmarks: double pendulum swing-up and 3D mountain car.
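The core idea, deriving a continuous action from a value function learned over discretized actions, can be illustrated with a minimal sketch. The code below is a hypothetical one-step-lookahead variant, not necessarily one of the paper's algorithms: it assumes a deterministic model f(x, u), a reward function r(x, u), a discount factor gamma, and a fitted value function V; it evaluates the Bellman right-hand side on a fine grid of candidate actions and refines the best grid action by parabolic interpolation to obtain a sub-grid, continuous action.

```python
import numpy as np

def derive_continuous_action(x, V, f, r, gamma, u_min, u_max, n_grid=21):
    """Hypothetical policy-derivation sketch: one-step lookahead over a
    discrete action grid, refined by parabolic (sub-grid) interpolation.

    V : callable, state -> value (e.g., computed by value iteration)
    f : callable, (state, action) -> next state (deterministic model)
    r : callable, (state, action) -> reward
    """
    us = np.linspace(u_min, u_max, n_grid)
    # Bellman right-hand side for every candidate action on the grid.
    q = np.array([r(x, u) + gamma * V(f(x, u)) for u in us])
    i = int(np.argmax(q))
    if i == 0 or i == n_grid - 1:
        return us[i]  # maximum at the grid boundary: return it directly
    # Fit a parabola through the best grid point and its two neighbors,
    # and return the action where that parabola peaks.
    denom = q[i - 1] - 2.0 * q[i] + q[i + 1]
    if denom == 0.0:
        return us[i]
    offset = 0.5 * (q[i - 1] - q[i + 1]) / denom
    h = us[1] - us[0]
    return us[i] + float(np.clip(offset, -1.0, 1.0)) * h
```

With a very fine grid this reduces to plain action-space discretization; the interpolation step is what yields actions between grid points. The variants actually introduced and compared in the paper are described in the full text behind the DOI above.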

Metadata-only record. There are no files for this record.