Policy derivation methods for critic-only reinforcement learning in continuous spaces

Journal Article (2018)
Author(s)

Eduard Alibekov (Czech Technical University)

Jiri Kubalik (Czech Technical University)

Robert Babuska (Czech Technical University, TU Delft - Learning & Autonomous Control)

DOI related publication
https://doi.org/10.1016/j.engappai.2017.12.004 Final published version
Publication Year
2018
Language
English
Volume number
69
Pages (from-to)
178-187
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper addresses the problem of deriving a policy from the value function in the context of critic-only reinforcement learning (RL) in continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. By its nature, numerical approximation virtually always exhibits artifacts that degrade the overall performance of the controlled system. In addition, when continuous-valued actions are used, the most common approach is to discretize the action space and exhaustively search for the action that maximizes the right-hand side of the Bellman equation. Such a policy derivation procedure is computationally demanding and results in steady-state error because the derived control signal lacks continuity. In this work, we propose policy derivation methods that alleviate these problems by means of action space refinement, continuous approximation, and post-processing of the V-function by symbolic regression. The proposed methods are tested on three nonlinear control problems: 1-DOF and 2-DOF pendulum swing-up, and magnetic manipulation. The results show significantly improved performance in terms of cumulative return and computational complexity.
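To make the baseline procedure concrete, the sketch below evaluates the right-hand side of the Bellman equation, u*(x) = argmax over u of [rho(x, u, f(x, u)) + gamma * V(f(x, u))], over an equally spaced action grid. It assumes a deterministic transition model f, a reward function rho, a discount factor gamma and an approximate V-function V, all supplied by the caller; this notation is assumed here and not taken from the record above. The second function is a hypothetical, simplified illustration of action space refinement (repeatedly shrinking the search interval around the best grid action); it is not the authors' exact method, and the names greedy_action_grid and greedy_action_refined are illustrative only.

```python
import numpy as np


def bellman_rhs(x, u, f, rho, V, gamma):
    """Right-hand side of the Bellman equation for one candidate action u."""
    x_next = f(x, u)  # assumed deterministic model of the system
    return rho(x, u, x_next) + gamma * V(x_next)


def greedy_action_grid(x, f, rho, V, gamma, u_min, u_max, n=11):
    """Baseline: exhaustive search over n equally spaced actions in [u_min, u_max]."""
    candidates = np.linspace(u_min, u_max, n)
    values = [bellman_rhs(x, u, f, rho, V, gamma) for u in candidates]
    return candidates[int(np.argmax(values))]


def greedy_action_refined(x, f, rho, V, gamma, u_min, u_max, n=11, iters=5):
    """Hypothetical refinement: re-grid a shrinking interval around the best action.

    Each iteration keeps the interval [best - step, best + step], clipped to the
    action bounds, so the resolution improves by a factor of about (n - 1) / 2
    per iteration while the number of V-function evaluations per step stays n.
    """
    lo, hi = u_min, u_max
    best = lo
    for _ in range(iters):
        candidates = np.linspace(lo, hi, n)
        values = [bellman_rhs(x, u, f, rho, V, gamma) for u in candidates]
        best = candidates[int(np.argmax(values))]
        step = (hi - lo) / (n - 1)
        lo, hi = max(u_min, best - step), min(u_max, best + step)
    return best


if __name__ == "__main__":
    # Toy 1-D example (assumed): x' = x + u, reward -x'^2, V(x) = -x^2.
    f = lambda x, u: x + u
    rho = lambda x, u, x_next: -x_next**2
    V = lambda x: -x**2
    print(greedy_action_grid(0.37, f, rho, V, 0.9, -1.0, 1.0))
    print(greedy_action_refined(0.37, f, rho, V, 0.9, -1.0, 1.0))
```

In the toy example the exact maximizer is u = -0.37, which lies between two points of the 11-point grid, so the plain grid search returns the nearest grid action (about -0.4). The refined search reaches the true value to within 1e-3 using the same number of candidates per iteration. This mirrors, in a very simplified form, the steady-state error caused by a coarse discrete action set.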

Files

Root_R2.pdf
(pdf | 0.904 Mb)
- Embargo expired on 05-02-2020