Policy derivation methods for critic-only reinforcement learning in continuous spaces

Journal Article (2018)

Author(s)

Eduard Alibekov (Czech Technical University)

Jiri Kubalik (Czech Technical University)

Robert Babuska (Czech Technical University, TU Delft - Learning & Autonomous Control)

Optimization, Reinforcement learning, Optimal control, Continuous actions, Multi-variable systems, Policy derivation

DOI related publication

https://doi.org/10.1016/j.engappai.2017.12.004 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:b79399a4-b131-4a74-bac0-7e96930c9b1b

Publication Year

2018

Language

English

Volume number

69

Pages (from-to)

178-187

Downloads counter

276

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper addresses the problem of deriving a policy from the value function in the context of critic-only reinforcement learning (RL) in continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. Numerical approximation, by its nature, virtually always exhibits artifacts that degrade the overall performance of the controlled system. In addition, when the actions are continuous-valued, the most common approach is to discretize the action space and exhaustively search for the action that maximizes the right-hand side of the Bellman equation. Such a policy derivation procedure is computationally demanding and results in steady-state error due to the lack of continuity. In this work, we propose policy derivation methods that alleviate these problems by means of action space refinement, continuous approximation, and post-processing of the V-function using symbolic regression. The proposed methods are tested on nonlinear control problems: 1-DOF and 2-DOF pendulum swing-up, and magnetic manipulation. The results show significantly improved performance in terms of cumulative return and computational complexity.
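The baseline procedure described in the abstract, an exhaustive search over a discretized action set for the maximizer of the Bellman right-hand side, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `f` (state transition), `rho` (reward), `V` (value function) and `gamma` (discount factor), the toy scalar system, and in particular `refined_action`, which is just one simple way to re-grid around the best coarse action and should not be read as the authors' refinement method.

```python
def linspace(lo, hi, n):
    """Return n evenly spaced points from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def bellman_rhs(x, u, f, rho, V, gamma):
    """Right-hand side of the Bellman equation for state x and action u."""
    x_next = f(x, u)
    return rho(x, u, x_next) + gamma * V(x_next)


def greedy_action(x, actions, f, rho, V, gamma):
    """Exhaustive search over a finite (discretized) action set."""
    return max(actions, key=lambda u: bellman_rhs(x, u, f, rho, V, gamma))


def refined_action(x, u_lo, u_hi, f, rho, V, gamma, n=5, rounds=3):
    """Coarse search, then repeatedly re-grid around the current best action.

    Illustrative scheme only (an assumption, not the paper's algorithm).
    """
    lo, hi = u_lo, u_hi
    best = greedy_action(x, linspace(lo, hi, n), f, rho, V, gamma)
    for _ in range(rounds):
        half = (hi - lo) / (n - 1)  # one step of the current grid
        lo, hi = max(u_lo, best - half), min(u_hi, best + half)
        best = greedy_action(x, linspace(lo, hi, n), f, rho, V, gamma)
    return best


# Hypothetical toy problem: scalar integrator with a quadratic penalty.
f = lambda x, u: x + u
rho = lambda x, u, x_next: -(x_next ** 2) - 0.1 * u ** 2
V = lambda x: -(x ** 2)  # stand-in for a learned V-function approximator
gamma = 0.9

x0 = 0.7
u_coarse = greedy_action(x0, linspace(-1.0, 1.0, 5), f, rho, V, gamma)
u_fine = refined_action(x0, -1.0, 1.0, f, rho, V, gamma)
print(u_coarse, u_fine)
```

In this toy setting the coarse search can only return one of the five grid actions, whereas the re-gridded search lands closer to the continuous maximizer of the same objective. This is the kind of gap between a discretized and a continuous action choice that the abstract associates with steady-state error.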

Files

Root_R2.pdf

(pdf | 0.904 Mb)

Embargo expired on 05-02-2020