Symbolic method for deriving policy in reinforcement learning

Conference Paper (2016)
Author(s)

Eduard Alibekov (Czech Technical University)

Jiří Kubalík (Czech Technical University)

Robert Babuška (Czech Technical University, TU Delft - OLD Intelligent Control & Robotics)

Research Group
OLD Intelligent Control & Robotics
DOI
https://doi.org/10.1109/CDC.2016.7798684
Publication Year
2016
Language
English
Pages (from-to)
2789-2795
ISBN (print)
978-1-5090-1837-6
Event
55th IEEE Conference on Decision and Control, CDC 2016 (12–14 December 2016), Las Vegas, United States
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
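
To make the policy-derivation step concrete, the sketch below gives one plausible reading of the procedure the abstract describes; it is not the authors' implementation. The transition model dynamics, stage reward reward, discount gamma, and the discretized input set are hypothetical stand-ins for quantities the paper assumes. A greedy policy is derived from a (proxy) value function by maximizing the one-step return over candidate inputs, and a candidate symbolic proxy is scored by the number of selected states on which its greedy input agrees with the one chosen under the reference value function, mirroring the "number of correct choices" fitness criterion.

    import numpy as np

    def greedy_policy(value_fn, dynamics, reward, candidate_inputs, gamma=0.95):
        """Derive a policy from a value function by one-step maximization.

        For each state x, pick the input u that maximizes
        reward(x, u) + gamma * value_fn(dynamics(x, u)).
        All names are illustrative; the paper's formulation may differ.
        """
        def policy(x):
            returns = [reward(x, u) + gamma * value_fn(dynamics(x, u))
                       for u in candidate_inputs]
            return candidate_inputs[int(np.argmax(returns))]
        return policy

    def proxy_fitness(proxy_fn, reference_fn, dynamics, reward,
                      states, candidate_inputs, gamma=0.95):
        """Score a symbolic proxy: count the selected states where its
        greedy input matches the greedy input under the reference."""
        pi_proxy = greedy_policy(proxy_fn, dynamics, reward, candidate_inputs, gamma)
        pi_ref = greedy_policy(reference_fn, dynamics, reward, candidate_inputs, gamma)
        return sum(np.isclose(pi_proxy(x), pi_ref(x)) for x in states)

    # Toy 1-D example with a hypothetical linear system (not from the paper).
    if __name__ == "__main__":
        dynamics = lambda x, u: x + 0.1 * u            # simple Euler step
        reward = lambda x, u: -(x ** 2) - 0.01 * u ** 2
        inputs = np.linspace(-2.0, 2.0, 41)            # discretized input set
        v_ref = lambda x: -(x ** 2)                    # stand-in value function
        v_sym = lambda x: -abs(x) * abs(x)             # candidate symbolic proxy
        states = np.linspace(-1.0, 1.0, 21)
        print(proxy_fitness(v_sym, v_ref, dynamics, reward, states, inputs))

In the paper's setting, a genetic programming search would evolve candidates for v_sym and use a score like proxy_fitness as the fitness function; the continuous policy is then obtained by maximizing over the input space rather than a fixed grid.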
