Symbolic method for deriving policy in reinforcement learning

Conference Paper (2016)
Author(s)

Eduard Alibekov (Czech Technical University)

Jiří Kubalík (Czech Technical University)

R. Babuška (Czech Technical University, TU Delft - OLD Intelligent Control & Robotics)

Research Group
OLD Intelligent Control & Robotics
Copyright
© 2016 Eduard Alibekov, Jiří Kubalík, R. Babuška
DOI related publication
https://doi.org/10.1109/CDC.2016.7798684
Publication Year
2016
Language
English
Pages (from-to)
2789-2795
ISBN (print)
978-1-5090-1837-6
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
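To illustrate the policy-derivation step described in the abstract, the sketch below shows how a continuous control input can be obtained by maximizing a one-step look-ahead through a value-function proxy. This is a minimal illustration, not the authors' implementation: the symbolic proxy V_hat, the pendulum-like dynamics f, the reward rho, and the discount factor are all hypothetical placeholders; in the paper the proxy is constructed by genetic programming.

```python
# Minimal sketch (not the authors' code): deriving a continuous policy by
# maximizing over a value-function proxy, assuming a known transition model
# f(x, u) and stage reward rho(x, u). V_hat stands in for a symbolic proxy
# that genetic programming would produce.
import numpy as np
from scipy.optimize import minimize_scalar

gamma = 0.95  # assumed discount factor


def V_hat(x):
    # Hypothetical symbolic proxy of the value function (placeholder expression).
    return -x[0] ** 2 - 0.1 * x[1] ** 2


def f(x, u, dt=0.05):
    # Assumed pendulum-like dynamics: x = [angle, angular velocity].
    theta, omega = x
    omega_new = omega + dt * (np.sin(theta) + u)
    theta_new = theta + dt * omega_new
    return np.array([theta_new, omega_new])


def rho(x, u):
    # Assumed quadratic stage reward penalizing state deviation and input effort.
    return -(x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2)


def policy(x, u_max=2.0):
    # Continuous policy: pick the input that maximizes the one-step
    # look-ahead value rho(x, u) + gamma * V_hat(f(x, u)).
    res = minimize_scalar(
        lambda u: -(rho(x, u) + gamma * V_hat(f(x, u))),
        bounds=(-u_max, u_max),
        method="bounded",
    )
    return res.x


if __name__ == "__main__":
    x = np.array([np.pi, 0.0])  # pendulum hanging down
    print("control input:", policy(x))
```

In this sketch the maximization is a bounded scalar search because the input is one-dimensional; for higher-dimensional inputs a multivariate optimizer would be used in the same way.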
